Updates documentation with details on the Consul operator actions.

@@ -8,7 +8,7 @@ description: >
# Operator HTTP Endpoint

The Operator endpoint provides cluster-level tools for Consul operators, such
as interacting with the Raft subsystem. This was added in Consul 0.7.

~> Use this interface with extreme caution, as improper use could lead to a Consul
@@ -40,9 +40,93 @@ The Raft configuration endpoint supports the `GET` method.
#### GET Method

When using the `GET` method, the request will be forwarded to the cluster
leader to retrieve its latest Raft peer configuration.

If the cluster doesn't currently have a leader, an error will be returned. You
can use the "?stale" query parameter to read the Raft configuration from any
of the Consul servers.

By default, the datacenter of the agent is queried; however, the `dc` can be
provided using the "?dc=" query parameter.

If ACLs are enabled, the client will need to supply an ACL Token with
[`operator`](/docs/internals/acl.html#operator) read privileges.
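
As a rough sketch of what a request looks like using curl (the agent address is
the default, and the "?stale" parameter and token value are shown only for
illustration):

```
$ curl \
    "http://127.0.0.1:8500/v1/operator/raft/configuration?stale&token=<operator-read-token>"
```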

A JSON body is returned that looks like this:

```javascript
{
  "Servers": [
    {
      "ID": "127.0.0.1:8300",
      "Node": "alice",
      "Address": "127.0.0.1:8300",
      "Leader": true,
      "Voter": true
    },
    {
      "ID": "127.0.0.2:8300",
      "Node": "bob",
      "Address": "127.0.0.2:8300",
      "Leader": false,
      "Voter": true
    },
    {
      "ID": "127.0.0.3:8300",
      "Node": "carol",
      "Address": "127.0.0.3:8300",
      "Leader": false,
      "Voter": true
    }
  ],
  "Index": 22
}
```

The `Servers` array has information about the servers in the Raft peer
configuration:

`ID` is the ID of the server. This is the same as the `Address` in Consul 0.7
but may be upgraded to a GUID in a future version of Consul.

`Node` is the node name of the server, as known to Consul, or "(unknown)" if
the node is stale and not known.

`Address` is the IP:port for the server.

`Leader` is either "true" or "false" depending on the server's role in the
Raft configuration.

`Voter` is "true" or "false", indicating if the server has a vote in the Raft
configuration. Future versions of Consul may add support for non-voting servers.

The `Index` value is the Raft index corresponding to this configuration. Note that
the latest configuration may not yet be committed if changes are in flight.

### <a name="raft-peer"></a> /v1/operator/raft/peer

The Raft peer endpoint supports the `DELETE` method.

#### DELETE Method

Using the `DELETE` method, this endpoint will remove the Consul server with
the given address from the Raft configuration.

There are rare cases where a peer may be left behind in the Raft configuration
even though the server is no longer present and known to the cluster. This
endpoint can be used to remove the failed server so that it no longer
affects the Raft quorum.

An "?address=" query parameter is required and should be set to the
"IP:port" for the server to remove. The port number is usually 8300, unless
configured otherwise. Nothing is required in the body of the request.

By default, the datacenter of the agent is targeted; however, the `dc` can be
provided using the "?dc=" query parameter.

If ACLs are enabled, the client will need to supply an ACL Token with
[`operator`](/docs/internals/acl.html#operator) write privileges.

The return code will indicate success or failure.
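
As a rough sketch using curl (the agent address, peer address, and token value
are illustrative):

```
$ curl -X DELETE \
    "http://127.0.0.1:8500/v1/operator/raft/peer?address=10.0.1.8:8300&token=<operator-write-token>"
```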

@@ -49,13 +49,14 @@ The `raft` subcommand is used to view and modify Consul's Raft configuration.
Two actions are available, as detailed in this section.

<a name="raft-list-peers"></a>
#### Display Peer Configuration

This action displays the current Raft peer configuration.

Usage: `raft -list-peers -stale=[true|false]`

* `-stale` - Optional and defaults to "false" which means the leader provides
the result. If the cluster is in an outage state without a leader, you may need
to set this to "true" to get the configuration from a non-leader server.

The output looks like this:

@@ -66,35 +67,36 @@ bob 127.0.0.2:8300 127.0.0.2:8300 leader true
carol  127.0.0.3:8300  127.0.0.3:8300  follower  true
```

`Node` is the node name of the server, as known to Consul, or "(unknown)" if
the node is stale and not known.

`ID` is the ID of the server. This is the same as the `Address` in Consul 0.7
but may be upgraded to a GUID in a future version of Consul.

`Address` is the IP:port for the server.

`State` is either "follower" or "leader" depending on the server's role in the
Raft configuration.

`Voter` is "true" or "false", indicating if the server has a vote in the Raft
configuration. Future versions of Consul may add support for non-voting servers.

<a name="raft-remove-peer"></a>
#### Remove a Peer

This command removes the Consul server with the given `-address` from the Raft
configuration.

There are rare cases where a peer may be left behind in the Raft configuration
even though the server is no longer present and known to the cluster. This command
can be used to remove the failed server so that it no longer affects the
Raft quorum. If the server still shows in the output of the
[`consul members`](/docs/commands/members.html) command, it is preferable to
clean up by simply running
[`consul force-leave`](/docs/commands/force-leave.html)
instead of this command.

Usage: `raft -remove-peer -address="IP:port"`

* `-address` - "IP:port" for the server to remove. The port number is usually
8300, unless configured otherwise.

The return code will indicate success or failure.
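
For example, removing a stale peer might look like this (the address shown is
illustrative):

```
$ consul operator raft -remove-peer -address="10.0.1.8:8300"
```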

@@ -38,20 +38,72 @@ comes online as agents perform [anti-entropy](/docs/internals/anti-entropy.html)
## Failure of a Server in a Multi-Server Cluster

If you think the failed server is recoverable, the easiest option is to bring
it back online and have it rejoin the cluster with the same IP address, returning
the cluster to a fully healthy state. Similarly, even if you need to rebuild a
new Consul server to replace the failed node, you may wish to do that immediately.
Keep in mind that the rebuilt server needs to have the same IP address as the failed
server. Again, once this server is online and has rejoined, the cluster will return
to a fully healthy state.

Both of these strategies involve a potentially lengthy time to reboot or rebuild
a failed server. If this is impractical or if building a new server with the same
IP isn't an option, you need to remove the failed server. Usually, you can issue
a [`consul force-leave`](/docs/commands/force-leave.html) command to remove the failed
server if it's still a member of the cluster.

If [`consul force-leave`](/docs/commands/force-leave.html) isn't able to remove the
server, you have two methods available to remove it, depending on your version of Consul:

* In Consul 0.7 and later, you can use the [`consul operator`](/docs/commands/operator.html#raft-remove-peer)
command to remove the stale peer server on the fly with no downtime.

* In versions of Consul prior to 0.7, you can manually remove the stale peer
server using the `raft/peers.json` recovery file on all remaining servers. See
the [section below](#peers.json) for details on this procedure. This process
requires a Consul downtime to complete.

In Consul 0.7 and later, you can use the [`consul operator`](/docs/commands/operator.html#raft-list-peers)
command to inspect the Raft configuration:

```
$ consul operator raft -list-peers
Node   ID              Address         State     Voter
alice  10.0.1.8:8300   10.0.1.8:8300   follower  true
bob    10.0.1.6:8300   10.0.1.6:8300   leader    true
carol  10.0.1.7:8300   10.0.1.7:8300   follower  true
```
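
If a stale server is still listed in that configuration, a sketch of removing it
with the operator command (the address shown is illustrative):

```
$ consul operator raft -remove-peer -address="10.0.1.9:8300"
```
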
## Failure of Multiple Servers in a Multi-Server Cluster

In the event that multiple servers are lost, causing a loss of quorum and a
complete outage, partial recovery is possible using data on the remaining
servers in the cluster. There may be data loss in this situation because multiple
servers were lost, so information about what's committed could be incomplete.
The recovery process implicitly commits all outstanding Raft log entries, so
it's also possible to commit data that was uncommitted before the failure.

See the [section below](#peers.json) for details of the recovery procedure. You
simply include the remaining servers in the `raft/peers.json` recovery file.
The cluster should be able to elect a leader once the remaining servers are all
restarted with an identical `raft/peers.json` configuration.

Any new servers you introduce later can be fresh with totally clean data directories
and joined using Consul's `join` command.
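
For example, a new server could be joined to one of the existing members like
this (the address is illustrative):

```
$ consul join 10.0.1.8
```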

In extreme cases, it should be possible to recover with just a single remaining
server by starting that single server with itself as the only peer in the
`raft/peers.json` recovery file.

Note that prior to Consul 0.7 it wasn't always possible to recover from certain
types of outages with `raft/peers.json` because this was ingested before any Raft
log entries were played back. In Consul 0.7 and later, the `raft/peers.json`
recovery file is final, and a snapshot is taken after it is ingested, so you are
guaranteed to start with your recovered configuration. This does implicitly commit
all Raft log entries, so it should only be used to recover from an outage, but it
should allow recovery from any situation where there's some cluster data available.

<a name="peers.json"></a>
## Manual Recovery Using peers.json

To begin, stop all remaining servers. You can attempt a graceful leave,
but it will not work in most cases. Do not worry if the leave exits with an

@@ -70,11 +122,6 @@ implicitly committed, so this should only be used after an outage where no
other option is available to recover a lost server. Make sure you don't have
any automated processes that will put the peers file in place on a periodic basis,
for example.

The next step is to go to the [`-data-dir`](/docs/agent/options.html#_data_dir)
of each Consul server. Inside that directory, there will be a `raft/`

@@ -83,9 +130,9 @@ something like:

```javascript
[
  "10.0.1.8:8300",
  "10.0.1.6:8300",
  "10.0.1.7:8300"
]
```
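
As a sketch of putting the recovery file in place on one of the remaining servers
(the data directory path is illustrative and depends on your `-data-dir` setting):

```
$ cat > /opt/consul/data/raft/peers.json <<EOF
[
  "10.0.1.8:8300",
  "10.0.1.6:8300",
  "10.0.1.7:8300"
]
EOF
```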

@@ -126,56 +173,13 @@ nodes should claim leadership and emit a log like:
[INFO] consul: cluster leadership acquired
```

In Consul 0.7 and later, you can use the [`consul operator`](/docs/commands/operator.html#raft-list-peers)
command to inspect the Raft configuration:

```
$ consul operator raft -list-peers
Node   ID              Address         State     Voter
alice  10.0.1.8:8300   10.0.1.8:8300   follower  true
bob    10.0.1.6:8300   10.0.1.6:8300   leader    true
carol  10.0.1.7:8300   10.0.1.7:8300   follower  true
```

@@ -210,6 +210,9 @@ query "" {
# Read-only mode for the encryption keyring by default (list only)
keyring = "read"

# Read-only mode for Consul operator interfaces (list only)
operator = "read"
```

This is equivalent to the following JSON input:

@@ -248,13 +251,14 @@ This is equivalent to the following JSON input:
"policy": "read"
|
||||
}
|
||||
},
|
||||
"keyring": "read"
|
||||
"keyring": "read",
|
||||
"operator": "read"
|
||||
}
|
||||
```
|
||||
|
||||
## Building ACL Policies

#### Blacklist Mode and `consul exec`

If you set [`acl_default_policy`](/docs/agent/options.html#acl_default_policy)
to `deny`, the `anonymous` token won't have permission to read the default

@@ -279,7 +283,7 @@ Alternatively, you can, of course, add an explicit
[`acl_token`](/docs/agent/options.html#acl_token) to each agent, giving it access
to that prefix.

#### Blacklist Mode and Service Discovery

If your [`acl_default_policy`](/docs/agent/options.html#acl_default_policy) is
set to `deny`, the `anonymous` token will be unable to read any service

@@ -327,12 +331,12 @@ event "" {
As always, the more secure way to handle user events is to explicitly grant
access to each API token based on the events they should be able to fire.

#### Blacklist Mode and Prepared Queries

After Consul 0.6.3, significant changes were made to ACLs for prepared queries,
including a new `query` ACL policy. See [Prepared Query ACLs](#prepared_query_acls) below for more details.

#### Blacklist Mode and Keyring Operations

Consul 0.6 and later supports securing the encryption keyring operations using
ACLs. Encryption is an optional component of the gossip layer. More information

@@ -353,6 +357,28 @@ Encryption keyring operations are sensitive and should be properly secured. It
is recommended that instead of configuring a wide-open policy like above, a
per-token policy is applied to maximize security.

<a name="operator"></a>
#### Blacklist Mode and Consul Operator Actions

Consul 0.7 added special Consul operator actions which are protected by a new
`operator` ACL policy. The operator actions cover:

* [Operator HTTP endpoint](/docs/agent/http/operator.html)
* [Operator CLI command](/docs/commands/operator.html)

If your [`acl_default_policy`](/docs/agent/options.html#acl_default_policy) is
set to `deny`, then the `anonymous` token will not have access to Consul operator
actions. Granting `read` access allows reading information for diagnostic purposes
without making any changes to state. Granting `write` access allows reading
information and changing state. Here's an example policy:

```
operator = "write"
```
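
As a sketch, a token carrying this policy could be created through the ACL HTTP
API of that era (the agent address, token name, and management token are
illustrative):

```
$ curl -X PUT \
    -d '{"Name": "ops-team", "Type": "client", "Rules": "operator = \"write\""}' \
    "http://127.0.0.1:8500/v1/acl/create?token=<management-token>"
```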

~> Grant `write` access to operator actions with extreme caution, as improper use
could lead to a Consul outage and even loss of data.

#### Services and Checks with ACLs

Consul allows configuring ACL policies which may control access to service and