Updating documentation for new bootstrap method

Armon Dadgar 2014-07-01 15:02:26 -07:00
parent 020802f7a5
commit 358b473e01
9 changed files with 64 additions and 71 deletions

View file

@@ -537,6 +537,7 @@ Options:
   -advertise=addr          Sets the advertise address to use
   -bootstrap               Sets server to bootstrap mode
   -bind=0.0.0.0            Sets the bind address for cluster communication
+  -bootstrap-expect=0      Sets server to expect bootstrap mode.
   -client=127.0.0.1        Sets the address to bind for client access.
                            This includes RPC, DNS and HTTP
   -config-file=foo         Path to a JSON file to read configuration from.
@@ -547,7 +548,6 @@
                            order.
   -data-dir=path           Path to a data directory to store agent state
   -dc=east-aws             Datacenter of the agent
-  -expect=0                Sets server to expect bootstrap mode.
   -join=1.2.3.4            Address of an agent to join at start time.
                            Can be specified multiple times.
   -log-level=info          Log level of the agent.
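
Taken together, the flags in this usage listing compose into invocations like the following sketch; the data directory, datacenter, and join address are illustrative placeholders, not values from this commit:

```
$ consul agent -server -bootstrap-expect=3 \
    -data-dir=/var/consul \
    -dc=east-aws \
    -join=1.2.3.4 \
    -log-level=info
```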

View file

@@ -57,8 +57,7 @@ There are several important components that `consul agent` outputs:
 * **Server**: This shows if the agent is running in the server or client mode.
   Server nodes have the extra burden of participating in the consensus quorum,
   storing cluster state, and handling queries. Additionally, a server may be
-  in "bootstrap" mode. The first server must be in this mode to allow additional
-  servers to join the cluster. Multiple servers cannot be in bootstrap mode,
+  in "bootstrap" mode. Multiple servers cannot be in bootstrap mode,
   otherwise the cluster state will be inconsistent.

 * **Client Addr**: This is the address used for client interfaces to the agent.

View file

@@ -35,11 +35,16 @@ The options below are all specified on the command-line.
   as other nodes will treat the non-routability as a failure.

 * `-bootstrap` - This flag is used to control if a server is in "bootstrap" mode. It is important that
-  no more than one server *per* datacenter be running in this mode. The initial server **must** be in bootstrap
-  mode. Technically, a server in bootstrap mode is allowed to self-elect as the Raft leader. It is important
-  that only a single node is in this mode, because otherwise consistency cannot be guaranteed if multiple
-  nodes are able to self-elect. Once there are multiple servers in a datacenter, it is generally a good idea
-  to disable bootstrap mode on all of them.
+  no more than one server *per* datacenter be running in this mode. Technically, a server in bootstrap mode
+  is allowed to self-elect as the Raft leader. It is important that only a single node is in this mode,
+  because otherwise consistency cannot be guaranteed if multiple nodes are able to self-elect.
+  It is not recommended to use this flag after a cluster has been bootstrapped.
+
+* `-bootstrap-expect` - This flag provides the number of expected servers in the datacenter.
+  Either this value should not be provided, or the value must agree with other servers in
+  the cluster. When provided, Consul waits until the specified number of servers are
+  available, and then bootstraps the cluster. This allows an initial leader to be elected
+  automatically. This cannot be used in conjunction with the `-bootstrap` flag.

 * `-bind` - The address that should be bound to for internal cluster communications.
   This is an IP address that should be reachable by all other nodes in the cluster.
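
To make the agreement requirement in the `-bootstrap-expect` entry concrete, a minimal sketch (the bind addresses are placeholders) starts each of three servers with the same expected count:

```
# Node A
$ consul agent -server -bootstrap-expect=3 -data-dir /tmp/consul -bind=10.0.0.1

# Node B
$ consul agent -server -bootstrap-expect=3 -data-dir /tmp/consul -bind=10.0.0.2

# Node C
$ consul agent -server -bootstrap-expect=3 -data-dir /tmp/consul -bind=10.0.0.3
```

Per the flag description above, leader election then happens automatically once the third server becomes available.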
@@ -148,6 +153,8 @@ definitions support being updated during a reload.
 * `bootstrap` - Equivalent to the `-bootstrap` command-line flag.

+* `bootstrap_expect` - Equivalent to the `-bootstrap-expect` command-line flag.
+
 * `bind_addr` - Equivalent to the `-bind` command-line flag.

 * `client_addr` - Equivalent to the `-client` command-line flag.
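
Because these keys mirror the command-line flags, the same bootstrap settings can be supplied from a JSON configuration file instead; a minimal sketch, assuming placeholder values and a placeholder path:

```
$ cat > /etc/consul.d/server.json <<'EOF'
{
  "bootstrap_expect": 3,
  "bind_addr": "10.0.0.1",
  "client_addr": "127.0.0.1"
}
EOF

$ consul agent -server -data-dir /tmp/consul -config-file /etc/consul.d/server.json
```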

View file

@@ -6,74 +6,62 @@ sidebar_current: "docs-guides-bootstrapping"

 # Bootstrapping a Datacenter

-When deploying Consul to a datacenter for the first time, there is an initial bootstrapping that
-must be done. Generally, the first nodes that are started are the server nodes. Remember that an
-agent can run in both client and server mode. Server nodes are responsible for running
+Before a Consul cluster can begin to service requests, it is necessary for a server node to
+be elected leader. For this reason, the first nodes that are started are generally the server nodes.
+Remember that an agent can run in both client and server mode. Server nodes are responsible for running
 the [consensus protocol](/docs/internals/consensus.html), and storing the cluster state.
 The client nodes are mostly stateless and rely on the server nodes, so they can be started easily.

-The first server that is deployed in a new datacenter must provide the `-bootstrap` [configuration
-option](/docs/agent/options.html). This option allows the server to assert leadership of the cluster
-without agreement from any other server. This is necessary because at this point, there are no other
-servers running in the datacenter! Lets call this first server `Node A`. When starting `Node A` something
-like the following will be logged:
+The recommended way to bootstrap is to use the `-bootstrap-expect` [configuration
+option](/docs/agent/options.html). This option informs Consul of the expected number of
+server nodes, and automatically bootstraps when that many servers are available. To prevent
+inconsistencies and split-brain situations, all servers should specify the same value for `-bootstrap-expect`
+or specify no value at all. Any server that does not specify a value will not attempt to
+bootstrap the cluster.

-    2014/02/22 19:23:32 [INFO] consul: cluster leadership acquired
+There is a [deployment table](/docs/internals/consensus.html#toc_3) that covers various options,
+but it is recommended to have 3 or 5 total servers per data center. A single server deployment is _**highly**_
+discouraged as data loss is inevitable in a failure scenario.

-Once `Node A` is running, we can start the next set of servers. There is a [deployment table](/docs/internals/consensus.html#toc_3)
-that covers various options, but it is recommended to have 3 or 5 total servers per data center.
-A single server deployment is _**highly**_ discouraged as data loss is inevitable in a failure scenario.
-
-We start the next servers **without** specifying `-bootstrap`. This is critical, since only one server
-should ever be running in bootstrap mode*. Once `Node B` and `Node C` are started, you should see a
-message to the effect of:
+Suppose we are starting a 3-server cluster. We can start `Node A`, `Node B`, and `Node C`, providing
+the `-bootstrap-expect 3` flag. Once the nodes are started, you should see a message to the effect of:

     [WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.

-This indicates that the node is not in bootstrap mode, and it will not elect itself as leader.
-We can now join these machines together. Since a join operation is symmetric it does not matter
-which node initiates it. From `Node B` and `Node C` you can do the following:
+This indicates that the nodes are expecting 2 peers, but none are known yet. The servers will not elect
+themselves leader, to prevent a split-brain. We can now join these machines together. Since a join operation
+is symmetric, it does not matter which node initiates it. From any node you can do the following:

-    $ consul join <Node A Address>
-    Successfully joined cluster by contacting 1 nodes.
+    $ consul join <Node A Address> <Node B Address> <Node C Address>
+    Successfully joined cluster by contacting 3 nodes.

-Alternatively, from `Node A` you can do the following:
+Once the join is successful, one of the nodes will output something like:

-    $ consul join <Node B Address> <Node C Address>
-    Successfully joined cluster by contacting 2 nodes.
+    [INFO] consul: adding server foo (Addr: 127.0.0.2:8300) (DC: dc1)
+    [INFO] consul: adding server bar (Addr: 127.0.0.1:8300) (DC: dc1)
+    [INFO] consul: Attempting bootstrap with nodes: [127.0.0.3:8300 127.0.0.2:8300 127.0.0.1:8300]
+    ...
+    [INFO] consul: cluster leadership acquired

-Once the join is successful, `Node A` should output something like:
-
-    [INFO] raft: Added peer 127.0.0.2:8300, starting replication
-    ....
-    [INFO] raft: Added peer 127.0.0.3:8300, starting replication
-
-Another good check is to run the `consul info` command. When run on `Node A`, you can
+As a sanity check, the `consul info` command is a useful tool. It can be used to
 verify `raft.num_peers` is now 2, and you can view the latest log index under `raft.last_log_index`.
-When running `consul info` on `Node B` and `Node C` you should see `raft.last_log_index`
+When running `consul info` on the followers, you should see `raft.last_log_index`
 converge to the same value as the leader begins replication. That value represents the last
 log entry that has been stored on disk.

-This indicates that `Node B` and `Node C` have been added as peers. At this point,
-all three nodes see each other as peers, `Node A` is the leader, and replication
-should be working.
-
-The final step is to remove the `-bootstrap` flag. This is important since we don't
-want the node to be able to make unilateral decisions in the case of a failure of the
-other two nodes. To do this, we send a `SIGINT` to `Node A` to allow it to perform
-a graceful leave. Then we remove the `-bootstrap` flag and restart the node. The node
-will need to rejoin the cluster, since the graceful exit leaves the cluster. Any transactions
-that took place while `Node A` was offline will be replicated and the node will catch up.
-
 Now that the servers are all started and replicating to each other, all the remaining
 clients can be joined. Clients are much easier, as they can be started and perform
 a `join` against any existing node. All nodes participate in a gossip protocol to
 perform basic discovery, so clients will automatically find the servers and register
 themselves.

-<div class="alert alert-block alert-info">
-* If you accidentally start another server with the flag set, do not fret.
-Shutdown the node, and remove the `raft/` folder from the data directory. This will
-remove the bad state caused by being in `-bootstrap` mode. Then restart the
-node and join the cluster normally.
-</div>
+It should be noted that it is not strictly necessary to start the server nodes
+before the clients; however, most operations will fail until the servers are available.
+
+## Manual Bootstrapping
+
+In versions of Consul previous to 0.4, bootstrapping was a more manual process.
+For a guide on using the `-bootstrap` flag directly, see the [manual bootstrapping guide](/docs/guides/manual-bootstrap.html).
+This is not recommended, as it is more error prone than automatic bootstrapping.
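
The guide above references the `-bootstrap-expect 3` flag without spelling out the full start command; a minimal sketch of the sequence, with placeholder addresses, might be:

```
# Run on Node A, Node B, and Node C respectively,
# substituting each node's own address for -bind:
$ consul agent -server -bootstrap-expect 3 -data-dir /tmp/consul -bind <node address>

# Then, from any one of the nodes:
$ consul join <Node A Address> <Node B Address> <Node C Address>
```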

View file

@@ -18,7 +18,7 @@ add or remove a server <a href="/docs/guides/servers.html">see this page</a>.
 </div>

 If you had only a single server and it has failed, simply restart it.
-Note that a single server configuration requires the `-bootstrap` flag.
+Note that a single server configuration requires the `-bootstrap` or `-bootstrap-expect 1` flag.
 If that server cannot be recovered, you need to bring up a new server.
 See the [bootstrapping guide](/docs/guides/bootstrapping.html). Data loss
 is inevitable, since data was not replicated to any other servers. This
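
For the single-server restart described above, a minimal sketch (assuming `/tmp/consul` is the same data directory the server used before the failure, so existing state is preserved) is:

```
$ consul agent -server -bootstrap-expect 1 -data-dir /tmp/consul
```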

View file

@@ -18,8 +18,7 @@ to first add the new nodes and then remove the old nodes.

 ## Adding New Servers

-Adding new servers is generally straightforward. After the initial server, no further
-servers should ever be started with the `-bootstrap` flag. Instead, simply start the new
+Adding new servers is generally straightforward. Simply start the new
 server with the `-server` flag. At this point, the server will not be a member of
 any cluster, and should emit something like:
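
A minimal sketch of adding one such server (the member address is a placeholder):

```
# Start the new server with no bootstrap flags...
$ consul agent -server -data-dir /tmp/consul

# ...then add it to the cluster by joining any existing member:
$ consul join <existing member address>
```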

View file

@@ -20,7 +20,8 @@ will be part of the cluster.

 For simplicity, we'll run a single Consul agent in server mode right now:

 ```
-$ consul agent -server -bootstrap -data-dir /tmp/consul
+$ consul agent -server -bootstrap-expect 1 -data-dir /tmp/consul
+==> WARNING: BootstrapExpect Mode is specified as 1; this is the same as Bootstrap mode.
 ==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
 ==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
 ==> Starting Consul agent...
@@ -67,15 +68,13 @@ joining clusters in the next section.

 ```
 $ consul members
-Armons-MacBook-Air  10.1.10.38:8301  alive  role=consul,dc=dc1,vsn=1,vsn_min=1,vsn_max=1,port=8300,bootstrap=1
+Node                Address          Status  Type    Build  Protocol
+Armons-MacBook-Air  10.1.10.38:8301  alive   server  0.3.0  2
 ```

 The output shows our own node, the address it is running on, its
-health state, and some metadata associated with the node. Some important
-metadata keys to recognize are the `role` and `dc` keys. These tell you
-the service name and the datacenter that member is within. These can be
-used to lookup nodes and services using the DNS interface, which is covered
-shortly.
+health state, its role in the cluster, as well as some versioning information.
+Additional metadata can be viewed by providing the `-detailed` flag.

 The output from the `members` command is generated based on the
 [gossip protocol](/docs/internals/gossip.html) and is eventually consistent.

View file

@@ -34,7 +34,7 @@ will act as our server in this cluster. We're still not making a cluster
 of servers.

 ```
-$ consul agent -server -bootstrap -data-dir /tmp/consul \
+$ consul agent -server -bootstrap-expect 1 -data-dir /tmp/consul \
     -node=agent-one -bind=172.20.20.10
 ...
 ```
@@ -70,9 +70,10 @@ run `consul members` against each agent, you'll see that both agents now
 know about each other:

 ```
-$ consul members
-agent-one  172.20.20.10:8301  alive  role=consul,dc=dc1,vsn=1,vsn_min=1,vsn_max=1,port=8300,bootstrap=1
-agent-two  172.20.20.11:8301  alive  role=node,dc=dc1,vsn=1,vsn_min=1,vsn_max=1
+$ consul members -detailed
+Node       Address            Status  Tags
+agent-one  172.20.20.10:8301  alive   role=consul,dc=dc1,vsn=2,vsn_min=1,vsn_max=2,port=8300,bootstrap=1
+agent-two  172.20.20.11:8301  alive   role=node,dc=dc1,vsn=2,vsn_min=1,vsn_max=2
 ```

 <div class="alert alert-block alert-info">

View file

@@ -43,7 +43,7 @@ $ echo '{"service": {"name": "web", "tags": ["rails"], "port": 80}}' \

 Now, restart the agent we're running, providing the configuration directory:

 ```
-$ consul agent -server -bootstrap -data-dir /tmp/consul -config-dir /etc/consul.d
+$ consul agent -server -bootstrap-expect 1 -data-dir /tmp/consul -config-dir /etc/consul.d
 ==> Starting Consul agent...
 ...
 [INFO] agent: Synced service 'web'
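
Once the agent reports `Synced service 'web'`, the registration can be verified against the local agent; a quick sketch, assuming Consul's stock DNS port (8600) and HTTP port (8500):

```
# Look up the service through the DNS interface:
$ dig @127.0.0.1 -p 8600 web.service.consul

# Or through the HTTP catalog API:
$ curl http://127.0.0.1:8500/v1/catalog/service/web
```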