From 358b473e01ed4ec6c3eb7c467ac7751657d859fa Mon Sep 17 00:00:00 2001
From: Armon Dadgar
Date: Tue, 1 Jul 2014 15:02:26 -0700
Subject: [PATCH] Updating documentation for new bootstrap method

---
 command/agent/command.go                      |  2 +-
 .../source/docs/agent/basics.html.markdown    |  3 +-
 .../source/docs/agent/options.html.markdown   | 17 ++--
 .../docs/guides/bootstrapping.html.markdown   | 84 ++++++++-----------
 .../source/docs/guides/outage.html.markdown   |  2 +-
 .../source/docs/guides/servers.html.markdown  |  3 +-
 .../intro/getting-started/agent.html.markdown | 13 ++-
 .../intro/getting-started/join.html.markdown  |  9 +-
 .../getting-started/services.html.markdown    |  2 +-
 9 files changed, 64 insertions(+), 71 deletions(-)

diff --git a/command/agent/command.go b/command/agent/command.go
index f1f7b0d0b..6410383ea 100644
--- a/command/agent/command.go
+++ b/command/agent/command.go
@@ -537,6 +537,7 @@ Options:
   -advertise=addr          Sets the advertise address to use
   -bootstrap               Sets server to bootstrap mode
   -bind=0.0.0.0            Sets the bind address for cluster communication
+  -bootstrap-expect=0      Sets server to expect bootstrap mode.
   -client=127.0.0.1        Sets the address to bind for client access.
                            This includes RPC, DNS and HTTP
   -config-file=foo         Path to a JSON file to read configuration from.
@@ -547,7 +548,6 @@ Options:
                            order.
   -data-dir=path           Path to a data directory to store agent state
   -dc=east-aws             Datacenter of the agent
-  -expect=0                Sets server to expect bootstrap mode.
   -join=1.2.3.4            Address of an agent to join at start time.
                            Can be specified multiple times.
   -log-level=info          Log level of the agent.
diff --git a/website/source/docs/agent/basics.html.markdown b/website/source/docs/agent/basics.html.markdown
index ba05f8a19..a63d90a04 100644
--- a/website/source/docs/agent/basics.html.markdown
+++ b/website/source/docs/agent/basics.html.markdown
@@ -57,8 +57,7 @@ There are several important components that `consul agent` outputs:
 * **Server**: This shows if the agent is running in the server or client mode.
   Server nodes have the extra burden of participating in the consensus quorum,
   storing cluster state, and handling queries. Additionally, a server may be
-  in "bootstrap" mode. The first server must be in this mode to allow additional
-  servers to join the cluster. Multiple servers cannot be in bootstrap mode,
+  in "bootstrap" mode. Multiple servers cannot be in bootstrap mode,
   otherwise the cluster state will be inconsistent.
 
 * **Client Addr**: This is the address used for client interfaces to the agent.
diff --git a/website/source/docs/agent/options.html.markdown b/website/source/docs/agent/options.html.markdown
index c134b73a5..bfafab5e8 100644
--- a/website/source/docs/agent/options.html.markdown
+++ b/website/source/docs/agent/options.html.markdown
@@ -35,11 +35,16 @@ The options below are all specified on the command-line.
   as other nodes will treat the non-routability as a failure.
 
 * `-bootstrap` - This flag is used to control if a server is in "bootstrap" mode. It is important that
-  no more than one server *per* datacenter be running in this mode. The initial server **must** be in bootstrap
-  mode. Technically, a server in bootstrap mode is allowed to self-elect as the Raft leader. It is important
-  that only a single node is in this mode, because otherwise consistency cannot be guaranteed if multiple
-  nodes are able to self-elect. Once there are multiple servers in a datacenter, it is generally a good idea
-  to disable bootstrap mode on all of them.
+  no more than one server *per* datacenter be running in this mode. Technically, a server in bootstrap mode
+  is allowed to self-elect as the Raft leader. It is important that only a single node is in this mode,
+  because otherwise consistency cannot be guaranteed if multiple nodes are able to self-elect.
+  It is not recommended to use this flag after a cluster has been bootstrapped.
+
+* `-bootstrap-expect` - This flag provides the number of expected servers in the datacenter.
+  Either this value should not be provided, or the value must agree with other servers in
+  the cluster. When provided, Consul waits until the specified number of servers are
+  available, and then bootstraps the cluster. This allows an initial leader to be elected
+  automatically. This cannot be used in conjunction with the `-bootstrap` flag.
 
 * `-bind` - The address that should be bound to for internal cluster communications.
   This is an IP address that should be reachable by all other nodes in the cluster.
@@ -148,6 +153,8 @@ definitions support being updated during a reload.
 * `bootstrap` - Equivalent to the `-bootstrap` command-line flag.
 
+* `bootstrap_expect` - Equivalent to the `-bootstrap-expect` command-line flag.
+
 * `bind_addr` - Equivalent to the `-bind` command-line flag.
 
 * `client_addr` - Equivalent to the `-client` command-line flag.
diff --git a/website/source/docs/guides/bootstrapping.html.markdown b/website/source/docs/guides/bootstrapping.html.markdown
index 6339e59cc..472a949f4 100644
--- a/website/source/docs/guides/bootstrapping.html.markdown
+++ b/website/source/docs/guides/bootstrapping.html.markdown
@@ -6,74 +6,62 @@ sidebar_current: "docs-guides-bootstrapping"
 # Bootstrapping a Datacenter
 
-When deploying Consul to a datacenter for the first time, there is an initial bootstrapping that
-must be done. Generally, the first nodes that are started are the server nodes. Remember that an
-agent can run in both client and server mode. Server nodes are responsible for running
+Before a Consul cluster can begin to service requests, it is necessary for a server node to
+be elected leader. For this reason, the first nodes that are started are generally the server nodes.
+Remember that an agent can run in both client and server mode. Server nodes are responsible for running
 the [consensus protocol](/docs/internals/consensus.html), and storing the cluster state. The
 client nodes are mostly stateless and rely on the server nodes, so they can be started easily.
 
-The first server that is deployed in a new datacenter must provide the `-bootstrap` [configuration
-option](/docs/agent/options.html). This option allows the server to assert leadership of the cluster
-without agreement from any other server. This is necessary because at this point, there are no other
-servers running in the datacenter! Lets call this first server `Node A`. When starting `Node A` something
-like the following will be logged:
+The recommended way to bootstrap is to use the `-bootstrap-expect` [configuration
+option](/docs/agent/options.html). This options informs Consul of the expected number of
+server nodes, and automatically bootstraps when that many servers are available. To prevent
+inconsistencies and split-brain situations, all servers should specify the same value for `-bootstrap-expect`
+or specify no value at all. Any server that does not specify a value will not attempt to
+bootstrap the cluster.
 
-    2014/02/22 19:23:32 [INFO] consul: cluster leadership acquired
+There is a [deployment table](/docs/internals/consensus.html#toc_3) that covers various options,
+but it is recommended to have 3 or 5 total servers per data center. A single server deployment is _**highly**_
+discouraged as data loss is inevitable in a failure scenario.
 
-Once `Node A` is running, we can start the next set of servers. There is a [deployment table](/docs/internals/consensus.html#toc_3)
-that covers various options, but it is recommended to have 3 or 5 total servers per data center.
-A single server deployment is _**highly**_ discouraged as data loss is inevitable in a failure scenario.
-We start the next servers **without** specifying `-bootstrap`. This is critical, since only one server
-should ever be running in bootstrap mode*. Once `Node B` and `Node C` are started, you should see a
-message to the effect of:
+Suppose we are starting a 3 server cluster, we can start `Node A`, `Node B` and `Node C` providing
+the `-bootstrap-expect 3` flag. Once the nodes are started, you should see a message to the effect of:
 
     [WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.
 
-This indicates that the node is not in bootstrap mode, and it will not elect itself as leader.
-We can now join these machines together. Since a join operation is symmetric it does not matter
-which node initiates it. From `Node B` and `Node C` you can do the following:
+This indicates that the nodes are expecting 2 peers, but none are known yet. The servers will not elect
+themselves leader to prevent a split-brain. We can now join these machines together. Since a join operation
+is symmetric it does not matter which node initiates it. From any node you can do the following:
 
-    $ consul join
-    Successfully joined cluster by contacting 1 nodes.
+    $ consul join
+    Successfully joined cluster by contacting 3 nodes.
 
-Alternatively, from `Node A` you can do the following:
+Once the join is successful, one of the nodes will output something like:
 
-    $ consul join
-    Successfully joined cluster by contacting 2 nodes.
+    [INFO] consul: adding server foo (Addr: 127.0.0.2:8300) (DC: dc1)
+    [INFO] consul: adding server bar (Addr: 127.0.0.1:8300) (DC: dc1)
+    [INFO] consul: Attempting bootstrap with nodes: [127.0.0.3:8300 127.0.0.2:8300 127.0.0.1:8300]
+    ...
+    [INFO] consul: cluster leadership acquired
 
-Once the join is successful, `Node A` should output something like:
-
-    [INFO] raft: Added peer 127.0.0.2:8300, starting replication
-    ....
-    [INFO] raft: Added peer 127.0.0.3:8300, starting replication
-
-Another good check is to run the `consul info` command. When run on `Node A`, you can
+As a sanity check, the `consul info` command is a useful tool. It can be used to
 verify `raft.num_peers` is now 2, and you can view the latest log index under `raft.last_log_index`.
-When running `consul info` on `Node B` and `Node C` you should see `raft.last_log_index`
+When running `consul info` on the followers, you should see `raft.last_log_index`
 converge to the same value as the leader begins replication. That value represents the last
 log entry that has been stored on disk.
 
-This indicates that `Node B` and `Node C` have been added as peers. At this point,
-all three nodes see each other as peers, `Node A` is the leader, and replication
-should be working.
-
-The final step is to remove the `-bootstrap` flag. This is important since we don't
-want the node to be able to make unilateral decisions in the case of a failure of the
-other two nodes. To do this, we send a `SIGINT` to `Node A` to allow it to perform
-a graceful leave. Then we remove the `-bootstrap` flag and restart the node. The node
-will need to rejoin the cluster, since the graceful exit leaves the cluster. Any transactions
-that took place while `Node A` was offline will be replicated and the node will catch up.
-
 Now that the servers are all started and replicating to each other, all the remaining
 clients can be joined. Clients are much easier, as they can be started and perform
 a `join` against any existing node. All nodes participate in a gossip protocol to
 perform basic discovery, so clients will automatically find the servers and register
 themselves.
 
-
-* If you accidentally start another server with the flag set, do not fret.
-Shutdown the node, and remove the `raft/` folder from the data directory. This will
-remove the bad state caused by being in `-bootstrap` mode. Then restart the
-node and join the cluster normally.
-
+It should be noted that it is not strictly necessary to start the server nodes
+before the clients, however most operations will fail until the servers are available.
+
+## Manual Bootstrapping
+
+In versions of Consul previous to 0.4, bootstrapping was a more manual process.
+For a guide on using the `-bootstrap` flag directly, see the [manual bootstrapping guide](/docs/guides/manual-bootstrap.html).
+
+This is not recommended, as it is more error prone than automatic bootstrapping.
diff --git a/website/source/docs/guides/outage.html.markdown b/website/source/docs/guides/outage.html.markdown
index 13f437917..893cd6694 100644
--- a/website/source/docs/guides/outage.html.markdown
+++ b/website/source/docs/guides/outage.html.markdown
@@ -18,7 +18,7 @@ add or remove a server see this page.
 If you had only a single server and it has failed, simply restart it.
-Note that a single server configuration requires the `-bootstrap` flag.
+Note that a single server configuration requires the `-bootstrap` or `-bootstrap-expect 1` flag.
 If that server cannot be recovered, you need to bring up a new server.
 See the [bootstrapping guide](/docs/guides/bootstrapping.html). Data loss
 is inevitable, since data was not replicated to any other servers. This
diff --git a/website/source/docs/guides/servers.html.markdown b/website/source/docs/guides/servers.html.markdown
index 64b4583f0..9cf535bed 100644
--- a/website/source/docs/guides/servers.html.markdown
+++ b/website/source/docs/guides/servers.html.markdown
@@ -18,8 +18,7 @@ to first add the new nodes and then remove the old nodes.
 ## Adding New Servers
 
-Adding new servers is generally straightforward. After the initial server, no further
-servers should ever be started with the `-bootstrap` flag. Instead, simply start the new
+Adding new servers is generally straightforward. Simply start the new
 server with the `-server` flag. At this point, the server will not be a member of
 any cluster, and should emit something like:
diff --git a/website/source/intro/getting-started/agent.html.markdown b/website/source/intro/getting-started/agent.html.markdown
index d9709eb16..1a75c7544 100644
--- a/website/source/intro/getting-started/agent.html.markdown
+++ b/website/source/intro/getting-started/agent.html.markdown
@@ -20,7 +20,8 @@ will be part of the cluster.
 For simplicity, we'll run a single Consul agent in server mode right now:
 
 ```
-$ consul agent -server -bootstrap -data-dir /tmp/consul
+$ consul agent -server -bootstrap-expect 1 -data-dir /tmp/consul
+==> WARNING: BootstrapExpect Mode is specified as 1; this is the same as Bootstrap mode.
 ==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
 ==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
 ==> Starting Consul agent...
@@ -67,15 +68,13 @@ joining clusters in the next section.
 ```
 $ consul members
-Armons-MacBook-Air  10.1.10.38:8301  alive  role=consul,dc=dc1,vsn=1,vsn_min=1,vsn_max=1,port=8300,bootstrap=1
+Node                Address          Status  Type    Build  Protocol
+Armons-MacBook-Air  10.1.10.38:8301  alive   server  0.3.0  2
 ```
 
 The output shows our own node, the address it is running on, its
-health state, and some metadata associated with the node. Some important
-metadata keys to recognize are the `role` and `dc` keys. These tell you
-the service name and the datacenter that member is within. These can be
-used to lookup nodes and services using the DNS interface, which is covered
-shortly.
+health state, its role in the cluster, as well as some versioning information.
+Additional metadata can be viewed by providing the `-detailed` flag.
 
 The output from the `members` command is generated based on the
 [gossip protocol](/docs/internals/gossip.html) and is eventually consistent.
diff --git a/website/source/intro/getting-started/join.html.markdown b/website/source/intro/getting-started/join.html.markdown
index e369b9102..53bc44fa6 100644
--- a/website/source/intro/getting-started/join.html.markdown
+++ b/website/source/intro/getting-started/join.html.markdown
@@ -34,7 +34,7 @@ will act as our server in this cluster. We're still not making a cluster
 of servers.
 
 ```
-$ consul agent -server -bootstrap -data-dir /tmp/consul \
+$ consul agent -server -bootstrap-expect 1 -data-dir /tmp/consul \
     -node=agent-one -bind=172.20.20.10
 ...
 ```
@@ -70,9 +70,10 @@ run `consul members` against each agent, you'll see that both agents now
 know about each other:
 
 ```
-$ consul members
-agent-one  172.20.20.10:8301  alive  role=consul,dc=dc1,vsn=1,vsn_min=1,vsn_max=1,port=8300,bootstrap=1
-agent-two  172.20.20.11:8301  alive  role=node,dc=dc1,vsn=1,vsn_min=1,vsn_max=1
+$ consul members -detailed
+Node       Address            Status  Tags
+agent-one  172.20.20.10:8301  alive   role=consul,dc=dc1,vsn=2,vsn_min=1,vsn_max=2,port=8300,bootstrap=1
+agent-two  172.20.20.11:8301  alive   role=node,dc=dc1,vsn=2,vsn_min=1,vsn_max=2
 ```
diff --git a/website/source/intro/getting-started/services.html.markdown b/website/source/intro/getting-started/services.html.markdown
index f67cf31d9..8d1329cdb 100644
--- a/website/source/intro/getting-started/services.html.markdown
+++ b/website/source/intro/getting-started/services.html.markdown
@@ -43,7 +43,7 @@ $ echo '{"service": {"name": "web", "tags": ["rails"], "port": 80}}' \
 Now, restart the agent we're running, providing the configuration directory:
 
 ```
-$ consul agent -server -bootstrap -data-dir /tmp/consul -config-dir /etc/consul.d
+$ consul agent -server -bootstrap-expect 1 -data-dir /tmp/consul -config-dir /etc/consul.d
 ==> Starting Consul agent...
 ...
 [INFO] agent: Synced service 'web'