Improve "Integrated Storage" documentation (#12200)
* Improve "Integrated Storage" documentation * add missing markup * add more links to the configuration pages * Improve the Raft Storage configuration page * More markup * Improve the "High Availability" documentation * More links to the configuration pages * More links * even more links
parent 46e327de4e
commit dd33777d17
@@ -21,21 +21,24 @@ information is also available on the
To be highly available, one of the Vault server nodes grabs a lock within the
data store. The successful server node then becomes the active node; all other
nodes become standby nodes. At this point, if the standby nodes receive a
request, they will either [forward the request](#request-forwarding) or
[redirect the client](#client-redirection) depending on the current
configuration and state of the cluster -- see the sections below for details.
Due to this architecture, HA does not enable increased scalability. In general,
the bottleneck of Vault is the data store itself, not Vault core. For example:
to increase the scalability of Vault with Consul, you would generally scale
Consul instead of Vault.

Certain storage backends can support high availability mode, which enables them
to store both Vault's information and the HA lock. However, Vault
also supports split data/HA mode, whereby the lock value and the rest of the
data live separately. This can be done by specifying both the
[`storage`](/docs/configuration#storage) and
[`ha_storage`](/docs/configuration#ha_storage) stanzas in the configuration file
with different backends. For instance, a Vault cluster can be set up to use
Consul as the [`ha_storage`](/docs/configuration#ha_storage) to manage the lock,
and use Amazon S3 as the [`storage`](/docs/configuration#storage) for all other
persisted data.
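
As an illustrative sketch of that split configuration, a node's config file
might contain something like the following; the address, bucket, and region
values are placeholders, not recommendations:

```hcl
# Split data/HA mode: S3 holds the persisted data, Consul only manages the HA lock.
# All parameter values here are illustrative placeholders.
storage "s3" {
  bucket = "my-vault-data"
  region = "us-east-1"
}

ha_storage "consul" {
  address = "127.0.0.1:8500"
  path    = "vault/"
}
```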

The sections below explain the server communication patterns and each type of
request handling in more detail. At a minimum, the requirements for redirection
@@ -84,28 +87,36 @@ always required for all HA setups.

Some HA data store drivers can autodetect the redirect address, but it is often
necessary to configure it manually via a top-level value in the configuration
file. The key for this value is [`api_addr`](/docs/configuration#api_addr) and
the value can also be specified by the `VAULT_API_ADDR` environment variable,
which takes precedence.

What the [`api_addr`](/docs/configuration#api_addr) value should be set to
depends on how Vault is set up. There are two common scenarios: Vault servers
accessed directly by clients, and Vault servers accessed via a load balancer.

In both cases, the [`api_addr`](/docs/configuration#api_addr) should be a full
URL including scheme (`http`/`https`), not simply an IP address and port.

### Direct Access

When clients are able to access Vault directly, the
[`api_addr`](/docs/configuration#api_addr) for each node should be that node's
address. For instance, if there are two Vault nodes:

* `A`, accessed via `https://a.vault.mycompany.com:8200`
* `B`, accessed via `https://b.vault.mycompany.com:8200`

Then node `A` would set its
[`api_addr`](/docs/configuration#api_addr) to
`https://a.vault.mycompany.com:8200` and node `B` would set its
[`api_addr`](/docs/configuration#api_addr) to
`https://b.vault.mycompany.com:8200`.

This way, when `A` is the active node, any requests received by node `B` will
cause it to redirect the client to node `A`'s
[`api_addr`](/docs/configuration#api_addr) at `https://a.vault.mycompany.com`,
and vice-versa.
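
As a sketch, node `A`'s configuration file would then carry a single top-level
entry like the one below (other stanzas omitted; the hostname is the
illustrative one from the example above):

```hcl
# Node A advertises its own externally reachable URL.
api_addr = "https://a.vault.mycompany.com:8200"
```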

### Behind Load Balancers

@@ -115,33 +126,42 @@ case, the Vault servers should actually be set up as described in the above
section, since for redirection purposes the clients have direct access.

However, if the only access to the Vault servers is via the load balancer, the
[`api_addr`](/docs/configuration#api_addr) on each node should be the same: the
address of the load balancer. Clients that reach a standby node will be
redirected back to the load balancer; at that point hopefully the load
balancer's configuration will have been updated to know the address of the
current leader. This can cause a redirect loop and as such is not a recommended
setup when it can be avoided.
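
If you do run this way, every node's configuration sketches out as below, with
the load balancer hostname being a placeholder; keep the redirect-loop caveat
above in mind:

```hcl
# Every node advertises the load balancer, not itself (placeholder hostname).
api_addr = "https://vault-lb.mycompany.com:8200"
```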

### Per-Node Cluster Listener Addresses

Each [`listener`](/docs/configuration/listener) block in Vault's configuration
file contains an [`address`](/docs/configuration/listener/tcp#address) value on
which Vault listens for requests. Similarly, each
[`listener`](/docs/configuration/listener) block can contain a
[`cluster_address`](/docs/configuration/listener/tcp#cluster_address) on which
Vault listens for server-to-server cluster requests. If this value is not set,
its IP address will be automatically set to the same as the
[`address`](/docs/configuration/listener/tcp#address) value, and its port will
be automatically set to the same as the
[`address`](/docs/configuration/listener/tcp#address) value plus one (so by
default, port `8201`).
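
A hypothetical `listener` block that sets both values explicitly might look
like this; the addresses and certificate paths are placeholders:

```hcl
listener "tcp" {
  # API traffic from clients.
  address         = "10.0.0.5:8200"
  # Server-to-server cluster traffic; if omitted, defaults to the API address
  # with the port incremented by one.
  cluster_address = "10.0.0.5:8201"
  tls_cert_file   = "/etc/vault/tls/vault.crt"
  tls_key_file    = "/etc/vault/tls/vault.key"
}
```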

Note that _only_ active nodes have active listeners. When a node becomes active
it will start cluster listeners, and when it becomes standby it will stop them.

### Per-Node Cluster Address

Similar to the [`api_addr`](/docs/configuration#api_addr),
[`cluster_addr`](/docs/configuration#cluster_addr) is the value that each node,
if active, should advertise to the standbys to use for server-to-server
communications, and lives as a top-level value in the configuration file. On
each node, this should be set to a host name or IP address that a standby can
use to reach one of that node's
[`cluster_address`](/docs/configuration#cluster_address) values set in the
[`listener`](/docs/configuration/listener) blocks, including port. (Note that
this will always be forced to `https` since only TLS connections are used
between servers.)

This value can also be specified by the `VAULT_CLUSTER_ADDR` environment
variable, which takes precedence.
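
For instance, the two top-level values on a node reachable at the hypothetical
host `a.vault.mycompany.com` might be set together like so:

```hcl
# Top-level values in the node's configuration file (hostname is illustrative).
api_addr     = "https://a.vault.mycompany.com:8200"
cluster_addr = "https://a.vault.mycompany.com:8201"
```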
@@ -149,12 +169,16 @@ variable, which takes precedence.

## Storage Support

Currently there are several storage backends that support high availability
mode, including [Consul](/docs/storage/consul),
[ZooKeeper](/docs/storage/zookeeper) and [etcd](/docs/storage/etcd). These may
change over time, and the [configuration page](/docs/configuration) should be
referenced.

The [Consul backend](/docs/storage/consul) is the recommended HA backend, as it is used in production
by HashiCorp and its customers with commercial support.

If you're interested in implementing another backend or adding HA support to
another backend, we'd love your contributions. Adding HA support requires
implementing the
[`physical.HABackend`](https://pkg.go.dev/github.com/hashicorp/vault/sdk/physical#HABackend)
interface for the storage backend.
@@ -30,7 +30,7 @@ Vault's cluster port. The cluster port defaults to `8201`. The TLS information
is exchanged at join time and is rotated on a cadence.

A requirement for integrated storage is that the
[`cluster_addr`](/docs/concepts/ha#per-node-cluster-address) configuration option
is set. This allows Vault to assign an address to the node ID at join time.

## Cluster Membership
@@ -57,10 +57,10 @@ been joined it cannot be re-joined to a different cluster.

You can either join the node automatically via the config file or manually through the
API (both methods described below). When joining a node, the API address of the leader node must be used. We
recommend setting the [`api_addr`](/docs/concepts/ha#direct-access) configuration
option on all nodes to make joining simpler.

#### `retry_join` Configuration

This method enables setting one, or more, target leader nodes in the config file.
When an uninitialized Vault server starts up it will attempt to join each potential
@@ -69,7 +69,7 @@ leaders become active this node will successfully join. When using Shamir seal,
the joined nodes will still need to be unsealed manually. When using Auto Unseal
the node will be able to join and unseal automatically.

An example [`retry_join`](/docs/configuration/storage/raft#retry_join-stanza)
config can be seen below:

```hcl
@@ -86,14 +86,18 @@ storage "raft" {
}
```

Note, in each [`retry_join`](/docs/configuration/storage/raft#retry_join-stanza)
stanza, you may provide a single
[`leader_api_addr`](/docs/configuration/storage/raft#leader_api_addr) or
[`auto_join`](/docs/configuration/storage/raft#auto_join) value. When a cloud
[`auto_join`](/docs/configuration/storage/raft#auto_join) configuration value is
provided, Vault will use [go-discover](https://github.com/hashicorp/go-discover)
to automatically attempt to discover and resolve potential Raft leader
addresses.

See the go-discover
[README](https://github.com/hashicorp/go-discover/blob/master/README.md) for
details on the format of the [`auto_join`](/docs/configuration/storage/raft#auto_join) value.

```hcl
storage "raft" {
@@ -106,9 +110,11 @@ storage "raft" {
}
```

By default, Vault will attempt to reach discovered peers using HTTPS and port
8200. Operators may override these through the
[`auto_join_scheme`](/docs/configuration/storage/raft#auto_join_scheme) and
[`auto_join_port`](/docs/configuration/storage/raft#auto_join_port) fields
respectively.

```hcl
storage "raft" {
@@ -125,7 +131,7 @@ storage "raft" {

#### Join from the CLI

Alternatively you can use the [`join` CLI
command](/docs/commands/operator/raft/#join) or the API to join a node. The
active node's API address will need to be specified:
@@ -154,7 +160,7 @@ the size of the cluster, or for many other reasons. Removing the peer will
ensure the cluster stays at the desired size, and that quorum is maintained.

To remove the peer you can issue a
[`remove-peer`](/docs/commands/operator/raft#remove-peer) command and provide the
node ID you wish to remove:

```shell-session
@@ -165,7 +171,7 @@ Peer removed successfully!

### Listing Peers

To see the current peer set for the cluster you can issue a
[`list-peers`](/docs/commands/operator/raft#list-peers) command. All the voting
nodes that are listed here contribute to the quorum and a majority must be alive
for integrated storage to continue to operate.
@@ -183,22 +189,24 @@ node3 node3.vault.local:8201 leader true

We've glossed over some details in the above sections on bootstrapping clusters.
The instructions are sufficient for most cases, but some users have run into
problems when using auto-join and TLS in conjunction with things like auto-scaling.
The issue is that [go-discover](https://github.com/hashicorp/go-discover) on
most platforms returns IPs (not hostnames), and because the IPs aren't knowable
in advance, the TLS certificates used to secure the Vault API port don't contain
these IPs in their IP SANs.

### Vault networking recap

Before we explore solutions to this problem, let's recapitulate how Vault nodes
speak to one another.

Vault exposes two TCP ports: [the API port](/docs/configuration#api_addr) and
[the cluster port](/docs/configuration#cluster_addr).

The API port is where clients send their Vault HTTP requests.

For a single-node Vault cluster you don't worry about a cluster port as it won't be used.

When you have multiple nodes, you also need a cluster port. This is used by Vault
nodes to issue RPCs to one another, e.g. to forward requests from a standby node
to the active node, or when Raft is in use, to handle leader election and
replication of stored data.
@@ -217,12 +225,12 @@ instead of the cluster port. This is currently the only situation in which
OSS Vault does this (Vault Enterprise also does something similar when setting
up replication.)

* `node2` wants to join the cluster, so issues a challenge API request to existing member `node1`
* `node1` replies to the challenge request with (1) an encrypted random UUID and (2) the seal config
* `node2` must decrypt the UUID using the seal; if using auto-unseal it can do so directly, if using Shamir it must wait for the user to provide enough unseal keys to perform the decryption
* `node2` sends the decrypted UUID back to `node1` using the answer API
* `node1` sees `node2` can be trusted (since it has seal access) and replies with a bootstrap package which includes the cluster TLS certificate and private key
* `node2` gets sent a Raft snapshot over the cluster port

After this procedure the new node will never again send traffic to the API port.
All subsequent inter-node communication will use the cluster port.
@@ -231,22 +239,26 @@ All subsequent inter-node communication will use the cluster port.

### Assisted raft join techniques

The simplest option is to do it by hand: issue [`raft
join`](/docs/commands/operator/raft#join) commands specifying the explicit names
or IPs of the nodes to join to. In this section we look at other TLS-compatible
options that lend themselves more to automation.

#### Autojoin with TLS servername

As of Vault 1.6.2, the simplest option might be to specify a
[`leader_tls_servername`](/docs/configuration/storage/raft#leader_tls_servername)
in the [`retry_join`](/docs/configuration/storage/raft#retry_join-stanza) stanza
which matches a [DNS
SAN](https://en.wikipedia.org/wiki/Subject_Alternative_Name) in the certificate.

Note that names in a certificate's DNS SAN don't actually have to be registered
in a DNS server. Your nodes may have no names found in DNS, while still
using certificate(s) that contain this shared `servername` in their DNS SANs.
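
A sketch of such a configuration is shown below; the `auto_join` query,
servername, and file paths are placeholders rather than values taken from this
guide:

```hcl
storage "raft" {
  path    = "/opt/vault/data"
  node_id = "node2"

  retry_join {
    # Cloud auto-join query (placeholder values).
    auto_join             = "provider=aws tag_key=vault tag_value=cluster1"
    # Must match a DNS SAN in the shared TLS certificate, even if that name
    # is never registered in DNS.
    leader_tls_servername = "vault.mycompany.internal"
    leader_ca_cert_file   = "/etc/vault/tls/ca.pem"
  }
}
```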

#### Autojoin but constrain CIDR, list all possible IPs in certificate

If all the Vault node IPs are assigned from a small subnet, e.g. a `/28`, it
becomes practical to put all the IPs that exist in that subnet into the IP SANs
of the TLS certificate the nodes will share.
@@ -258,8 +270,9 @@ using non-voting nodes and dynamically scaling clusters.

Most Vault instances are going to have a load balancer (LB) between clients and
the Vault nodes. In that case, the LB knows how to route traffic to working
Vault nodes, and there's no need for auto-join: we can just use
[`retry_join`](/docs/configuration/storage/raft#retry_join-stanza) with the LB
address as the target.
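
A minimal sketch, assuming an internal load balancer at a placeholder address:

```hcl
storage "raft" {
  path    = "/opt/vault/data"
  node_id = "node1"

  retry_join {
    # The LB routes this join request to a healthy (ideally active) node.
    leader_api_addr = "https://vault-lb.mycompany.internal:8200"
  }
}
```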

One potential issue here: some users want a public facing LB for clients to
connect to Vault, but aren't comfortable with Vault internal traffic
@@ -279,12 +292,13 @@ and have it reconnect to the cluster with the same host address. This will return
the cluster to a fully healthy state.

If this is impractical, you need to remove the failed server. Usually, you can
issue a [`remove-peer`](/docs/commands/operator/raft#remove-peer) command to
remove the failed server if it's still a member of the cluster.

If the [`remove-peer`](/docs/commands/operator/raft#remove-peer) command isn't
possible or you'd rather manually re-write the cluster membership, a
[`raft/peers.json`](#manual-recovery-using-peers-json) file can be written to
the configured data directory.

### Quorum Lost
@@ -302,22 +316,24 @@ committed could be incomplete. The recovery process implicitly commits all
outstanding Raft log entries, so it's also possible to commit data that was
uncommitted before the failure.

See the section below on manual recovery using
[`peers.json`](#manual-recovery-using-peers-json) for details of the recovery
procedure. You include only the remaining servers in the
[`peers.json`](#manual-recovery-using-peers-json) recovery file. The
cluster should be able to elect a leader once the remaining servers are all
restarted with an identical
[`peers.json`](#manual-recovery-using-peers-json) configuration.

Any servers you introduce later can be fresh with totally clean data
directories and joined using Vault's join command.

In extreme cases, it should be possible to recover with just a single remaining
server by starting that single server with itself as the only peer in the
[`peers.json`](#manual-recovery-using-peers-json) recovery file.

### Manual Recovery Using peers.json

Using `raft/peers.json` for recovery can cause uncommitted Raft log entries to be
implicitly committed, so this should only be used after an outage where no other
option is available to recover a lost server. Make sure you don't have any
automated processes that will put the peers file in place on a periodic basis.
@@ -326,10 +342,10 @@ To begin, stop all remaining servers.

The next step is to go to the [configured data
path](/docs/configuration/storage/raft/#path) of each Vault server. Inside that
directory, there will be a `raft/` sub-directory. We need to create a
`raft/peers.json` file. The file should be formatted as a JSON array containing
the node ID, `address:port`, and suffrage information of each Vault server you
wish to be in the cluster:

```json
[
@@ -352,7 +368,7 @@ wish to be in the cluster.
```

- `id` `(string: <required>)` - Specifies the node ID of the server. This can be
  found in the config file, or inside the `node-id` file in the server's data
  directory if it was auto-generated.
- `address` `(string: <required>)` - Specifies the host and port of the server. The
  port is the server's cluster port.
@@ -32,14 +32,15 @@ cluster_addr = "http://127.0.0.1:8201"
```

~> **Note:** When using the Integrated Storage backend, it is required to provide
[`cluster_addr`](/docs/concepts/ha#per-node-cluster-address) to indicate the address and port to be used for communication
between the nodes in the Raft cluster.

~> **Note:** When using the Integrated Storage backend, a separate
[`ha_storage`](/docs/configuration#ha_storage)
backend cannot be declared.

~> **Note:** When using the Integrated Storage backend, it is strongly recommended to
set [`disable_mlock`](/docs/configuration#disable_mlock) to `true`, and to disable memory swapping on the system.
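
For example, the corresponding top-level setting is a single line; disabling
swap must still be done separately at the operating-system level:

```hcl
# Recommended with Integrated Storage; disable swap at the OS level instead.
disable_mlock = true
```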

## `raft` Parameters
@@ -73,43 +74,44 @@ set `disable_mlock` to `true`, and to disable memory swapping on the system.
  consider reducing write throughput or the amount of data stored on Vault. The
  default value is 10000 which is suitable for all normal workloads.

- `snapshot_threshold` `(integer: 8192)` - This controls the minimum number of Raft
  commit entries between snapshots that are saved to disk. This is a low-level
  parameter that should rarely need to be changed. Very busy clusters
  experiencing excessive disk IO may increase this value to reduce disk IO and
  minimize the chances of all servers taking snapshots at the same time.
  Increasing this trades off disk IO for disk space since the log will grow much
  larger and the space in the `raft.db` file can't be reclaimed till the next
  snapshot. Servers may take longer to recover from crashes or failover if this
  is increased significantly as more logs will need to be replayed.

- `retry_join` `(list: [])` - There can be one or more
  [`retry_join`](#retry_join-stanza) stanzas. When the Raft cluster is getting
  bootstrapped, if the connection details of all the nodes are known beforehand,
  then specifying this config stanza enables the nodes to automatically join a
  Raft cluster. All the nodes would mention all other nodes that they could join
  using this config. When one of the nodes is initialized, it becomes the leader
  and all the other nodes will join the leader node to form the cluster. When
  using Shamir seal, the joined nodes will still need to be unsealed manually.
  See [the section below](#retry_join-stanza) that describes the parameters
  accepted by the [`retry_join`](#retry_join-stanza) stanza.

- `max_entry_size` `(integer: 1048576)` - This configures the maximum number of
  bytes for a Raft entry. It applies to both Put operations and transactions.
  Any put or transaction operation exceeding this configuration value will cause
  the respective operation to fail. Raft has a suggested max size of data in a
  Raft log entry. This is based on current architecture, default timing, etc.
  Integrated storage also uses a chunk size that is the threshold used for
  breaking a large value into chunks. By default, the chunk size is the same as
  Raft's max size log entry. The default value for this configuration is 1048576
  -- two times the chunking size.

- `autopilot_reconcile_interval` `(string: "10s")` - This is the interval after
  which autopilot will pick up any state changes. State change could mean multiple
  things; for example a newly joined voter node, initially added as non-voter to
  the Raft cluster by autopilot has successfully completed the stabilization
  period thereby qualifying for being promoted as a voter, a node that has become
  unhealthy and needs to be shown as such in the state API, a node has been marked
  as dead needing eviction from Raft configuration, etc.

### `retry_join` stanza
@@ -123,8 +125,11 @@ set `disable_mlock` to `true`, and to disable memory swapping on the system.
- `auto_join_port` `(uint: "")` - The optional port used for addresses discovered
  via auto-join.

- `leader_tls_servername` `(string: "")` - The TLS server name to use when
  connecting with HTTPS.
  Should match one of the names in the [DNS
  SANs](https://en.wikipedia.org/wiki/Subject_Alternative_Name) of the remote
  server certificate.
  See also [Integrated Storage and TLS](https://www.vaultproject.io/docs/concepts/integrated-storage#autojoin-with-tls-servername).

- `leader_ca_cert_file` `(string: "")` - File path to the CA cert of the
@@ -145,20 +150,22 @@ set `disable_mlock` to `true`, and to disable memory swapping on the system.
- `leader_client_key` `(string: "")` - Client key for the follower node to
  establish client authentication with the possible leader node.

Each [`retry_join`](#retry_join-stanza) block may provide TLS certificates via
file paths or as a single-line certificate string value with newlines delimited
by `\n`, but not a combination of both. Each [`retry_join`](#retry_join-stanza)
stanza may contain either a [`leader_api_addr`](#leader_api_addr) value or a
cloud [`auto_join`](#auto_join) configuration value, but not both. When an
[`auto_join`](#auto_join) value is provided, Vault will automatically attempt to
discover and resolve potential Raft leader addresses.

By default, Vault will attempt to reach discovered peers using HTTPS and port
8200. Operators may override these through the
[`auto_join_scheme`](#auto_join_scheme) and [`auto_join_port`](#auto_join_port)
fields respectively.

Example Configuration:

```hcl
storage "raft" {
  path = "/Users/foo/raft/"
  node_id = "node1"