---
layout: docs
page_title: Integrated Storage
description: Learn about the integrated raft storage in Vault.
---

# Integrated Storage

Vault supports a number of storage options for the durable storage of Vault's
information. As of Vault 1.4, an Integrated Storage option is offered. This
storage backend does not rely on any third-party systems, implements high
availability semantics, supports Enterprise Replication features, and provides
backup/restore workflows.

The option stores Vault's data on the server's filesystem and uses a consensus
protocol to replicate data to each server in the cluster. More information on
the internals of Integrated Storage can be found in the [Integrated Storage
internals documentation](/docs/internals/integrated-storage/). Additionally, the
[Configuration](/docs/configuration/storage/raft/) docs can help in configuring
Vault to use Integrated Storage.

The sections below go into various details on how to operate Vault with
Integrated Storage.

## Server-to-Server Communication

Once nodes are joined to one another, they begin to communicate using mTLS over
Vault's cluster port. The cluster port defaults to `8201`. The TLS information
is exchanged at join time and is rotated on a regular cadence.

A requirement for Integrated Storage is that the
[`cluster_addr`](/docs/concepts/ha#per-node-cluster-address) configuration option
is set. This allows Vault to assign an address to the node ID at join time.

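As a minimal sketch, each node's configuration advertises both addresses; the
hostname below is a hypothetical placeholder:

```hcl
# Address clients (and join requests) use to reach this node's API port.
api_addr = "https://node1.vault.local:8200"

# Address other cluster members use for request forwarding and replication.
cluster_addr = "https://node1.vault.local:8201"
```
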
## Cluster Membership

This section outlines how to bootstrap and manage a cluster of Vault nodes
running Integrated Storage.

Integrated Storage is bootstrapped during the [initialization
process](https://learn.hashicorp.com/vault/getting-started/deploy#initializing-the-vault),
and results in a cluster of size 1. Depending on the [desired deployment
size](/docs/internals/integrated-storage/#deployment-table), nodes can be joined
to the active Vault node.

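A sketch of what bootstrapping might look like on the first node (output
abridged; the node name and address are assumptions):

```shell-session
$ vault operator init
$ vault operator raft list-peers

Node     Address                   State     Voter
----     -------                   -----     -----
node1    node1.vault.local:8201    leader    true
```
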
### Joining Nodes

Joining is the process of taking an uninitialized Vault node and making it a
member of an existing cluster. In order to authenticate the new node to the
cluster, it must use the same seal mechanism. If using Auto Unseal, the node
must be configured to use the same KMS provider and key as the cluster it's
attempting to join. If using a Shamir seal, the unseal keys must be provided to
the new node before the join process can complete. Once a node has successfully
joined, data from the active node can begin to replicate to it. Once a node has
been joined, it cannot be re-joined to a different cluster.

You can either join the node automatically via the config file or manually
through the API (both methods are described below). When joining a node, the API
address of the leader node must be used. We recommend setting the
[`api_addr`](/docs/concepts/ha#direct-access) configuration option on all nodes
to make joining simpler.

#### `retry_join` Configuration

This method enables setting one or more target leader nodes in the config file.
When an uninitialized Vault server starts up, it will attempt to join each
potential leader that is defined, retrying until successful. When one of the
specified leaders becomes active, this node will successfully join. When using a
Shamir seal, the joined nodes will still need to be unsealed manually. When
using Auto Unseal, the node will be able to join and unseal automatically.

An example [`retry_join`](/docs/configuration/storage/raft#retry_join-stanza)
config can be seen below:

```hcl
storage "raft" {
  path = "/var/raft/"
  node_id = "node3"

  retry_join {
    leader_api_addr = "https://node1.vault.local:8200"
  }
  retry_join {
    leader_api_addr = "https://node2.vault.local:8200"
  }
}
```

Note that in each [`retry_join`](/docs/configuration/storage/raft#retry_join-stanza)
stanza, you may provide a single
[`leader_api_addr`](/docs/configuration/storage/raft#leader_api_addr) or
[`auto_join`](/docs/configuration/storage/raft#auto_join) value. When a cloud
[`auto_join`](/docs/configuration/storage/raft#auto_join) configuration value is
provided, Vault will use [go-discover](https://github.com/hashicorp/go-discover)
to automatically attempt to discover and resolve potential Raft leader
addresses.

See the go-discover
[README](https://github.com/hashicorp/go-discover/blob/master/README.md) for
details on the format of the [`auto_join`](/docs/configuration/storage/raft#auto_join) value.

```hcl
storage "raft" {
  path = "/var/raft/"
  node_id = "node3"

  retry_join {
    auto_join = "provider=aws region=eu-west-1 tag_key=vault tag_value=... access_key_id=... secret_access_key=..."
  }
}
```

By default, Vault will attempt to reach discovered peers using HTTPS and port
8200. Operators may override these defaults through the
[`auto_join_scheme`](/docs/configuration/storage/raft#auto_join_scheme) and
[`auto_join_port`](/docs/configuration/storage/raft#auto_join_port) fields
respectively.

```hcl
storage "raft" {
  path = "/var/raft/"
  node_id = "node3"

  retry_join {
    auto_join = "provider=aws region=eu-west-1 tag_key=vault tag_value=... access_key_id=... secret_access_key=..."
    auto_join_scheme = "http"
    auto_join_port = 8201
  }
}
```

#### Join from the CLI

Alternatively, you can use the [`join` CLI
command](/docs/commands/operator/raft/#join) or the API to join a node. The
active node's API address will need to be specified:

```shell-session
$ vault operator raft join https://node1.vault.local:8200
```

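The equivalent API call targets the joining node itself; a sketch using `curl`
(the node addresses are assumptions):

```shell-session
$ curl \
    --request POST \
    --data '{"leader_api_addr": "https://node1.vault.local:8200"}' \
    https://node2.vault.local:8200/v1/sys/storage/raft/join
```
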
#### Non-Voting Nodes (Enterprise Only)

Nodes that are joined to a cluster can be specified as non-voters. A non-voting
node has all of Vault's data replicated to it, but does not contribute to the
quorum count. This can be used in conjunction with [Performance
Standby](/docs/enterprise/performance-standby/) nodes to add read scalability to
a cluster in cases where a high volume of reads is needed.

```shell-session
$ vault operator raft join -non-voter https://node1.vault.local:8200
```

### Removing Peers

Removing a peer node is a necessary step when you no longer want the node in the
cluster. This could happen if the node is rotated for a new one, the hostname
permanently changes and can no longer be accessed, you're attempting to shrink
the size of the cluster, or for many other reasons. Removing the peer will
ensure the cluster stays at the desired size, and that quorum is maintained.

To remove the peer you can issue a
[`remove-peer`](/docs/commands/operator/raft#remove-peer) command and provide the
node ID you wish to remove:

```shell-session
$ vault operator raft remove-peer node1
Peer removed successfully!
```

### Listing Peers

To see the current peer set for the cluster you can issue a
[`list-peers`](/docs/commands/operator/raft#list-peers) command. All the voting
nodes that are listed here contribute to the quorum and a majority must be alive
for Integrated Storage to continue to operate.

```shell-session
$ vault operator raft list-peers

Node     Address                   State       Voter
----     -------                   -----       -----
node1    node1.vault.local:8201    follower    true
node2    node2.vault.local:8201    follower    true
node3    node3.vault.local:8201    leader      true
```

## Integrated Storage and TLS

We've glossed over some details in the above sections on bootstrapping clusters.
The instructions are sufficient for most cases, but some users have run into
problems when using auto-join and TLS in conjunction with things like
auto-scaling. The issue is that
[go-discover](https://github.com/hashicorp/go-discover) on most platforms
returns IPs (not hostnames), and because the IPs aren't knowable in advance, the
TLS certificates used to secure the Vault API port don't contain these IPs in
their IP SANs.

### Vault networking recap

Before we explore solutions to this problem, let's recapitulate how Vault nodes
speak to one another.

Vault exposes two TCP ports: [the API port](/docs/configuration#api_addr) and
[the cluster port](/docs/configuration#cluster_addr).

The API port is where clients send their Vault HTTP requests.

For a single-node Vault cluster, you needn't worry about the cluster port, as it
won't be used.

When you have multiple nodes, you also need a cluster port. This is used by Vault
nodes to issue RPCs to one another, e.g. to forward requests from a standby node
to the active node, or when Raft is in use, to handle leader election and
replication of stored data.

The cluster port is secured using a TLS certificate that the Vault active node
generates internally. It's clear how this can work when not using Integrated
Storage: every node has at least read access to storage, so once the active
node has persisted the certificate, the standby nodes can fetch it, and all
agree on how cluster traffic should be encrypted.

It's less clear how this works with Integrated Storage, as there is a
chicken-and-egg problem. Nodes don't have a shared view of storage until the
raft cluster has been formed, but we're trying to form the raft cluster! To
solve this problem, a Vault node must speak to another Vault node using the API
port instead of the cluster port. This is currently the only situation in which
OSS Vault does this (Vault Enterprise also does something similar when setting
up replication).

- `node2` wants to join the cluster, so it issues a challenge API request to existing member `node1`
- `node1` replies to the challenge request with (1) an encrypted random UUID and (2) the seal config
- `node2` must decrypt the UUID using the seal; if using auto-unseal it can do so directly, if using Shamir it must wait for the user to provide enough unseal keys to perform the decryption
- `node2` sends the decrypted UUID back to `node1` using the answer API
- `node1` sees that `node2` can be trusted (since it has seal access) and replies with a bootstrap package which includes the cluster TLS certificate and private key
- `node2` gets sent a raft snapshot over the cluster port

After this procedure the new node will never again send traffic to the API port.
All subsequent inter-node communication will use the cluster port.

![Raft Join Process](/img/raft-join-detailed.png)

### Assisted raft join techniques

The simplest option is to do it by hand: issue [`raft
join`](/docs/commands/operator/raft#join) commands specifying the explicit names
or IPs of the nodes to join to. In this section we look at other TLS-compatible
options that lend themselves more to automation.

#### Autojoin with TLS servername

As of Vault 1.6.2, the simplest option might be to specify a
[`leader_tls_servername`](/docs/configuration/storage/raft#leader_tls_servername)
in the [`retry_join`](/docs/configuration/storage/raft#retry_join-stanza) stanza
which matches a [DNS
SAN](https://en.wikipedia.org/wiki/Subject_Alternative_Name) in the certificate.

Note that names in a certificate's DNS SAN don't actually have to be registered
in a DNS server. Your nodes may have no names found in DNS, while still
using certificate(s) that contain this shared `servername` in their DNS SANs.

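A sketch of such a stanza; the shared servername `vault-cluster.local` is a
hypothetical value that would appear in each node's certificate DNS SANs:

```hcl
storage "raft" {
  path = "/var/raft/"
  node_id = "node3"

  retry_join {
    auto_join             = "provider=aws region=eu-west-1 tag_key=vault tag_value=..."
    leader_tls_servername = "vault-cluster.local"
  }
}
```
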
#### Autojoin but constrain CIDR, list all possible IPs in certificate

If all the Vault node IPs are assigned from a small subnet, e.g. a `/28`, it
becomes practical to put all the IPs that exist in that subnet into the IP SANs
of the TLS certificate the nodes will share.

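As an illustration, a shared self-signed certificate covering a few addresses
from a hypothetical `10.0.0.0/28` could be generated like this (OpenSSL 1.1.1+;
extend the list to cover every usable IP in the subnet):

```shell-session
$ openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout vault-key.pem -out vault-cert.pem -days 365 \
    -subj "/CN=vault" \
    -addext "subjectAltName=IP:10.0.0.1,IP:10.0.0.2,IP:10.0.0.3,IP:10.0.0.4"
```
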
The drawback here is that the cluster may someday outgrow the CIDR and changing
it may be a pain. For similar reasons this solution may be impractical when
using non-voting nodes and dynamically scaling clusters.

#### Load balancer instead of autojoin

Most Vault instances are going to have a load balancer (LB) between clients and
the Vault nodes. In that case, the LB knows how to route traffic to working
Vault nodes, and there's no need for auto-join: we can just use
[`retry_join`](/docs/configuration/storage/raft#retry_join-stanza) with the LB
address as the target.

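A sketch, assuming the load balancer is reachable at the hypothetical address
`vault-lb.example.com`:

```hcl
storage "raft" {
  path = "/var/raft/"
  node_id = "node3"

  retry_join {
    leader_api_addr = "https://vault-lb.example.com:8200"
  }
}
```
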
One potential issue here: some users want a public-facing LB for clients to
connect to Vault, but aren't comfortable with Vault internal traffic
egressing from the internal network it normally runs on.

## Outage Recovery

### Quorum Maintained

This section outlines the steps to take when a single server or multiple servers
are in a failed state but quorum is still maintained. This means the remaining
live servers are still operational, can elect a leader, and are able to process
write requests.

If the failed server is recoverable, the best option is to bring it back online
and have it reconnect to the cluster with the same host address. This will return
the cluster to a fully healthy state.

If this is impractical, you need to remove the failed server. Usually, you can
issue a [`remove-peer`](/docs/commands/operator/raft#remove-peer) command to
remove the failed server if it's still a member of the cluster.

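For example, removing a failed node (here, a hypothetical `node2`) looks the
same as a planned removal:

```shell-session
$ vault operator raft remove-peer node2
Peer removed successfully!
```
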
If the [`remove-peer`](/docs/commands/operator/raft#remove-peer) command isn't
possible, or you'd rather manually re-write the cluster membership, a
[`raft/peers.json`](#manual-recovery-using-peers-json) file can be written to
the configured data directory.

### Quorum Lost

In the event that multiple servers are lost, causing a loss of quorum and a
complete outage, partial recovery is still possible.

If the failed servers are recoverable, the best option is to bring them back
online and have them reconnect to the cluster using the same host addresses.
This will return the cluster to a fully healthy state.

If the failed servers are not recoverable, partial recovery is possible using
data on the remaining servers in the cluster. There may be data loss in this
situation because multiple servers were lost, so information about what's
committed could be incomplete. The recovery process implicitly commits all
outstanding Raft log entries, so it's also possible to commit data that was
uncommitted before the failure.

See the section below on manual recovery using
[`peers.json`](#manual-recovery-using-peers-json) for details of the recovery
procedure. You include only the remaining servers in the
[`peers.json`](#manual-recovery-using-peers-json) recovery file. The
cluster should be able to elect a leader once the remaining servers are all
restarted with an identical
[`peers.json`](#manual-recovery-using-peers-json) configuration.

Any servers you introduce later can be fresh with totally clean data
directories and joined using Vault's join command.

In extreme cases, it should be possible to recover with just a single remaining
server by starting that single server with itself as the only peer in the
[`peers.json`](#manual-recovery-using-peers-json) recovery file.

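For instance, a single-server recovery file would contain just one entry (the
node ID and address here are illustrative):

```json
[
  {
    "id": "node1",
    "address": "node1.vault.local:8201",
    "non_voter": false
  }
]
```
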
### Manual Recovery Using peers.json

Using `raft/peers.json` for recovery can cause uncommitted Raft log entries to be
implicitly committed, so this should only be used after an outage where no other
option is available to recover a lost server. Make sure you don't have any
automated processes that will put the peers file in place on a periodic basis.

To begin, stop all remaining servers.

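On systemd-based hosts this might look like the following, assuming Vault runs
as a `vault` service unit:

```shell-session
$ sudo systemctl stop vault
```
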
The next step is to go to the [configured data
path](/docs/configuration/storage/raft/#path) of each Vault server. Inside that
directory, there will be a `raft/` sub-directory. We need to create a
`raft/peers.json` file. The file should be formatted as a JSON array containing
the node ID, `address:port`, and suffrage information of each Vault server you
wish to be in the cluster:

```json
[
  {
    "id": "node1",
    "address": "node1.vault.local:8201",
    "non_voter": false
  },
  {
    "id": "node2",
    "address": "node2.vault.local:8201",
    "non_voter": false
  },
  {
    "id": "node3",
    "address": "node3.vault.local:8201",
    "non_voter": false
  }
]
```

- `id` `(string: <required>)` - Specifies the node ID of the server. This can be
  found in the config file, or inside the `node-id` file in the server's data
  directory if it was auto-generated.
- `address` `(string: <required>)` - Specifies the host and port of the server. The
  port is the server's cluster port.
- `non_voter` `(bool: <false>)` - This controls whether the server is a non-voter.
  If omitted, it will default to false, which is typical for most clusters. This
  is an Enterprise-only feature.

Create entries for all servers. You must confirm that servers you do not
include here have indeed failed and will not later rejoin the cluster. Ensure
that this file is the same across all remaining server nodes.

At this point, you can restart all the remaining servers. The cluster should be
in an operable state again. One of the nodes should claim leadership and become
active.

### Other Recovery Methods

For other, non-quorum-related recovery, [Vault's recovery
mode](/docs/concepts/recovery-mode/) can be used.

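Recovery mode is entered by starting the server with the `-recovery` flag; the
config path below is an assumption:

```shell-session
$ vault server -recovery -config=/etc/vault.d/vault.hcl
```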