docs: Add integrated storage concepts page (#8673)
* docs: Add integrated storage concepts page
* Update website/pages/docs/concepts/integrated-storage.mdx Co-Authored-By: Vishal Nayak <vishalnayak@users.noreply.github.com>
* Update website/pages/docs/concepts/integrated-storage.mdx Co-Authored-By: Vishal Nayak <vishalnayak@users.noreply.github.com>
* Update website/pages/docs/concepts/integrated-storage.mdx Co-Authored-By: Vishal Nayak <vishalnayak@users.noreply.github.com>
* Update website/pages/docs/concepts/integrated-storage.mdx Co-Authored-By: Vishal Nayak <vishalnayak@users.noreply.github.com>
* Review feedback and add recovery information
* Update website/pages/docs/concepts/integrated-storage.mdx Co-Authored-By: Calvin Leung Huang <cleung2010@gmail.com>
* Update website/pages/docs/concepts/integrated-storage.mdx Co-Authored-By: Calvin Leung Huang <cleung2010@gmail.com>
* Update website/pages/docs/concepts/integrated-storage.mdx Co-Authored-By: Calvin Leung Huang <cleung2010@gmail.com>
* Update website/pages/docs/concepts/integrated-storage.mdx Co-Authored-By: Calvin Leung Huang <cleung2010@gmail.com>
* Update website/pages/docs/concepts/integrated-storage.mdx Co-Authored-By: Meggie <m.ladlow@gmail.com>
* Update website/pages/docs/concepts/integrated-storage.mdx Co-Authored-By: Meggie <m.ladlow@gmail.com>
* Update website/pages/docs/concepts/integrated-storage.mdx Co-Authored-By: Meggie <m.ladlow@gmail.com>
* Update website/pages/docs/concepts/integrated-storage.mdx Co-Authored-By: Meggie <m.ladlow@gmail.com>
* Update website/pages/docs/concepts/integrated-storage.mdx Co-Authored-By: Meggie <m.ladlow@gmail.com>
* Update website/pages/docs/concepts/integrated-storage.mdx Co-Authored-By: Meggie <m.ladlow@gmail.com>
* Update website/pages/docs/concepts/integrated-storage.mdx Co-Authored-By: Meggie <m.ladlow@gmail.com>
* Review feedback

Co-authored-by: Vishal Nayak <vishalnayak@users.noreply.github.com>
Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com>
Co-authored-by: Meggie <m.ladlow@gmail.com>
parent 69118a2be8
commit 1a340a87cb
@@ -34,6 +34,7 @@ export default [
  'response-wrapping',
  'policies',
  'ha',
  'integrated-storage',
  'pgp-gpg-keybase',
  'recovery-mode'
]
@@ -0,0 +1,250 @@
---
layout: docs
page_title: Integrated Storage
sidebar_title: Integrated Storage
description: Learn about the integrated raft storage in Vault.
---

# Integrated Storage

Vault supports a number of storage options for the durable storage of Vault's
information. As of Vault 1.4, an integrated storage option is offered. This
storage backend does not rely on any third-party systems, implements high
availability semantics, supports Enterprise Replication features, and provides
backup/restore workflows.

The integrated storage option stores Vault's data on the server's filesystem and
uses a consensus protocol to replicate data to each server in the cluster. More
information on the internals of integrated storage can be found in the
[integrated storage internals
documentation](/docs/internals/integrated-storage/). Additionally, the
[Configuration](/docs/configuration/storage/raft/) docs can help in configuring
Vault to use integrated storage.
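
For orientation, a minimal configuration using this backend might look like the
sketch below; all paths, hostnames, and certificate locations are placeholders,
and the full set of options is covered in the Configuration docs linked above.

```shell
# A minimal sketch of a config using integrated storage; values are illustrative.
cat > /etc/vault.d/vault.hcl <<'EOF'
storage "raft" {
  path    = "/var/raft/"
  node_id = "node1"
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/etc/vault.d/tls/vault.crt"
  tls_key_file  = "/etc/vault.d/tls/vault.key"
}

api_addr     = "https://node1.vault.local:8200"
cluster_addr = "https://node1.vault.local:8201"
EOF

# Start the server against that config.
vault server -config=/etc/vault.d/vault.hcl
```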

The sections below go into various details on how to operate Vault with
integrated storage.

## Cluster Membership

This section outlines how to bootstrap and manage a cluster of Vault nodes
running integrated storage.

Integrated storage is bootstrapped during the [initialization
process](https://learn.hashicorp.com/vault/getting-started/deploy#initializing-the-vault),
and results in a cluster of size 1. Depending on the [desired deployment
size](/docs/internals/integrated-storage/#deployment-table), nodes can be joined
to the active Vault node.
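
As a rough sketch of that bootstrap step, assuming a Shamir seal and a
hypothetical first node:

```shell
# Point the CLI at the first (uninitialized) node; the address is illustrative.
export VAULT_ADDR="https://node1.vault.local:8200"

# Initialize Vault; this bootstraps the raft cluster with a single member.
vault operator init

# With a Shamir seal, provide key shares until the unseal threshold is met.
vault operator unseal
```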

### Joining Nodes

Joining is the process of taking an uninitialized Vault node and making it a
member of an existing cluster. In order to authenticate the new node to the
cluster, it must use the same seal mechanism. If using Auto Unseal, the node
must be configured to use the same KMS provider and key as the cluster it's
attempting to join. If using a Shamir seal, the unseal keys must be provided to
the new node before the join process can complete. Once a node has successfully
joined, data from the active node can begin to replicate to it. Once a node has
been joined, it cannot be re-joined to a different cluster.
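
For example, if the cluster uses Auto Unseal with AWS KMS, the joining node
needs a matching seal stanza; a sketch with placeholder values follows.

```shell
# Placeholder region and key reference; these must match the existing cluster's
# Auto Unseal configuration for the join to be authenticated.
cat >> /etc/vault.d/vault.hcl <<'EOF'
seal "awskms" {
  region     = "us-east-1"
  kms_key_id = "alias/vault-unseal"
}
EOF
```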

You can either join the node automatically via the config file or manually
through the API (both methods are described below). When joining a node, the
API address of the leader node must be used. We recommend setting the
[api_addr](/docs/concepts/ha/#direct-access) configuration option on all nodes
to make joining simpler.

#### retry_join Configuration

This method lets you set one or more target leader nodes in the config file.
When an uninitialized Vault server starts up, it will attempt to join each
potential leader that is defined, retrying until successful. When one of the
specified leaders becomes active, this node will successfully join. When using
a Shamir seal, the joined nodes will still need to be unsealed manually. When
using Auto Unseal, the node will be able to join and unseal automatically.

An example [retry_join](/docs/configuration/storage/raft/#retry_join-stanza)
config can be seen below:

```hcl
storage "raft" {
  path    = "/var/raft/"
  node_id = "node3"

  retry_join {
    leader_api_addr = "https://node1.vault.local:8200"
  }

  retry_join {
    leader_api_addr = "https://node2.vault.local:8200"
  }
}
```
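
If the cluster uses a Shamir seal, a node joined through `retry_join` still has
to be unsealed by hand before the join completes; a brief sketch:

```shell
# Run against the newly started node (address is illustrative).
export VAULT_ADDR="https://node3.vault.local:8200"

# Provide unseal key shares until the threshold is met; the node can then
# complete the join and begin receiving replicated data.
vault operator unseal
```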

#### Join from the CLI

Alternatively, you can use the [join CLI
command](/docs/commands/operator/raft/#join) or the API to join a node. The
active node's API address will need to be specified:

```shell
$ vault operator raft join https://node1.vault.local:8200
```
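
The same join can also be performed over the API; a sketch assuming the
`sys/storage/raft/join` endpoint, run against the new node (hostnames are
illustrative):

```shell
# POST the leader's API address to the new node's join endpoint.
curl \
  --request POST \
  --data '{"leader_api_addr": "https://node1.vault.local:8200"}' \
  https://node3.vault.local:8200/v1/sys/storage/raft/join
```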

#### Non-Voting Nodes (Enterprise Only)

Nodes that are joined to a cluster can be specified as non-voters. A non-voting
node has all of Vault's data replicated to it, but does not contribute to the
quorum count. This can be used in conjunction with [Performance
Standby](/docs/enterprise/performance-standby/) nodes to add read scalability to
a cluster in cases where a high volume of reads is needed.

```shell
$ vault operator raft join -non-voter https://node1.vault.local:8200
```

### Removing Peers

Removing a peer node is a necessary step when you no longer want the node in the
cluster. This could happen if the node is rotated for a new one, the hostname
permanently changes and can no longer be accessed, you're attempting to shrink
the size of the cluster, or for many other reasons. Removing the peer will
ensure the cluster stays at the desired size and that quorum is maintained.

To remove the peer, you can issue a
[remove-peer](/docs/commands/operator/raft/#remove-peer) command and provide the
node ID you wish to remove:

```shell
$ vault operator raft remove-peer node1
Peer removed successfully!
```

### Listing Peers

To see the current peer set for the cluster, you can issue a
[list-peers](/docs/commands/operator/raft/#list-peers) command. All the voting
nodes that are listed here contribute to the quorum, and a majority must be
alive for integrated storage to continue to operate.

```shell
$ vault operator raft list-peers

Node       Address                   State       Voter
----       -------                   -----       -----
node1      node1.vault.local:8201    follower    true
node2      node2.vault.local:8201    follower    true
node3      node3.vault.local:8201    leader      true
```

## Server-to-Server Communication

Once nodes are joined to one another, they begin to communicate using mTLS over
Vault's cluster port. The cluster port defaults to `8201`. The TLS information
is exchanged at join time and is rotated periodically.

A requirement for integrated storage is that the
[cluster_addr](/docs/concepts/ha/#per-node-cluster-address) configuration option
is set. This allows Vault to assign an address to the node ID at join time.
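
A quick way to sanity-check server-to-server connectivity is to confirm each
node can reach the others' cluster port; a sketch assuming `nc` is available and
these hostnames resolve:

```shell
# From each node, check the other nodes' cluster port (8201 by default).
nc -zv node1.vault.local 8201
nc -zv node2.vault.local 8201
```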

## Outage Recovery

### Quorum Maintained

This section outlines the steps to take when a single server or multiple servers
are in a failed state but quorum is still maintained. This means the remaining
live servers are still operational, can elect a leader, and are able to process
write requests.

If the failed server is recoverable, the best option is to bring it back online
and have it reconnect to the cluster with the same host address. This will return
the cluster to a fully healthy state.

If this is impractical, you need to remove the failed server. Usually, you can
issue a remove-peer command to remove the failed server if it's still a member
of the cluster.

If the remove-peer command isn't possible, or you'd rather manually rewrite the
cluster membership, a `raft/peers.json` file can be written to the configured
data directory.

### Quorum Lost

In the event that multiple servers are lost, causing a loss of quorum and a
complete outage, partial recovery is still possible.

If the failed servers are recoverable, the best option is to bring them back
online and have them reconnect to the cluster using the same host addresses.
This will return the cluster to a fully healthy state.

If the failed servers are not recoverable, partial recovery is possible using
data on the remaining servers in the cluster. There may be data loss in this
situation because multiple servers were lost, so information about what's
committed could be incomplete. The recovery process implicitly commits all
outstanding Raft log entries, so it's also possible to commit data that was
uncommitted before the failure.

See the section below on manual recovery using peers.json for details of the
recovery procedure. You include only the remaining servers in the
`raft/peers.json` recovery file. The cluster should be able to elect a leader
once the remaining servers are all restarted with an identical
`raft/peers.json` configuration.

Any servers you introduce later can be fresh, with totally clean data
directories, and joined using Vault's join command.

In extreme cases, it should be possible to recover with just a single remaining
server by starting that single server with itself as the only peer in the
`raft/peers.json` recovery file.
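
As a sketch of that single-survivor case (the data path and node details are
placeholders), the recovery file would contain only the surviving node:

```shell
# Written inside the survivor's raft/ sub-directory under the configured data
# path; restart the server afterwards.
cat > /var/raft/raft/peers.json <<'EOF'
[
  {
    "id": "node1",
    "address": "node1.vault.local:8201",
    "non_voter": false
  }
]
EOF
```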

### Manual Recovery Using peers.json

Using `raft/peers.json` for recovery can cause uncommitted Raft log entries to be
implicitly committed, so this should only be used after an outage where no other
option is available to recover a lost server. Make sure you don't have any
automated processes that will put the peers file in place on a periodic basis.

To begin, stop all remaining servers.
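
How you stop them depends on how Vault is supervised; assuming systemd (an
assumption, adjust for your environment):

```shell
# On each remaining server:
sudo systemctl stop vault
```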

The next step is to go to the [configured data
path](/docs/configuration/storage/raft/#path) of each Vault server. Inside that
directory, there will be a `raft/` sub-directory. We need to create a
`raft/peers.json` file. The file should be formatted as a JSON array containing
the node ID, address:port, and suffrage information of each Vault server you
wish to be in the cluster:

```json
[
  {
    "id": "node1",
    "address": "node1.vault.local:8201",
    "non_voter": false
  },
  {
    "id": "node2",
    "address": "node2.vault.local:8201",
    "non_voter": false
  },
  {
    "id": "node3",
    "address": "node3.vault.local:8201",
    "non_voter": false
  }
]
```

* `id` `(string: <required>)` - Specifies the node ID of the server. This can be
  found in the config file, or inside the `node-id` file in the server's data
  directory if it was auto-generated.
* `address` `(string: <required>)` - Specifies the host and port of the server.
  The port is the server's cluster port.
* `non_voter` `(bool: <false>)` - This controls whether the server is a
  non-voter. If omitted, it will default to false, which is typical for most
  clusters. This is an Enterprise-only feature.

Create entries for all servers. You must confirm that servers you do not
include here have indeed failed and will not later rejoin the cluster. Ensure
that this file is the same across all remaining server nodes.
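
One way to double-check that the file really is identical everywhere is to
compare checksums; a sketch assuming SSH access, with hypothetical hostnames and
data path:

```shell
# The checksums printed for each host should match exactly.
for host in node1.vault.local node2.vault.local node3.vault.local; do
  ssh "$host" sha256sum /var/raft/raft/peers.json
done
```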

At this point, you can restart all the remaining servers. The cluster should be
in an operable state again. One of the nodes should claim leadership and become
active.
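
To confirm the recovery worked, you can check cluster status and the peer set
once the servers are back up; for example:

```shell
# HA status shows whether this node is active or standby.
vault status

# The peer set should list only the servers from peers.json, with one leader.
vault operator raft list-peers
```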

### Other Recovery Methods

For other, non-quorum-related recovery, [Vault's recovery
mode](/docs/concepts/recovery-mode/) can be used.
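
As a pointer, recovery mode is entered by starting the server with the recovery
flag; see the linked recovery-mode docs for the full procedure. A sketch:

```shell
# Start Vault in recovery mode against the existing config.
vault server -recovery -config=/etc/vault.d/vault.hcl
```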