docs: Add integrated storage concepts page (#8673)

* docs: Add integrated storage concepts page

* Review feedback and add recovery information

* Review feedback

Co-authored-by: Vishal Nayak <vishalnayak@users.noreply.github.com>
Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com>
Co-authored-by: Meggie <m.ladlow@gmail.com>
Brian Kassouf 2020-04-03 15:05:55 -07:00 committed by GitHub
parent 69118a2be8
commit 1a340a87cb
2 changed files with 251 additions and 0 deletions


@@ -34,6 +34,7 @@ export default [
'response-wrapping',
'policies',
'ha',
'integrated-storage',
'pgp-gpg-keybase',
'recovery-mode'
]

website/pages/docs/concepts/integrated-storage.mdx

@@ -0,0 +1,250 @@
---
layout: docs
page_title: Integrated Storage
sidebar_title: Integrated Storage
description: Learn about the integrated Raft storage in Vault.
---
# Integrated Storage
Vault supports a number of storage options for the durable storage of Vault's
information. As of Vault 1.4, an integrated storage option is offered. This
storage backend does not rely on any third-party systems, implements high
availability semantics, supports Enterprise Replication features, and provides
backup/restore workflows.
The integrated storage option stores Vault's data on the server's filesystem and
uses a consensus protocol to replicate data to each server in the cluster. More
information on the internals of integrated storage can be found in the
[integrated storage internals
documentation](/docs/internals/integrated-storage/). Additionally, the
[Configuration](/docs/configuration/storage/raft/) docs can help in configuring
Vault to use integrated storage.
The sections below go into various details on how to operate Vault with
integrated storage.
## Cluster Membership
This section outlines how to bootstrap and manage a cluster of Vault nodes
running integrated storage.
Integrated storage is bootstrapped during the [initialization
process](https://learn.hashicorp.com/vault/getting-started/deploy#initializing-the-vault),
and results in a cluster of size 1. Depending on the [desired deployment
size](/docs/internals/integrated-storage/#deployment-table), nodes can be joined
to the active Vault node.
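
As an illustration, a minimal sketch of bootstrapping the first node is shown
below; the exact unseal and login steps depend on your seal configuration.

```shell
# Initialize Vault on the first node. This also bootstraps the Raft
# cluster with this node as its single member.
$ vault operator init

# After unsealing and logging in, the peer set should contain one voter.
$ vault operator raft list-peers
```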
### Joining Nodes
Joining is the process of taking an uninitialized Vault node and making it a
member of an existing cluster. To authenticate the new node to the cluster, it
must use the same seal mechanism. If using Auto Unseal, the node must be
configured to use the same KMS provider and key as the cluster it's attempting
to join. If using a Shamir seal, the unseal keys must be provided to the new
node before the join process can complete. Once a node has successfully joined,
the active node can begin replicating data to it. Once a node has joined, it
cannot be re-joined to a different cluster.
You can either join the node automatically via the config file or manually
through the API (both methods are described below). When joining a node, the
API address of the leader node must be used. We recommend setting the
[api_addr](/docs/concepts/ha/#direct-access) configuration option on all nodes
to make joining simpler.
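
If you're unsure of the leader's API address, one way to look it up is the
`sys/leader` endpoint; a minimal sketch (the addresses in this page's examples
are illustrative):

```shell
# Ask any initialized, unsealed node who the current leader is. The
# leader_address field is the API address to use for the join operation.
$ vault read sys/leader
```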
#### retry_join Configuration
This method enables setting one or more target leader nodes in the config file.
When an uninitialized Vault server starts up, it will attempt to join each
potential leader that is defined, retrying until successful. When one of the
specified leaders becomes active, this node will successfully join. When using
a Shamir seal, the joined nodes will still need to be unsealed manually. When
using Auto Unseal, the node will be able to join and unseal automatically.
An example [retry_join](/docs/configuration/storage/raft/#retry_join-stanza)
config can be seen below:
```hcl
storage "raft" {
path = "/var/raft/"
node_id = "node3"
retry_join {
leader_api_addr = "https://node1.vault.local:8200"
}
retry_join {
leader_api_addr = "https://node2.vault.local:8200"
}
}
```
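
When using a Shamir seal, each node joined via `retry_join` still has to be
unsealed by an operator once it starts. A minimal sketch, repeated until the
unseal threshold is met:

```shell
# Run against the newly joined node, providing one unseal key share per
# invocation, until the node reports itself unsealed.
$ vault operator unseal

# Verify the node's seal and HA status.
$ vault status
```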
#### Join from the CLI
Alternatively, you can use the [join CLI
command](/docs/commands/operator/raft/#join) or the API to join a node. The
active node's API address will need to be specified:
```shell
$ vault operator raft join https://node1.vault.local:8200
```
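
The API equivalent targets the `sys/storage/raft/join` endpoint on the joining
node; a sketch using curl (hostnames are illustrative):

```shell
# Ask the new, uninitialized node to join the cluster whose leader is
# reachable at the given API address.
$ curl \
    --request POST \
    --data '{"leader_api_addr": "https://node1.vault.local:8200"}' \
    https://node3.vault.local:8200/v1/sys/storage/raft/join
```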
#### Non-Voting Nodes (Enterprise Only)
Nodes that are joined to a cluster can be specified as non-voters. A non-voting
node has all of Vault's data replicated to it, but does not contribute to the
quorum count. This can be used in conjunction with [Performance
Standby](/docs/enterprise/performance-standby/) nodes to add read scalability to
a cluster in cases where a high volume of reads is needed.
```shell
$ vault operator raft join -non-voter https://node1.vault.local:8200
```
### Removing Peers
Removing a peer node is a necessary step when you no longer want the node in the
cluster. This could happen if the node is replaced with a new one, if the
hostname permanently changes and the node can no longer be accessed, if you're
attempting to shrink the size of the cluster, or for many other reasons. Removing the peer will
ensure the cluster stays at the desired size, and that quorum is maintained.
To remove the peer, you can issue a
[remove-peer](/docs/commands/operator/raft/#remove-peer) command and provide the
node ID you wish to remove:
```shell
$ vault operator raft remove-peer node1
Peer removed successfully!
```
### Listing Peers
To see the current peer set for the cluster, you can issue a
[list-peers](/docs/commands/operator/raft/#list-peers) command. All the voting
nodes listed here contribute to the quorum, and a majority must be alive
for integrated storage to continue to operate.
```shell
$ vault operator raft list-peers
Node     Address                   State       Voter
----     -------                   -----       -----
node1    node1.vault.local:8201    follower    true
node2    node2.vault.local:8201    follower    true
node3    node3.vault.local:8201    leader      true
```
## Server-to-Server Communication
Once nodes are joined to one another, they begin to communicate using mTLS over
Vault's cluster port. The cluster port defaults to `8201`. The TLS information
is exchanged at join time and is rotated on a regular cadence.

Integrated storage requires that the
[cluster_addr](/docs/concepts/ha/#per-node-cluster-address) configuration option
be set. This allows Vault to assign an address to the node ID at join time.
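
As a sketch, `cluster_addr` (and `api_addr`) can be set in the server's config
file or, assuming the `VAULT_CLUSTER_ADDR` and `VAULT_API_ADDR` environment
variables that mirror those options, via the environment; the config path below
is illustrative:

```shell
# Advertise this node's API and cluster addresses to the rest of the
# cluster. Both must be reachable by every other node.
$ export VAULT_API_ADDR="https://node1.vault.local:8200"
$ export VAULT_CLUSTER_ADDR="https://node1.vault.local:8201"
$ vault server -config=/etc/vault.d/vault.hcl
```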
## Outage Recovery
### Quorum Maintained
This section outlines the steps to take when a single server or multiple servers
are in a failed state but quorum is still maintained. This means the remaining
live servers are still operational, can elect a leader, and are able to process
write requests.
If the failed server is recoverable, the best option is to bring it back online
and have it reconnect to the cluster with the same host address. This will return
the cluster to a fully healthy state.
If this is impractical, you need to remove the failed server. Usually, you can
issue a remove-peer command to remove the failed server if it's still a member
of the cluster.
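
For example, if `node3` had failed but quorum was maintained, a sketch of the
removal (the node ID is illustrative) would be:

```shell
# Confirm the failed node is still listed as a peer, then remove it.
$ vault operator raft list-peers
$ vault operator raft remove-peer node3
Peer removed successfully!
```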
If the remove-peer command isn't possible, or you'd rather manually rewrite the
cluster membership, a `raft/peers.json` file can be written to the configured
data directory.
### Quorum Lost
In the event that multiple servers are lost, causing a loss of quorum and a
complete outage, partial recovery is still possible.
If the failed servers are recoverable, the best option is to bring them back
online and have them reconnect to the cluster using the same host addresses.
This will return the cluster to a fully healthy state.
If the failed servers are not recoverable, partial recovery is possible using
data on the remaining servers in the cluster. There may be data loss in this
situation because multiple servers were lost, so information about what's
committed could be incomplete. The recovery process implicitly commits all
outstanding Raft log entries, so it's also possible to commit data that was
uncommitted before the failure.
See the section below on manual recovery using peers.json for details of the
recovery procedure. You include only the remaining servers in the
raft/peers.json recovery file. The cluster should be able to elect a leader once
the remaining servers are all restarted with an identical raft/peers.json
configuration.
Any servers you introduce later can start fresh, with completely clean data
directories, and be joined using Vault's join command.
In extreme cases, it should be possible to recover with just a single remaining
server by starting that single server with itself as the only peer in the
raft/peers.json recovery file.
### Manual Recovery Using peers.json
Using raft/peers.json for recovery can cause uncommitted Raft log entries to be
implicitly committed, so this should only be used after an outage where no other
option is available to recover a lost server. Make sure you don't have any
automated processes that will put the peers file in place on a periodic basis.
To begin, stop all remaining servers.
The next step is to go to the [configured data
path](/docs/configuration/storage/raft/#path) of each Vault server. Inside that
directory, there will be a raft/ sub-directory. We need to create a
raft/peers.json file. The file should be formatted as a JSON array containing
the node ID, address:port, and suffrage information of each Vault server you
wish to be in the cluster.
```json
[
  {
    "id": "node1",
    "address": "node1.vault.local:8201",
    "non_voter": false
  },
  {
    "id": "node2",
    "address": "node2.vault.local:8201",
    "non_voter": false
  },
  {
    "id": "node3",
    "address": "node3.vault.local:8201",
    "non_voter": false
  }
]
```
* `id` `(string: <required>)` - Specifies the node ID of the server. This can be
  found in the config file, or inside the `node-id` file in the server's data
  directory if it was auto-generated.
* `address` `(string: <required>)` - Specifies the host and port of the server.
  The port is the server's cluster port.
* `non_voter` `(bool: <false>)` - Controls whether the server is a non-voter. If
  omitted, it defaults to false, which is typical for most clusters. This is an
  Enterprise-only feature.
Create entries for all servers. You must confirm that servers you do not
include here have indeed failed and will not later rejoin the cluster. Ensure
that this file is the same across all remaining server nodes.
At this point, you can restart all the remaining servers. The cluster should be
in an operable state again. One of the nodes should claim leadership and become
active.
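
Putting the steps together, a rough sketch of the procedure on each remaining
server, assuming Vault runs under systemd and uses the `/var/raft/` data path
from the earlier example config (adjust both to your environment):

```shell
# 1. Stop Vault on every remaining server.
$ sudo systemctl stop vault

# 2. Place the same peers.json in the raft/ sub-directory of the
#    configured data path on each of those servers.
$ sudo cp peers.json /var/raft/raft/peers.json

# 3. Restart the remaining servers; one of them should become active.
$ sudo systemctl start vault
$ vault operator raft list-peers
```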
### Other Recovery Methods
For other, non-quorum-related recovery, [Vault's recovery
mode](/docs/concepts/recovery-mode/) can be used.
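
As a sketch, recovery mode is entered by starting the server with the
`-recovery` flag (the config path is illustrative); requests in this mode are
authorized with a recovery operation token, as described in the recovery mode
documentation:

```shell
# Start Vault in recovery mode against the existing Raft data.
$ vault server -recovery -config=/etc/vault.d/vault.hcl
```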