open-vault/website/content/docs/internals/architecture.mdx
Bryce Kalow b76a56d40c
feat(website): migrates nav data format and updates docs pages (#11242)
* migrates nav data format and updates docs pages

* removes sidebar_title from content files
2021-04-06 13:49:04 -04:00

174 lines
9.3 KiB
Plaintext

---
layout: docs
page_title: Architecture
description: Learn about the internal architecture of Vault.
---
# Architecture
Vault is a complex system that has many different pieces. To help both users and
developers of Vault build a mental model of how it works, this page documents
the system architecture.
~> **Advanced Topic!** This page covers technical details of Vault. You don't
need to understand these details to effectively use Vault. The details are
documented here for those who wish to learn about them without having to go
spelunking through the source code. However, if you're an operator of Vault, we
recommend learning about the architecture due to the importance of Vault in an
environment.
# Glossary
Before describing the architecture, we provide a glossary of terms to help
clarify what is being discussed:
- **Storage Backend** - A storage backend is responsible for durable storage of
_encrypted_ data. Backends are not trusted by Vault and are only expected to
provide durability. The storage backend is configured when starting the Vault
server.
- **Barrier** - The barrier is cryptographic steel and concrete around the
Vault. All data that flows between Vault and the storage backend passes
through the barrier. The barrier ensures that only encrypted data is written
out, and that data is verified and decrypted on the way in. Much like a bank
vault, the barrier must be "unsealed" before anything inside can be accessed.
- **Secrets Engine** - A secrets engine is responsible for managing secrets.
Simple secrets engines like the "kv" secrets engine simply return the same
secret when queried. Some secrets engines support using policies to
dynamically generate a secret each time they are queried. This allows for
unique secrets to be used which allows Vault to do fine-grained revocation and
policy updates. As an example, a MySQL secrets engine could be configured with
a "web" policy. When the "web" secret is read, a new MySQL user/password pair
will be generated with a limited set of privileges for the web server.
- **Audit Device** - An audit device is responsible for managing audit logs.
Every request to Vault and response from Vault goes through the configured
audit devices. This provides a simple way to integrate Vault with multiple
audit logging destinations of different types.
- **Auth Method** - An auth method is used to authenticate users or applications
which are connecting to Vault. Once authenticated, the auth method returns the
list of applicable policies which should be applied. Vault takes an
authenticated user and returns a client token that can be used for future
requests. As an example, the `userpass` auth method uses a username and
password to authenticate the user. Alternatively, the `github` auth method
allows users to authenticate via GitHub.
- **Client Token** - A client token (aka "Vault Token") is conceptually
similar to a session cookie on a web site. Once a user authenticates, Vault
returns a client token which is used for future requests. The token is used by
Vault to verify the identity of the client and to enforce the applicable ACL
policies. This token is passed via HTTP headers.
- **Secret** - A secret is the term for anything returned by Vault which
contains confidential or cryptographic material. Not everything returned by
Vault is a secret, for example system configuration, status information, or
policies are not considered secrets. Secrets always have an associated lease.
This means clients cannot assume that the secret contents can be used
indefinitely. Vault will revoke a secret at the end of the lease, and an
operator may intervene to revoke the secret before the lease is over. This
contract between Vault and its clients is critical, as it allows for changes
in keys and policies without manual intervention.
- **Server** - Vault depends on a long-running instance which operates as a
server. The Vault server provides an API which clients interact with and
manages the interaction between all the secrets engines, ACL enforcement, and
secret lease revocation. Having a server based architecture decouples clients
from the security keys and policies, enables centralized audit logging and
simplifies administration for operators.
# High-Level Overview
A very high level overview of Vault looks like this:
[![Architecture Overview](/img/layers.png)](/img/layers.png)
Let's begin to break down this picture. There is a clear separation of
components that are inside or outside of the security barrier. Only the storage
backend and the HTTP API are outside, all other components are inside the
barrier.
The storage backend is untrusted and is used to durably store encrypted data.
When the Vault server is started, it must be provided with a storage backend so
that data is available across restarts. The HTTP API similarly must be started
by the Vault server on start so that clients can interact with it.
Once started, the Vault is in a _sealed_ state. Before any operation can be
performed on the Vault it must be unsealed. This is done by providing the unseal
keys. When the Vault is initialized it generates an encryption key which is used
to protect all the data. That key is protected by a master key. By default,
Vault uses a technique known as [Shamir's secret sharing
algorithm](https://en.wikipedia.org/wiki/Shamir's_Secret_Sharing) to split the
master key into 5 shares, any 3 of which are required to reconstruct the master
key.
[![Vault Shamir Secret Sharing Algorithm](/img/vault-shamir-secret-sharing.svg)](/img/vault-shamir-secret-sharing.svg)
The number of shares and the minimum threshold required can both be specified.
Shamir's technique can be disabled, and the master key used directly for
unsealing. Once Vault retrieves the encryption key, it is able to decrypt the
data in the storage backend, and enters the _unsealed_ state. Once unsealed,
Vault loads all of the configured audit devices, auth methods, and secrets
engines.
The configuration of those audit devices, auth methods, and secrets engines must
be stored in Vault since they are security sensitive. Only users with the
correct permissions should be able to modify them, meaning they cannot be
specified outside of the barrier. By storing them in Vault, any changes to them
are protected by the ACL system and tracked by audit logs.
After the Vault is unsealed, requests can be processed from the HTTP API to the
Core. The core is used to manage the flow of requests through the system,
enforce ACLs, and ensure audit logging is done.
When a client first connects to Vault, it needs to authenticate. Vault provides
configurable auth methods providing flexibility in the authentication mechanism
used. Human friendly mechanisms such as username/password or GitHub might be
used for operators, while applications may use public/private keys or tokens to
authenticate. An authentication request flows through core and into an auth
method, which determines if the request is valid and returns a list of
associated policies.
Policies are just a named ACL rule. For example, the "root" policy is built-in
and permits access to all resources. You can create any number of named policies
with fine-grained control over paths. Vault operates exclusively in a whitelist
mode, meaning that unless access is explicitly granted via a policy, the action
is not allowed. Since a user may have multiple policies associated, an action is
allowed if any policy permits it. Policies are stored and managed by an internal
policy store. This internal store is manipulated through the system backend,
which is always mounted at `sys/`.
Once authentication takes place and an auth method provides a set of applicable
policies, a new client token is generated and managed by the token store. This
client token is sent back to the client, and is used to make future requests.
This is similar to a cookie sent by a website after a user logs in. The client
token may have a lease associated with it depending on the auth method
configuration. This means the client token may need to be periodically renewed
to avoid invalidation.
Once authenticated, requests are made providing the client token. The token is
used to verify the client is authorized and to load the relevant policies. The
policies are used to authorize the client request. The request is then routed to
the secrets engine, which is processed depending on its type. If the secrets
engine returns a secret, the core registers it with the expiration manager and
attaches a lease ID. The lease ID is used by clients to renew or revoke their
secret. If a client allows the lease to expire, the expiration manager
automatically revokes the secret.
The core handles logging of requests and responses to the audit broker, which
fans the request out to all the configured audit devices. Outside of the request
flow, the core performs certain background activity. Lease management is
critical, as it allows expired client tokens or secrets to be revoked
automatically. Additionally, Vault handles certain partial failure cases by
using write ahead logging with a rollback manager. This is managed transparently
within the core and is not user visible.
# Getting in Depth
This has been a brief high-level overview of the architecture of Vault. There
are more details available for each of the sub-systems.
For other details, either consult the code, ask in IRC or reach out to the
mailing list.