website: working on internals documentation
Commit 9a0d50bdc2 (parent 35881dc469)
@@ -126,6 +126,7 @@ ensuring PCI compliant workloads run on appropriate servers.
 # Getting in Depth
 
 This has been a brief high-level overview of the architecture of Nomad. There
-are more details available for each of the sub-systems.
+are more details available for each of the sub-systems. The [consensus protocol](/docs/internals/consensus.html) is
+documented in detail as is the [gossip protocol](/docs/internals/gossip.html).
 
 For other details, either consult the code, ask in IRC or reach out to the mailing list.
website/source/docs/internals/consensus.html.md (new file, 205 lines)
@@ -0,0 +1,205 @@
---
layout: "docs"
page_title: "Consensus Protocol"
sidebar_current: "docs-internals-consensus"
description: |-
  Nomad uses a consensus protocol to provide Consistency as defined by CAP. The consensus protocol is based on "Raft: In Search of an Understandable Consensus Algorithm". For a visual explanation of Raft, see The Secret Lives of Data.
---

# Consensus Protocol

Nomad uses a [consensus protocol](http://en.wikipedia.org/wiki/Consensus_(computer_science))
to provide [Consistency (as defined by CAP)](http://en.wikipedia.org/wiki/CAP_theorem).
The consensus protocol is based on
["Raft: In Search of an Understandable Consensus Algorithm"](https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf).
For a visual explanation of Raft, see [The Secret Lives of Data](http://thesecretlivesofdata.com/raft).

~> **Advanced Topic!** This page covers technical details of
the internals of Nomad. You don't need to know these details to effectively
operate and use Nomad. These details are documented here for those who wish
to learn about them without having to go spelunking through the source code.

## Raft Protocol Overview

Raft is a consensus algorithm that is based on
[Paxos](http://en.wikipedia.org/wiki/Paxos_%28computer_science%29). Compared
to Paxos, Raft is designed to have fewer states and a simpler, more
understandable algorithm.

There are a few key terms to know when discussing Raft:

* Log - The primary unit of work in a Raft system is a log entry. The problem
  of consistency can be decomposed into a *replicated log*. A log is an ordered
  sequence of entries. We consider the log consistent if all members agree on
  the entries and their order.

* FSM - [Finite State Machine](http://en.wikipedia.org/wiki/Finite-state_machine).
  An FSM is a collection of finite states with transitions between them. As new logs
  are applied, the FSM is allowed to transition between states. Application of the
  same sequence of logs must result in the same state, meaning behavior must be deterministic.

* Peer set - The peer set is the set of all members participating in log replication.
  For Nomad's purposes, all server nodes are in the peer set of the local region.

* Quorum - A quorum is a majority of members from a peer set: for a set of size `n`,
  quorum requires at least `(n/2)+1` members.
  For example, if there are 5 members in the peer set, we would need 3 nodes
  to form a quorum. If a quorum of nodes is unavailable for any reason, the
  cluster becomes *unavailable* and no new logs can be committed.

* Committed Entry - An entry is considered *committed* when it is durably stored
  on a quorum of nodes. Once an entry is committed it can be applied.

* Leader - At any given time, the peer set elects a single node to be the leader.
  The leader is responsible for ingesting new log entries, replicating to followers,
  and managing when an entry is considered committed.

Raft is a complex protocol and will not be covered here in detail (for those who
desire a more comprehensive treatment, the full specification is available in this
[paper](https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf)).
We will, however, attempt to provide a high-level description which may be useful
for building a mental model.

Raft nodes are always in one of three states: follower, candidate, or leader. All
nodes initially start out as a follower. In this state, nodes can accept log entries
from a leader and cast votes. If no entries are received for some time, nodes
self-promote to the candidate state. In the candidate state, nodes request votes from
their peers. If a candidate receives a quorum of votes, then it is promoted to a leader.
The leader must accept new log entries and replicate to all the other followers.
In addition, if stale reads are not acceptable, all queries must also be performed on
the leader.

Once a cluster has a leader, it is able to accept new log entries. A client can
request that a leader append a new log entry (from Raft's perspective, a log entry
is an opaque binary blob). The leader then writes the entry to durable storage and
attempts to replicate it to a quorum of followers. Once the log entry is considered
*committed*, it can be *applied* to a finite state machine. The finite state machine
is application specific; in Nomad's case, we use
[MemDB](https://github.com/hashicorp/go-memdb) to maintain cluster state.
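
To make the FSM abstraction concrete, below is a minimal sketch of a toy state
machine written against the `FSM` interface exposed by the
[hashicorp/raft](https://github.com/hashicorp/raft) library. The `kvFSM` type
and its `key=value` encoding are hypothetical and purely illustrative; Nomad's
actual FSM applies log entries to MemDB and is considerably more involved.

```go
package fsmsketch

import (
	"io"
	"strings"
	"sync"

	"github.com/hashicorp/raft"
)

// kvFSM is a toy finite state machine: a replicated key/value map.
// Illustrative only; Nomad's actual FSM applies entries to MemDB.
type kvFSM struct {
	mu   sync.Mutex
	data map[string]string
}

// Apply is invoked once a log entry has been committed by a quorum.
// Raft treats the payload as an opaque blob; the application decodes it.
// Applying the same sequence of entries must be deterministic.
func (f *kvFSM) Apply(l *raft.Log) interface{} {
	f.mu.Lock()
	defer f.mu.Unlock()
	// Toy encoding for illustration: "key=value".
	if kv := strings.SplitN(string(l.Data), "=", 2); len(kv) == 2 {
		f.data[kv[0]] = kv[1]
	}
	return nil
}

// Snapshot and Restore support log compaction: the captured state must
// be equivalent to replaying every log entry applied so far.
func (f *kvFSM) Snapshot() (raft.FSMSnapshot, error) {
	return nil, nil // snapshot persistence elided for brevity
}

func (f *kvFSM) Restore(rc io.ReadCloser) error {
	return rc.Close() // restore elided for brevity
}
```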

Obviously, it would be undesirable to allow a replicated log to grow in an unbounded
fashion. Raft provides a mechanism by which the current state is snapshotted and the
log is compacted. Because of the FSM abstraction, restoring the state of the FSM must
result in the same state as a replay of old logs. This allows Raft to capture the FSM
state at a point in time and then remove all the logs that were used to reach that
state. This is performed automatically without user intervention and prevents unbounded
disk usage while also minimizing time spent replaying logs. One of the advantages of
using MemDB is that it allows Nomad to continue accepting new transactions even while
old state is being snapshotted, preventing any availability issues.

Consensus is fault-tolerant up to the point where a quorum is available.
If a quorum of nodes is unavailable, it is impossible to process log entries or reason
about peer membership. For example, suppose there are only 2 peers: A and B. The quorum
size is also 2, meaning both nodes must agree to commit a log entry. If either A or B
fails, it is now impossible to reach quorum. This means the cluster is unable to add
or remove a node or to commit any additional log entries. This results in
*unavailability*. At this point, manual intervention would be required to remove
either A or B and to restart the remaining node in bootstrap mode.

A Raft cluster of 3 nodes can tolerate a single node failure while a cluster
of 5 can tolerate 2 node failures. The recommended configuration is to run
either 3 or 5 Nomad servers per region. This maximizes availability without
greatly sacrificing performance. The [deployment table](#deployment_table) below
summarizes the potential cluster size options and the fault tolerance of each.

In terms of performance, Raft is comparable to Paxos. Assuming stable leadership,
committing a log entry requires a single round trip to half of the cluster.
Thus, performance is bound by disk I/O and network latency.

## Raft in Nomad

Only Nomad server nodes participate in Raft and are part of the peer set. All
client nodes forward requests to servers. The clients in Nomad only need to know
about their allocations and query that information from the servers, while the
servers need to maintain the global state of the cluster.

Since all servers participate as part of the peer set, they all know the current
leader. When an RPC request arrives at a non-leader server, the request is
forwarded to the leader. If the RPC is a *query* type, meaning it is read-only,
the leader generates the result based on the current state of the FSM. If
the RPC is a *transaction* type, meaning it modifies state, the leader
generates a new log entry and applies it using Raft. Once the log entry is committed
and applied to the FSM, the transaction is complete.
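
To make that flow concrete, here is a small, self-contained sketch of the
query/transaction split. Every name in it is a hypothetical stand-in for
illustration; Nomad's actual server code is organized differently.

```go
package rpcsketch

// All types and fields below are hypothetical stand-ins used only to
// illustrate the request flow; they are not Nomad's internals.

type Request struct {
	ReadOnly bool   // query (read-only) vs. transaction (modifies state)
	Payload  []byte // opaque request body
}

type Response struct{ OK bool }

type Server struct {
	leader    bool
	forward   func(Request) (Response, error) // send to current leader
	queryFSM  func(Request) (Response, error) // read current FSM state
	raftApply func([]byte) error              // append entry, wait for commit+apply
}

// HandleRPC forwards to the leader when necessary, answers queries from
// the FSM, and routes transactions through a Raft log entry.
func (s *Server) HandleRPC(req Request) (Response, error) {
	if !s.leader {
		// Every server learns the current leader via Raft, so
		// non-leaders simply forward the request.
		return s.forward(req)
	}
	if req.ReadOnly {
		return s.queryFSM(req)
	}
	// Transaction: committed and applied before the RPC returns.
	if err := s.raftApply(req.Payload); err != nil {
		return Response{}, err
	}
	return Response{OK: true}, nil
}
```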

Because of the nature of Raft's replication, performance is sensitive to network
latency. For this reason, each region elects an independent leader and maintains
a disjoint peer set. Data is partitioned by region, so each leader is responsible
only for data in its region. When a request is received for a remote region,
the request is forwarded to the correct leader. This design allows for lower-latency
transactions and higher availability without sacrificing consistency.

## Consistency Modes

Although all writes to the replicated log go through Raft, reads are more
flexible. To support various trade-offs that developers may want, Nomad
supports 2 different consistency modes for reads.

The two read modes are:

* `default` - Raft makes use of leader leasing, providing a time window
  in which the leader assumes its role is stable. However, if a leader
  is partitioned from the remaining peers, a new leader may be elected
  while the old leader is holding the lease. This means there are 2 leader
  nodes. There is no risk of a split-brain since the old leader will be
  unable to commit new logs. However, if the old leader services any reads,
  the values are potentially stale. The default consistency mode relies only
  on leader leasing, exposing clients to potentially stale values. We make
  this trade-off because reads are fast, usually strongly consistent, and
  only stale in a hard-to-trigger situation. The time window of stale reads
  is also bounded since the leader will step down due to the partition.

* `stale` - This mode allows any server to service the read, regardless of whether
  it is the leader. This means reads can be arbitrarily stale but are generally
  within 50 milliseconds of the leader. The trade-off is very fast and scalable
  reads at the cost of potentially stale values. Because this mode allows reads
  without a leader, a cluster that is unavailable will still be able to respond.
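
As an illustration of the trade-off, the sketch below shows roughly how a read
could opt into stale mode through Nomad's Go API client. It assumes the
client's `api.QueryOptions` exposes an `AllowStale` flag; treat the exact field
and endpoint as assumptions and consult the API documentation for the
authoritative interface.

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/nomad/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// AllowStale lets any server answer the query, trading consistency
	// for latency, and for availability during a leader outage.
	// (Assumed flag; check the API docs for the real interface.)
	opts := &api.QueryOptions{AllowStale: true}

	nodes, _, err := client.Nodes().List(opts)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("saw %d nodes (possibly stale)\n", len(nodes))
}
```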

## <a name="deployment_table"></a>Deployment Table

Below is a table that shows quorum size and failure tolerance for various
cluster sizes. The recommended deployment is either 3 or 5 servers. A single
server deployment is _**highly**_ discouraged as data loss is inevitable in a
failure scenario.

<table class="table table-bordered table-striped">
  <tr>
    <th>Servers</th>
    <th>Quorum Size</th>
    <th>Failure Tolerance</th>
  </tr>
  <tr><td>1</td><td>1</td><td>0</td></tr>
  <tr><td>2</td><td>2</td><td>0</td></tr>
  <tr class="warning"><td>3</td><td>2</td><td>1</td></tr>
  <tr><td>4</td><td>3</td><td>1</td></tr>
  <tr class="warning"><td>5</td><td>3</td><td>2</td></tr>
  <tr><td>6</td><td>4</td><td>2</td></tr>
  <tr><td>7</td><td>4</td><td>3</td></tr>
</table>
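
The quorum sizes and failure tolerances above follow directly from the
majority rule described earlier; the short sketch below reproduces the table's
arithmetic.

```go
package main

import "fmt"

// quorum returns the majority size for n servers: (n/2)+1,
// using integer division.
func quorum(n int) int { return n/2 + 1 }

// faultTolerance is how many servers can fail while still
// leaving a quorum of the original peer set available.
func faultTolerance(n int) int { return n - quorum(n) }

func main() {
	for n := 1; n <= 7; n++ {
		fmt.Printf("servers=%d quorum=%d tolerance=%d\n",
			n, quorum(n), faultTolerance(n))
	}
}
```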
website/source/docs/internals/gossip.html.md (new file, 34 lines)
@@ -0,0 +1,34 @@
---
layout: "docs"
page_title: "Gossip Protocol"
sidebar_current: "docs-internals-gossip"
description: |-
  Nomad uses a gossip protocol to manage membership. All of this is provided through the use of the Serf library.
---

# Gossip Protocol

Nomad uses a [gossip protocol](http://en.wikipedia.org/wiki/Gossip_protocol)
to manage membership. This is provided through the use of the [Serf library](https://www.serfdom.io/).
The gossip protocol used by Serf is based on
["SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol"](http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf),
with a few minor adaptations. There are more details about [Serf's protocol here](https://www.serfdom.io/docs/internals/gossip.html).

~> **Advanced Topic!** This page covers technical details of
the internals of Nomad. You don't need to know these details to effectively
operate and use Nomad. These details are documented here for those who wish
to learn about them without having to go spelunking through the source code.

## Gossip in Nomad

Nomad makes use of a single global WAN gossip pool that all servers participate in.
Membership information provided by the gossip pool allows servers to perform cross-region
requests. The integrated failure detection allows Nomad to gracefully handle an entire region
losing connectivity, or just a single server in a remote region. The gossip protocol
is also used to detect servers in the same region and perform automatic clustering
via the [consensus protocol](/docs/internals/consensus.html).

All of these features are provided by leveraging [Serf](https://www.serfdom.io/), which
is used as an embedded library. From a user's perspective this is not important,
since the abstraction should be masked by Nomad. It can be useful, however, for a
developer to understand how this library is leveraged.
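
As a developer-oriented illustration, the sketch below embeds Serf roughly the
way a Nomad-like agent might: create a member, join a known peer, and read the
gossiped membership. The configuration details and the join address are
placeholder assumptions; the Serf library documentation covers the real
integration surface.

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/serf/serf"
)

func main() {
	// Default configuration; a real agent would set the node name,
	// bind address, and tags (e.g. region and datacenter) here.
	conf := serf.DefaultConfig()

	s, err := serf.Create(conf)
	if err != nil {
		log.Fatal(err)
	}
	defer s.Leave()

	// Join the gossip pool via any known member; gossip spreads the
	// full membership from there. The address is a placeholder.
	if _, err := s.Join([]string{"10.0.0.1:4648"}, true); err != nil {
		log.Printf("join failed: %v", err)
	}

	// Membership learned via gossip is what lets servers discover
	// peers for cross-region forwarding and automatic clustering.
	for _, m := range s.Members() {
		fmt.Printf("%s %s:%d %s\n", m.Name, m.Addr, m.Port, m.Status)
	}
}
```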
Deleted file:
@@ -1,49 +0,0 @@

---
layout: "docs"
page_title: "High Availability"
sidebar_current: "docs-internals-ha"
description: |-
  Learn about the high availability design of Nomad.
---

# High Availability

Nomad is primarily used in production environments to manage secrets.
As a result, any downtime of the Nomad service can affect downstream clients.
Nomad is designed to support a highly available deployment to ensure a machine
or process failure is minimally disruptive.

~> **Advanced Topic!** This page covers technical details
of Nomad. You don't need to understand these details to
effectively use Nomad. The details are documented here for
those who wish to learn about them without having to go
spelunking through the source code. However, if you're an
operator of Nomad, we recommend learning about the architecture
due to the importance of Nomad in an environment.

# Design Overview

The primary design goal in making Nomad highly available (HA) was to
minimize downtime, not to achieve horizontal scalability. Nomad is typically
bound by the IO limits of the storage backend rather than by compute
requirements. This simplifies the HA approach and allows more complex
coordination to be avoided.

Certain storage backends, such as Consul, provide additional coordination
functions that enable Nomad to run in an HA configuration. When supported
by the backend, Nomad will automatically run in HA mode without additional
configuration.

When running in HA mode, Nomad servers have two additional states they
can be in: standby and active. For multiple Nomad servers sharing a storage
backend, only a single instance will be active at any time while all other
instances are hot standbys.

The active server operates in a standard fashion and processes all requests.
The standby servers do not process requests, and instead redirect to the active
Nomad. Meanwhile, if the active server is sealed, fails, or loses network
connectivity, then one of the standbys will take over and become the active instance.

It is important to note that only _unsealed_ servers act as a standby.
If a server is still in the sealed state, then it cannot act as a standby
as it would be unable to serve any requests should the active server fail.

@@ -9,7 +9,7 @@ description: |-
 # Nomad Internals
 
 This section covers the internals of Nomad and explains the technical
-details of how Nomad functions, its architecture and security properties.
+details of how Nomad functions, its architecture and sub-systems.
 
 -> **Note:** Knowledge of Nomad internals is not
 required to use Nomad. If you aren't interested in the internals
Deleted file:
@@ -1,58 +0,0 @@

---
layout: "docs"
page_title: "Key Rotation"
sidebar_current: "docs-internals-rotation"
description: |-
  Learn about the details of key rotation within Nomad.
---

# Key Rotation

Nomad has multiple encryption keys that are used for various purposes. These keys support
rotation so that they can be changed periodically or in response to a potential leak or
compromise. It is useful to first understand the
[high-level architecture](/docs/internals/architecture.html) before learning about key rotation.

As a review, Nomad starts in a _sealed_ state. Nomad is unsealed by providing the unseal keys.
By default, Nomad uses a technique known as [Shamir's secret sharing algorithm](http://en.wikipedia.org/wiki/Shamir's_Secret_Sharing)
to split the master key into 5 shares, any 3 of which are required to reconstruct the master
key. The master key is used to protect the encryption key, which is ultimately used to protect
data written to the storage backend.

![Keys](/assets/images/keys.png)

To support key rotation, we need to support changing the unseal keys, master key, and the
backend encryption key. We split this into two separate operations, `rekey` and `rotate`.

The `rekey` operation is used to generate a new master key. When this is being done,
it is possible to change the parameters of the key splitting, so that the number of shares
and the threshold required to unseal can be changed. To perform a rekey, a threshold of the
current unseal keys must be provided. This is to prevent a single malicious operator from
performing a rekey and invalidating the existing master key.

Performing a rekey is fairly straightforward. The rekey operation must be initialized with
the new parameters for the split and threshold. Once initialized, the current unseal keys
must be provided until the threshold is met. Once met, Nomad will generate the new master
key, perform the splitting, and re-encrypt the encryption key with the new master key.
The new unseal keys are then provided to the operator, and the old unseal keys are no
longer usable.

The `rotate` operation is used to change the encryption key used to protect data written
to the storage backend. This key is never provided or visible to operators, who only
have unseal keys. This simplifies rotation since, unlike the `rekey` operation, it does
not require the current key holders. When `rotate` is triggered, a new encryption key
is generated and added to a keyring. All new values written to the storage backend are
encrypted with the new key. Old values written with previous encryption keys can still
be decrypted since older keys are saved in the keyring. This allows key rotation to be
done online, without an expensive re-encryption process.

Both the `rekey` and `rotate` operations can be done online and in a highly available
configuration. Only the active Nomad instance can perform either of the operations,
but standby instances can still assume an active role after either operation. This is
done by providing an online upgrade path for standby instances. If the current encryption
key is `N` and a rotation installs `N+1`, Nomad creates a special "upgrade" key, which
provides the `N+1` encryption key protected by the `N` key. This upgrade key is only
available for a few minutes; standby instances do a periodic check for upgrades.
This allows standby instances to update their keys and stay in sync with the active Nomad
without requiring operators to perform another unseal.
Deleted file:
@@ -1,148 +0,0 @@

---
layout: "docs"
page_title: "Security Model"
sidebar_current: "docs-internals-security"
description: |-
  Learn about the security model of Nomad.
---

# Security Model

Due to the nature of Nomad and the confidentiality of data it is managing,
the Nomad security model is very critical. The overall goal of Nomad's security
model is to provide [confidentiality, integrity, availability, accountability,
and authentication](http://en.wikipedia.org/wiki/Information_security).

This means that data at rest and in transit must be secure from eavesdropping
or tampering. Clients must be appropriately authenticated and authorized
to access data or modify policy. All interactions must be auditable and traced
uniquely back to the origin entity. The system must be robust against intentional
attempts to bypass any of its access controls.

# Threat Model

The following are the various parts of the Nomad threat model:

* Eavesdropping on any Nomad communication. Client communication with Nomad
  should be secure from eavesdropping, as well as communication from Nomad to
  its storage backend.

* Tampering with data at rest or in transit. Any tampering should be detectable
  and cause Nomad to abort processing of the transaction.

* Access to data or controls without authentication or authorization. All requests
  must be governed by the applicable security policies.

* Access to data or controls without accountability. If audit logging
  is enabled, requests and responses must be logged before the client receives
  any secret material.

* Confidentiality of stored secrets. Any data that leaves Nomad to rest in the
  storage backend must be safe from eavesdropping. In practice, this means all
  data at rest must be encrypted.

* Availability of secret material in the face of failure. Nomad supports
  running in a highly available configuration to avoid loss of availability.

The following are not parts of the Nomad threat model:

* Protecting against arbitrary control of the storage backend. An attacker
  that can perform arbitrary operations against the storage backend can
  undermine security in any number of ways that are difficult or impossible to protect
  against. As an example, an attacker could delete or corrupt all the contents
  of the storage backend, causing total data loss for Nomad. The ability to control
  reads would allow an attacker to snapshot in a well-known state and roll back state
  changes if that would be beneficial to them.

* Protecting against the leakage of the existence of secret material. An attacker
  that can read from the storage backend may observe that secret material exists
  and is stored, even if it is kept confidential.

* Protecting against memory analysis of a running Nomad. If an attacker is able
  to inspect the memory state of a running Nomad instance, then the confidentiality
  of data may be compromised.

# External Threat Overview

Given the architecture of Nomad, there are 3 distinct systems we are concerned
with. There is the client, which is speaking to Nomad over an API. There is Nomad,
or more accurately the server, which is providing an API and serving requests. Lastly,
there is the storage backend, which the server is utilizing to read and write data.

There is no mutual trust between the Nomad client and server. Clients use
[TLS](http://en.wikipedia.org/wiki/Transport_Layer_Security) to verify the identity
of the server and to establish a secure communication channel. Servers require that
a client provides a client token for every request, which is used to identify the client.
A client that does not provide their token is only permitted to make login requests.

The storage backends used by Nomad are also untrusted by design. Nomad uses a security
barrier for all requests made to the backend. The security barrier automatically encrypts
all data leaving Nomad using the [Advanced Encryption Standard (AES)](http://en.wikipedia.org/wiki/Advanced_Encryption_Standard)
cipher in [Galois/Counter Mode (GCM)](http://en.wikipedia.org/wiki/Galois/Counter_Mode).
The nonce is randomly generated for every encrypted object. When data is read from the
security barrier, the GCM authentication tag is verified prior to decryption to detect
any tampering.

Depending on the backend used, Nomad may communicate with the backend over TLS
to provide an added layer of security. In some cases, such as a file backend, this
is not applicable. Because storage backends are untrusted, an eavesdropper would
only gain access to encrypted data even if communication with the backend was intercepted.

# Internal Threat Overview

Within the Nomad system, a critical security concern is an attacker attempting
to gain access to secret material they are not authorized to access. This is an internal
threat if the attacker is already permitted some level of access to Nomad and is
able to authenticate.

When a client first authenticates with Nomad, a credential backend is used to
verify the identity of the client and to return a list of associated ACL policies.
This association is configured by operators of Nomad ahead of time. For example,
GitHub users in the "engineering" team may be mapped to the "engineering" and "ops"
Nomad policies. Nomad then generates a client token, which is a randomly generated
UUID, and maps it to the policy list. This client token is then returned to the client.

On each request a client provides this token. Nomad then uses it to check that the token
is valid and has not been revoked or expired, and generates an ACL based on the associated
policies. Nomad uses strict default-deny (whitelist) enforcement. This means unless
an associated policy allows for a given action, it will be denied. Each policy specifies
a level of access granted to a path in Nomad. When the policies are merged (if multiple
policies are associated with a client), the highest access level permitted is used.
For example, if the "engineering" policy permits read/write access to the "eng/" path,
and the "ops" policy permits read access to the "ops/" path, then the user gets the
union of those. Policy is matched using the most specific defined policy, which may be
an exact match or the longest-prefix match glob pattern.

Certain operations are only permitted by "root" users, which is a distinguished
policy built into Nomad. This is similar to the concept of a root user on a Unix system
or an Administrator on Windows. Although clients could be provided with root tokens
or associated with the root policy, Nomad instead supports the notion of "sudo" privilege.
As part of a policy, users may be granted "sudo" privileges to certain paths, so that
they can still perform security-sensitive operations without being granted global
root access to Nomad.

Lastly, Nomad supports using a [two-man rule](http://en.wikipedia.org/wiki/Two-man_rule) for
unsealing, using [Shamir's Secret Sharing technique](http://en.wikipedia.org/wiki/Shamir's_Secret_Sharing).
When Nomad is started, it starts in a _sealed_ state. This means that the encryption key
needed to read and write from the storage backend is not yet known. The process of unsealing
requires providing the master key so that the encryption key can be retrieved. The risk of distributing
the master key is that a single malicious actor with access to it can decrypt the entire
Nomad data store. Instead, Shamir's technique allows us to split the master key into multiple shares or parts.
The number of shares and the threshold needed are configurable, but by default Nomad generates
5 shares, any 3 of which must be provided to reconstruct the master key.

By using a secret sharing technique, we avoid the need to place absolute trust in the holder
of the master key, and avoid storing the master key at all. The master key is only
retrievable by reconstructing the shares. The shares are not useful for making any requests
to Nomad, and can only be used for unsealing. Once unsealed, the standard ACL mechanisms
are used for all requests.

To make an analogy, a bank puts safe deposit boxes inside of a vault.
Each safe deposit box has a key, while the vault door has both a combination and a key.
The vault is encased in steel and concrete so that the door is the only practical entrance.
The analogy to Nomad is that the cryptosystem is the steel and concrete protecting the data.
While you could tunnel through the concrete or brute force the encryption keys, it would be
prohibitively time consuming. Opening the bank vault requires two factors: the key and the combination.
Similarly, Nomad requires multiple shares be provided to reconstruct the master key.
Once unsealed, each safe deposit box still requires its owner to provide a key, and similarly
the Nomad ACL system protects all the secrets stored.
@@ -25,24 +25,39 @@ as well as statsd based on providing the appropriate configuration options.
 Below is sample output of a telemetry dump:
 
 ```text
-[2015-04-20 12:24:30 -0700 PDT][G] 'vault.runtime.num_goroutines': 12.000
-[2015-04-20 12:24:30 -0700 PDT][G] 'vault.runtime.free_count': 11882.000
-[2015-04-20 12:24:30 -0700 PDT][G] 'vault.runtime.total_gc_runs': 9.000
-[2015-04-20 12:24:30 -0700 PDT][G] 'vault.expire.num_leases': 1.000
-[2015-04-20 12:24:30 -0700 PDT][G] 'vault.runtime.alloc_bytes': 502992.000
-[2015-04-20 12:24:30 -0700 PDT][G] 'vault.runtime.sys_bytes': 3999992.000
-[2015-04-20 12:24:30 -0700 PDT][G] 'vault.runtime.malloc_count': 17315.000
-[2015-04-20 12:24:30 -0700 PDT][G] 'vault.runtime.heap_objects': 5433.000
-[2015-04-20 12:24:30 -0700 PDT][G] 'vault.runtime.total_gc_pause_ns': 3794124.000
-[2015-04-20 12:24:30 -0700 PDT][S] 'vault.audit.log_response': Count: 2 Min: 0.001 Mean: 0.001 Max: 0.001 Stddev: 0.000 Sum: 0.002
-[2015-04-20 12:24:30 -0700 PDT][S] 'vault.route.read.secret-': Count: 1 Sum: 0.036
-[2015-04-20 12:24:30 -0700 PDT][S] 'vault.barrier.get': Count: 3 Min: 0.004 Mean: 0.021 Max: 0.050 Stddev: 0.025 Sum: 0.064
-[2015-04-20 12:24:30 -0700 PDT][S] 'vault.token.lookup': Count: 2 Min: 0.040 Mean: 0.074 Max: 0.108 Stddev: 0.048 Sum: 0.148
-[2015-04-20 12:24:30 -0700 PDT][S] 'vault.policy.get_policy': Count: 2 Min: 0.003 Mean: 0.004 Max: 0.005 Stddev: 0.001 Sum: 0.009
-[2015-04-20 12:24:30 -0700 PDT][S] 'vault.core.check_token': Count: 2 Min: 0.053 Mean: 0.087 Max: 0.121 Stddev: 0.048 Sum: 0.174
-[2015-04-20 12:24:30 -0700 PDT][S] 'vault.audit.log_request': Count: 2 Min: 0.001 Mean: 0.001 Max: 0.001 Stddev: 0.000 Sum: 0.002
-[2015-04-20 12:24:30 -0700 PDT][S] 'vault.barrier.put': Count: 3 Min: 0.004 Mean: 0.010 Max: 0.019 Stddev: 0.008 Sum: 0.029
-[2015-04-20 12:24:30 -0700 PDT][S] 'vault.route.write.secret-': Count: 1 Sum: 0.035
-[2015-04-20 12:24:30 -0700 PDT][S] 'vault.core.handle_request': Count: 2 Min: 0.097 Mean: 0.228 Max: 0.359 Stddev: 0.186 Sum: 0.457
-[2015-04-20 12:24:30 -0700 PDT][S] 'vault.expire.register': Count: 1 Sum: 0.18
+[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_blocked': 0.000
+[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.plan.queue_depth': 0.000
+[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.malloc_count': 7568.000
+[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.total_gc_runs': 8.000
+[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_ready': 0.000
+[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.num_goroutines': 56.000
+[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.sys_bytes': 3999992.000
+[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.heap_objects': 4135.000
+[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.heartbeat.active': 1.000
+[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_unacked': 0.000
+[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_waiting': 0.000
+[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.alloc_bytes': 634056.000
+[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.free_count': 3433.000
+[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.total_gc_pause_ns': 6572135.000
+[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.memberlist.msg.alive': Count: 1 Sum: 1.000
+[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.serf.member.join': Count: 1 Sum: 1.000
+[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.raft.barrier': Count: 1 Sum: 1.000
+[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.raft.apply': Count: 1 Sum: 1.000
+[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.nomad.rpc.query': Count: 2 Sum: 2.000
+[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.serf.queue.Query': Count: 6 Sum: 0.000
+[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.fsm.register_node': Count: 1 Sum: 1.296
+[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.serf.queue.Intent': Count: 6 Sum: 0.000
+[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.runtime.gc_pause_ns': Count: 8 Min: 126492.000 Mean: 821516.875 Max: 3126670.000 Stddev: 1139250.294 Sum: 6572135.000
+[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.raft.leader.dispatchLog': Count: 3 Min: 0.007 Mean: 0.018 Max: 0.039 Stddev: 0.018 Sum: 0.054
+[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.leader.reconcileMember': Count: 1 Sum: 0.007
+[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.leader.reconcile': Count: 1 Sum: 0.025
+[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.raft.fsm.apply': Count: 1 Sum: 1.306
+[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.client.get_allocs': Count: 1 Sum: 0.110
+[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.worker.dequeue_eval': Count: 29 Min: 0.003 Mean: 363.426 Max: 503.377 Stddev: 228.126 Sum: 10539.354
+[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.serf.queue.Event': Count: 6 Sum: 0.000
+[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.raft.commitTime': Count: 3 Min: 0.013 Mean: 0.037 Max: 0.079 Stddev: 0.037 Sum: 0.110
+[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.leader.barrier': Count: 1 Sum: 0.071
+[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.client.register': Count: 1 Sum: 1.626
+[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.eval.dequeue': Count: 21 Min: 500.610 Mean: 501.753 Max: 503.361 Stddev: 1.030 Sum: 10536.813
+[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.memberlist.gossip': Count: 12 Min: 0.009 Mean: 0.017 Max: 0.025 Stddev: 0.005 Sum: 0.204
 ```
Deleted file:
@@ -1,53 +0,0 @@

---
layout: "docs"
page_title: "Token Authentication"
sidebar_current: "docs-internals-token"
description: |-
  Learn about the client token authentication in Nomad.
---

# Token Authentication

The `token` authentication backend is built-in and is at the core of
client authentication. Other authentication backends may be used to
authenticate a client, but they eventually result in the generation of a client
token managed by the `token` backend.

Every token has a number of properties:

* ID - The primary ID of a token is a randomly generated UUID
* Display Name - Optionally, a human-readable display name
* Metadata - Metadata used for audit logging
* Number of Uses - Optionally, a restricted use count
* Parent ID - Optionally, a parent token which created this child token
* Policies - An associated list of ACL policies
* Source Path - The path at which the token was generated (e.g. `auth/github/login`)

The properties of a token are immutable once created. The exception to this
is the number of uses, which is decremented on each request. Each of these
properties enables Nomad to do a number of interesting things.

Each token maintains the source path, or the login path, that was used
to create the token. This is used to allow source-based revocation. For example,
if we believe our GitHub organization was compromised, we may want to revoke
all tokens generated via `auth/github/login`. This would be done by using the
`auth/token/revoke-prefix/` API with the `auth/github/` prefix. Revoking the
prefix will revoke all client tokens generated at that path, as well as all
dynamic secrets generated by those tokens. This provides a powerful "break glass"
procedure during a potential compromise.

If a token is created by another authentication backend, it does not have
a parent token. However, any tokens created by the `auth/token/create` API
have a parent token, namely the token used to make that request. By maintaining
this parent-child relationship, Nomad models token trees. Child tokens can
be created with a subset of the parent policies, allowing for dropping of
privileges. When a token is revoked, the entire sub-tree of tokens is revoked
with it. This allows clients to safely generate child tokens and then revoke
them all along with the root.

Child tokens are very useful, especially when combined with limited-use tokens.
When a token is created, its use count can be optionally specified. Providing
a use count of one makes a _one time token_. This means the token can be used
for a single request before being automatically revoked. This can be generalized
to any number of uses. Limited-use tokens cannot be used to create sub-tokens,
but they can be a powerful way to allow extremely limited access to Nomad.
@@ -15,8 +15,8 @@
       <div id="hero-logotype"></div>
     </div>
     <div id="hero-text">
-      <h1>Applications on a global fleet.</h1>
-      <h3>As simple as a Single Machine.</h3>
+      <h1>Easily deploy applications at any scale</h1>
+      <h3>Any App. Any OS. Any Cloud.</h3>
       <div id="hero-btns">
         <a class="h-btn light lrg has-caret intro" href="/intro">Learn More<span class="h-caret"></span></a>
         <a class="h-btn green lrg has-caret has-border try" href="">Try Nomad<span class="h-caret"></span></a>

@@ -47,10 +47,13 @@
     <div id="deploy" class="feature">
       <div class="feature-header">
         <h3>Deploy to any cloud</h3>
-        <p>Deploy Applications and Docker containers across datacenters to any cloud</p>
+        <p>
+          Nomad supports multi-datacenter and multi-region clusters. Deploy applications that
+          span multiple geographic locations or cloud providers.
+        </p>
       </div>
       <div class="feature-footer">
-        <p>Phasellus quis arcu nec turpis aliquet malesuada. Pellentesque auctor fermentum cursus.</p>
+        <p>Applications containerized with Docker can be quickly deployed, making it easy to scale.</p>
         <span class="docker-outline-logo"></span>
       </div>
     </div> <!-- .feature -->

@@ -106,7 +109,7 @@
     <div id="density" class="feature">
       <div class="feature-header">
         <h3>Increase density and reduce cost</h3>
-        <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Quisque hendrerit nulla ut orci ultricies viverra.</p>
+        <p>Nomad automatically bin packs tasks to maximize efficiency, increase density and reduce costs.</p>
       </div>
       <div class="feature-graphic"></div>
     </div> <!-- .feature -->
@@ -13,24 +13,16 @@
         <a href="/docs/internals/architecture.html">Architecture</a>
       </li>
 
-      <li<%= sidebar_current("docs-internals-ha") %>>
-        <a href="/docs/internals/high-availability.html">High Availability</a>
+      <li<%= sidebar_current("docs-internals-consensus") %>>
+        <a href="/docs/internals/consensus.html">Consensus Protocol</a>
       </li>
 
-      <li<%= sidebar_current("docs-internals-security") %>>
-        <a href="/docs/internals/security.html">Security Model</a>
+      <li<%= sidebar_current("docs-internals-gossip") %>>
+        <a href="/docs/internals/gossip.html">Gossip Protocol</a>
       </li>
 
       <li<%= sidebar_current("docs-internals-telemetry") %>>
         <a href="/docs/internals/telemetry.html">Telemetry</a>
       </li>
-
-      <li<%= sidebar_current("docs-internals-token") %>>
-        <a href="/docs/internals/token.html">Token Authentication</a>
-      </li>
-
-      <li<%= sidebar_current("docs-internals-rotation") %>>
-        <a href="/docs/internals/rotation.html">Key Rotation</a>
-      </li>
     </ul>
   </li>