open-nomad/website/content/docs/operations/key-management.mdx
Tim Gross 78acc75b57
docs: add notes about keyring to snapshot restore (#16663)
When cluster administrators restore from Raft snapshot, they also need to ensure the
keyring is in place. For on-prem users doing in-place upgrades this is less of a
concern but for typical cloud workflows where the whole host is replaced, it's
an important warning (at least until #14852 has been implemented).
2023-03-28 08:31:01 -04:00

67 lines
3.2 KiB
Plaintext

---
layout: docs
page_title: Key Management
description: Learn about the key management in Nomad.
---
# Key Management
Nomad servers maintain an encryption keyring used to encrypt [Variables][] and
sign task [workload identities][]. The servers store key metadata in raft, but
the encryption key material is stored in a separate file in the `keystore`
subdirectory of the Nomad [data directory][]. These files have the extension
`.nks.json`. The key material in each file is wrapped in a unique key encryption
key (KEK) that is not shared between servers.
Under normal operations the keyring is entirely managed by Nomad, but this
section provides administrators additional context around key replication and
recovery.
## Key Rotation
Only one key in the keyring is "active" at any given time, and all encryption
and signing operations happen on the leader. Nomad automatically rotates the
active encryption key every 30 days. When a key is rotated, the existing keys
are marked as "inactive" but not deleted, so they can be used for decrypting
previously encrypted variables and verifying workload identities for existing
allocations.
If you believe key material has been compromised, you can execute [`nomad
operator root keyring rotate -full`][]. A new "active" key will be created and
"inactive" keys will be marked "rekeying". Nomad will asynchronously decrypt and
re-encrypt all variables with the new key. As each key's variables are encrypted
with the new key, the old key will marked as "deprecated".
## Key Replication
When a leader is elected, it creates the keyring if it does not already
exist. When a key is added, the metadata will be replicated via raft. Each
server runs a key replication process that watches for changes to the state
store and will fetch the key material from the leader asynchronously, falling
back to retrieving from other servers in the case where a key is written
immediately before a leader election.
## Restoring the Keyring from Backup
Key material is never stored in raft. This prevents an attacker with a backup of
the state store from getting access to encrypted variables. It also allows the
HashiCorp engineering and support organization to safely handle cluster
snapshots you might provide without exposing any of your keys or variables.
However, this means that to restore a cluster from snapshot you need to also
provide the keystore directory with the `.nks.json` key files on at least one
server. The `.nks.json` key files are unique per server, but only one server's
key files are needed to recover the cluster. Operators should include these
files as part of your organization's backup and recovery strategy for the
cluster.
If you are recovering a Raft snapshot onto a new cluster without running
workloads, you can skip restoring the keyring and run [`nomad operator root
keyring rotate`][] once the servers have joined.
[variables]: /nomad/docs/concepts/variables
[workload identities]: /nomad/docs/concepts/workload-identity
[data directory]: /nomad/docs/configuration#data_dir
[`nomad operator root keyring rotate -full`]: /nomad/docs/commands/operator/root/keyring-rotate
[`nomad operator root keyring rotate`]: /nomad/docs/commands/operator/root/keyring-rotate