Co-authored-by: Alexander Scheel <alex.scheel@hashicorp.com>
---
layout: docs
page_title: 'PKI - Secrets Engines: Considerations'
description: The PKI secrets engine for Vault generates TLS certificates.
---

# PKI Secrets Engine - Considerations

To successfully deploy this secrets engine, there are a number of important
considerations to be aware of, as well as some preparatory steps that should be
undertaken. You should read all of these _before_ using this secrets engine or
generating the CA to use with this secrets engine.

## Table of Contents

- [Be Careful with Root CAs](#be-careful-with-root-cas)
- [Managed Keys](#managed-keys)
- [One CA Certificate, One Secrets Engine](#one-ca-certificate-one-secrets-engine)
- [Always Configure a Default Issuer](#always-configure-a-default-issuer)
- [Key Types Matter](#key-types-matter)
- [Cluster Performance and Key Type](#cluster-performance-and-key-type)
- [Use a CA Hierarchy](#use-a-ca-hierarchy)
- [Cross-Signed Intermediates](#cross-signed-intermediates)
- [Cluster URLs are Important](#cluster-urls-are-important)
- [Automate Rotation with ACME](#automate-rotation-with-acme)
- [ACME Stores Certificates](#acme-stores-certificates)
- [ACME Role Restrictions Require EAB](#acme-role-restrictions-require-eab)
- [ACME and the Public Internet](#acme-and-the-public-internet)
- [ACME Errors are in Server Logs](#acme-errors-are-in-server-logs)
- [ACME Security Considerations](#acme-security-considerations)
- [ACME and Client Counting](#acme-and-client-counting)
- [Keep Certificate Lifetimes Short, For CRL's Sake](#keep-certificate-lifetimes-short-for-crls-sake)
- [NotAfter Behavior on Leaf Certificates](#notafter-behavior-on-leaf-certificates)
- [Cluster Performance and Quantity of Leaf Certificates](#cluster-performance-and-quantity-of-leaf-certificates)
- [You must configure issuing/CRL/OCSP information _in advance_](#you-must-configure-issuingcrlocsp-information-_in-advance_)
- [Distribution of CRLs and OCSP](#distribution-of-crls-and-ocsp)
- [Automate CRL Building and Tidying](#automate-crl-building-and-tidying)
- [Spectrum of Revocation Support](#spectrum-of-revocation-support)
- [What Are Cross-Cluster CRLs?](#what-are-cross-cluster-crls)
- [Issuer Subjects and CRLs](#issuer-subjects-and-crls)
- [Automate Leaf Certificate Renewal](#automate-leaf-certificate-renewal)
- [Safe Minimums](#safe-minimums)
- [Token Lifetimes and Revocation](#token-lifetimes-and-revocation)
- [Safe Usage of Roles](#safe-usage-of-roles)
- [Telemetry](#telemetry)
- [Auditing](#auditing)
- [Role-Based Access](#role-based-access)
- [Replicated DataSets](#replicated-datasets)
- [Cluster Scalability](#cluster-scalability)
- [PSS Support](#pss-support)
- [Issuer Storage Migration Issues](#issuer-storage-migration-issues)

## Be Careful with Root CAs

Vault storage is secure, but not as secure as a piece of paper in a bank vault.
It is, after all, networked software. If your root CA is hosted outside of
Vault, don't put it in Vault as well; instead, issue a shorter-lived
intermediate CA certificate and put this into Vault. This aligns with industry
best practices.

Since 0.4, the secrets engine supports generating self-signed root CAs and
creating and signing CSRs for intermediate CAs. In each instance, for security
reasons, the private key can _only_ be exported at generation time, and the
ability to do so is part of the command path (so it can be put into ACL
policies).

If you plan on using intermediate CAs with Vault, it is suggested that you let
Vault create CSRs and do not export the private key, then sign those with your
root CA (which may be a second mount of the `pki` secrets engine).

### Managed Keys

Since 1.10, Vault Enterprise can access private key material in a
[_managed key_](/vault/docs/enterprise/managed-keys). In this case, Vault never sees the
private key, and the external KMS or HSM performs certificate signing operations.
Managed keys are configured by selecting the `kms` type when generating a root
or intermediate.

## One CA Certificate, One Secrets Engine

Since Vault 1.11.0, the PKI Secrets Engine supports multiple issuers in a single
mount. However, in order to simplify the configuration, it is _strongly_
recommended that operators limit a mount to a single issuer. If you want to issue
certificates from multiple disparate CAs, mount the PKI secrets engine at multiple
mount points with separate CA certificates in each.

The rationale for separating mounts is to simplify permissions management:
very few individuals need access to perform operations with the root, but
many need access to create leaves. The operations on a root should generally
be limited to issuing and revoking intermediate CAs, which is a highly
privileged operation; it becomes much easier to audit these operations when
they're in a separate mount than if they're mixed in with day-to-day leaf
issuance.

A common pattern is to have one mount act as your root CA and to use this CA
only to sign intermediate CA CSRs from other PKI secrets engines.

To keep old CAs active during rotation, there are two approaches:

1. Use multiple secrets engines. This allows a fresh start, preserving the
   old issuer and CRL. Vault ACL policy can be updated to deny new issuance
   under the old mount point and roles can be re-evaluated before being
   imported into the new mount point.
2. Use multiple issuers in the same mount point. The usage of the old issuer
   can be restricted to CRL signing, and existing roles and ACL policy can be
   kept as-is. This allows cross-signing within the same mount, and consumers
   of the mount won't have to update their configuration. Once the transitional
   period for this rotation has completed and all previously issued certificates
   have expired, it is encouraged to fully remove the old issuer and any
   unnecessary cross-signed issuers from the mount point.

Another suggested use case for multiple issuers in the same mount is splitting
issuance by TTL. For short-lived certificates, an intermediate
stored in Vault will often out-perform an HSM-backed intermediate. For
longer-lived certificates, however, it is often important to have the
intermediate key material secured throughout the lifetime of the end-entity
certificate. This means that two intermediates in the same mount -- one backed
by the HSM and one backed by Vault -- can satisfy both use cases. Operators
can create roles that set a maximum TTL for each issuer, and consumers of the
mount can decide which to use.

### Always Configure a Default Issuer

For backwards compatibility, [the default issuer](/vault/api-docs/secret/pki#read-issuers-configuration)
is used to service PKI endpoints without an explicit issuer (either via path
selection or role-based selection). When certificates are revoked and their
issuer is no longer part of this PKI mount, Vault places them on the default
issuer's CRL. This means maintaining a default issuer is important for both
backwards compatibility for issuing certificates and for ensuring revoked
certificates land on a CRL.

## Key Types Matter

Certain key types have impacts on performance. Signing certificates from an RSA
key will be slower than issuing from an ECDSA or Ed25519 key. Key generation
(using `/issue/:role` endpoints) using RSA keys will also be slow: RSA key
generation involves finding suitable random primes, whereas Ed25519 keys can
be random data. As the number of bits goes up (RSA 2048 -> 4096 or ECDSA
P-256 -> P-521), signature time also increases.

This matters in both directions: not only is issuance more expensive,
but validation of the corresponding signature (in, say, TLS handshakes) will
also be more expensive. Careful consideration of both issuer and issued key
types can have meaningful impacts on performance of not only Vault, but
systems using these certificates.

### Cluster Performance and Key Type

The [benchmark-vault](https://github.com/hashicorp/vault-benchmark) project
can be used to measure the performance of a Vault PKI instance. Some general
considerations to be aware of:

- RSA key generation is much slower and more variable than EC key
  generation. If performance and throughput are a necessity, consider using
  EC keys (including NIST P-curves and Ed25519) instead of RSA.

- Signing requests (via `/pki/sign`) will be faster than issuance requests
  (via `/pki/issue`), especially for RSA keys: Vault does not need to generate
  key material and only signs the key material provided by the client. The
  signing step is common to both endpoints, so key generation is pure
  overhead if the client has a sufficiently secure source of entropy.

- The CA's key type matters as well: using an RSA CA will result in an RSA
  signature, which takes longer than signing with an ECDSA or Ed25519 CA.

- Storage is an important factor: with [BYOC Revocation](/vault/api-docs/secret/pki#revoke-certificate),
  using `no_store=true` still gives you the ability to revoke certificates
  and audit logs can be used to track issuance. Clusters using a remote
  storage (like Consul) over a slow network and using `no_store=false` will
  see additional latency on issuance. Adding leases for every issued
  certificate compounds the problem.

- Storing too many certificates results in longer `LIST /pki/certs` time,
  including the time to tidy the instance. As such, for large scale
  deployments (>= 250k active certificates) it is recommended to use audit
  logs to track certificates outside of Vault.

As a general comparison on unspecified hardware, using `benchmark-vault` for
`30s` on a local, single node, raft-backed Vault instance:

- Vault can issue 300k certificates using EC P-256 for CA & leaf keys and
  without storage.

- But switching to storing these leaves drops that number to 65k, and only
  20k with leases.

- Using large, expensive RSA-4096 bit keys, Vault can only issue 160 leaves,
  regardless of whether or not storage or leases were used. The 95th-percentile
  key generation time is above 10s.

- In comparison, using P-521 keys, Vault can issue closer to 30k leaves
  without leases and 18k with leases.

These numbers are for example only, to represent the impact different key types
can have on PKI cluster performance.

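To put the 30-second totals above on a per-second footing, the sketch below divides each count by the run duration. It uses only the illustrative figures quoted above and is not a new measurement; the dictionary labels are informal names chosen here, not Vault terminology.

```python
# Convert the illustrative 30-second benchmark totals into issuance rates.
RUN_SECONDS = 30

# Totals quoted in the text above; the labels are informal descriptions.
totals = {
    "P-256, no storage": 300_000,
    "P-256, stored": 65_000,
    "P-256, stored with leases": 20_000,
    "RSA-4096": 160,
    "P-521, no leases": 30_000,
    "P-521, with leases": 18_000,
}


def per_second(total: int, seconds: int = RUN_SECONDS) -> float:
    """Average number of certificates issued per second over the run."""
    return total / seconds


rates = {label: per_second(count) for label, count in totals.items()}

for label, rate in rates.items():
    print(f"{label:28s} {rate:10.1f} certs/s")
```

Read this way, the example RSA-4096 run averages only a handful of certificates per second, while the unstored P-256 run averages thousands.
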
The use of ACME adds additional latency into these numbers, both because
certificates need to be stored and because challenge validation needs to
be performed.

## Use a CA Hierarchy

It is generally recommended to use a hierarchical CA setup, with a root
certificate which issues one or more intermediates (based on usage), which
in turn issue the leaf certificates.

This allows stronger storage or policy guarantees around [protection of the
root CA](#be-careful-with-root-cas), while letting Vault manage the
intermediate CAs and issuance of leaves. Different intermediates might be
issued for different usage, such as VPN signing, Email signing, or testing
versus production TLS services. This helps to keep CRLs limited to specific
purposes: for example, VPN services don't care about the revoked set of
email signing certificates if they're using separate certificates and
different intermediates, and thus don't need both CRL contents. Additionally,
this allows higher risk intermediates (such as those issuing longer-lived
email signing certificates) to have HSM-backing without impacting the
performance of easier-to-rotate intermediates and certificates (such as
TLS intermediates).

Vault supports the use of both the [`allowed_domains` parameter on
Roles](/vault/api-docs/secret/pki#allowed_domains) and the [`permitted_dns_domains`
parameter to set the Name Constraints extension](/vault/api-docs/secret/pki#permitted_dns_domains)
on root and intermediate generation. This allows for several layers of
separation of concerns between TLS-based services.

### Cross-Signed Intermediates

When cross-signing intermediates from two separate roots, two separate
intermediate issuers will exist within the Vault PKI mount. In order to
correctly serve the cross-signed chain on issuance requests, the
`manual_chain` override is required on either or both intermediates. This
can be constructed in the following order:

- this issuer (`self`)
- this root
- the other copy of this intermediate
- the other root

All requests to this issuer for signing will now present the full cross-signed
chain.

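The ordering above can be sketched as a small helper that assembles the `manual_chain` value. The issuer names are hypothetical placeholders, and the sketch assumes `manual_chain` is supplied as a list of issuer references when updating the issuer; check the API documentation for your Vault version before relying on it.

```python
import json


def cross_signed_manual_chain(this_issuer, this_root, other_copy, other_root):
    """Return issuer references in the order described in the list above."""
    return [this_issuer, this_root, other_copy, other_root]


# Hypothetical issuer names; substitute the names or IDs from your own mount.
chain = cross_signed_manual_chain(
    "int-signed-by-root-a",
    "root-a",
    "int-signed-by-root-b",
    "root-b",
)

# Request body that could accompany an update of the first intermediate.
payload = json.dumps({"manual_chain": chain}, indent=2)
print(payload)
```

The same helper, called with the two intermediates and roots swapped, would produce the chain for the other copy of the intermediate.
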
## Cluster URLs are Important

In Vault 1.13, support for [templated AIA
URLs](/vault/api-docs/secret/pki#enable_aia_url_templating-1)
was added. With the [per-cluster URL
configuration](/vault/api-docs/secret/pki#set-cluster-configuration) pointing
to this Performance Replication cluster, AIA information will point to the
cluster that issued this certificate automatically.

In Vault 1.14, with ACME support, the same configuration is used for allowing
ACME clients to discover the URL of this cluster.

~> **Warning**: It is important to ensure that this configuration is
up to date and maintained correctly, always pointing to the node's
PR cluster address (which may be a load-balanced or a DNS round-robin
address). If this configuration is not set on every Performance Replication
cluster, certificate issuance (via REST and/or via ACME) will fail.

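As a rough illustration of how a templated AIA URL resolves to a per-cluster address, the sketch below substitutes placeholders with plain string replacement. The placeholder names `{{cluster_aia_path}}` and `{{issuer_id}}`, the hostname, and the issuer ID are assumptions made for this example, and the helper is a stand-in for Vault's own templating rather than a copy of it.

```python
def render_aia_url(template: str, values: dict) -> str:
    """Replace {{name}} placeholders; a stand-in for Vault's own templating."""
    rendered = template
    for name, value in values.items():
        rendered = rendered.replace("{{" + name + "}}", value)
    return rendered


# Hypothetical per-cluster AIA base address and issuer ID.
cluster_values = {
    "cluster_aia_path": "http://pr-east.example.com/v1/pki",
    "issuer_id": "11111111-2222-3333-4444-555555555555",
}

crl_template = "{{cluster_aia_path}}/issuer/{{issuer_id}}/crl/der"
crl_url = render_aia_url(crl_template, cluster_values)
print(crl_url)
```

Because each Performance Replication cluster supplies its own base address, the same template yields a different URL on each cluster, which is why the per-cluster configuration must be set everywhere.
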
## Automate Rotation with ACME

In Vault 1.14, support for the [Automatic Certificate Management Environment
(ACME)](https://datatracker.ietf.org/doc/html/rfc8555) protocol has been
added to the PKI Engine. This is a standardized way to handle validation,
issuance, rotation, and revocation of server certificates.

Many ecosystems, from web servers like Caddy, Nginx, and Apache, to
orchestration environments like Kubernetes (via cert-manager) natively
support issuance via the ACME protocol. For deployments without native
support, stand-alone tools like certbot support fetching and renewing
certificates on behalf of consumers. Vault's PKI Engine only includes server
support for ACME; no client functionality has been included.

~> Note: Vault's PKI ACME server caps the certificate's validity at 90 days
maximum, regardless of role and/or global limits. Shorter validity
durations can be set via limiting the role's TTL to be under 90 days.
Aligning with Let's Encrypt, we do not support the optional `NotBefore`
and `NotAfter` order request parameters.

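The 90-day cap in the note above amounts to taking the smaller of the role's TTL and 90 days. A minimal sketch, assuming no other limit (such as the issuer's own expiration) applies; the function name is made up for this example:

```python
from datetime import timedelta

# Cap described in the note above.
ACME_MAX_VALIDITY = timedelta(days=90)


def effective_acme_validity(role_ttl: timedelta) -> timedelta:
    """Validity an ACME-issued certificate would receive for a given role TTL."""
    return min(role_ttl, ACME_MAX_VALIDITY)


print(effective_acme_validity(timedelta(days=365)).days)
print(effective_acme_validity(timedelta(days=30)).days)
```

A role with a one-year TTL is therefore limited to 90 days under ACME, while a 30-day role is unaffected.
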
### ACME Stores Certificates

Because ACME requires stored certificates in order to function, the notes
[below about automating tidy](#automate-crl-building-and-tidying) are
especially important for the long-term health of the PKI cluster. ACME also
introduces additional resource types (accounts, orders, authorizations, and
challenges) that must be tidied via [the `tidy_acme=true`
option](/vault/api-docs/secret/pki#tidy). Orders, authorizations, and
challenges are [cleaned up based on the
`safety_buffer`](/vault/api-docs/secret/pki#safety_buffer)
parameter, but accounts can live longer past their last issued certificate
by controlling the [`acme_account_safety_buffer`
parameter](/vault/api-docs/secret/pki#acme_account_safety_buffer).

As a consequence of the above, and like the discussions in the [Cluster
Scalability](#cluster-scalability) section, because these roles have
`no_store=false` set, ACME can only issue certificates on the active nodes
of PR clusters; standby nodes, if contacted, will transparently forward
all requests to the active node.

### ACME Role Restrictions Require EAB

Because ACME by default has no external authorization engine and is
unauthenticated from a Vault perspective, the use of roles with ACME
in the default configuration is of limited value, as any ACME client
can request certificates under any role by proving possession of the
requested certificate identifiers.

To solve this issue, there are two possible approaches:

1. Use a restrictive [`allowed_roles`, `allowed_issuers`, and
   `default_directory_policy` ACME
   configuration](/vault/api-docs/secret/pki#set-acme-configuration)
   to let only a single role and issuer be used. This prevents user
   choice, allows some global restrictions to be placed on issuance,
   and avoids requiring ACME clients to have (at initial setup) access
   to a Vault token or other mechanism for acquiring a Vault EAB ACME token.
2. Use a more permissive [configuration with
   `eab_policy=always-required`](/vault/api-docs/secret/pki#eab_policy)
   to allow more roles and let users select among them, but bind ACME clients
   to a Vault token which can be suitably ACL'd to particular sets of
   approved ACME directories.

The choice of approach depends on the policies of the organization wishing
to use ACME.

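The two approaches above might translate into ACME configuration bodies along the following lines. This is a sketch under assumptions: the role and issuer names are hypothetical, and apart from `eab_policy=always-required` the specific values shown (`enabled`, `role:<name>`, `not-required`, `*`) are assumed settings that should be checked against the ACME configuration API documentation.

```python
import json

# Approach 1: pin ACME to a single role and issuer, without requiring EAB.
# "acme-servers" and "int-2024" are hypothetical role and issuer names.
single_role_config = {
    "enabled": True,
    "allowed_roles": ["acme-servers"],
    "allowed_issuers": ["int-2024"],
    "default_directory_policy": "role:acme-servers",  # assumed value format
    "eab_policy": "not-required",  # assumed value
}

# Approach 2: leave role choice open, but require EAB so that every ACME
# account is bound to a Vault token that ACL policy can constrain.
eab_config = {
    "enabled": True,
    "allowed_roles": ["*"],  # assumed wildcard
    "allowed_issuers": ["*"],  # assumed wildcard
    "eab_policy": "always-required",
}

for name, body in (("single role", single_role_config), ("EAB", eab_config)):
    print(name)
    print(json.dumps(body, indent=2))
```
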
Another consequence of the unauthenticated (from Vault's perspective) nature
of ACME requests is that role templating, based on entity information, cannot
be used, as there is no token and thus no entity associated with the request,
even when EAB binding is used.

### ACME and the Public Internet

Using ACME is possible over the public internet; public CAs like Let's Encrypt
offer this as a service. Similarly, organizations running internal PKI
infrastructure might wish to issue server certificates to pieces of
infrastructure outside of their internal network boundaries, from a publicly
accessible Vault instance. By default, without enforcing a restrictive
`eab_policy`, this results in a complicated threat model: _any_ external
client which can prove possession of a domain can issue a certificate under
this CA, which might be considered more trusted by this organization.

As such, we strongly recommend that publicly facing Vault instances (such as
HCP Vault) require PKI mount operators to use a [restrictive
`eab_policy=always-required` configuration](/vault/api-docs/secret/pki#eab_policy).
System administrators of Vault instances can enforce this by [setting the
`VAULT_DISABLE_PUBLIC_ACME=true` environment
variable](/vault/api-docs/secret/pki#acme-external-account-bindings).

### ACME Errors are in Server Logs

Because the ACME client is not necessarily trusted (as account registration
may not be tied to a valid Vault token when EAB is not used), many error
messages end up in the Vault server logs out of security necessity. When
troubleshooting issues with clients requesting certificates, first check
the client's logs, if any, (e.g., certbot will state the log location on
errors), and then correlate with Vault server logs to identify the failure
reason.

### ACME Security Considerations

ACME allows any client to use Vault to make some sort of external call;
while the design of ACME attempts to minimize this scope and will prohibit
issuance if incorrect servers are contacted, it cannot account for all
possible remote server implementations. Vault's ACME server makes three
types of requests:

1. DNS requests for `_acme-challenge.<domain>`, which should be least
   invasive and most safe.
2. TLS ALPN requests for the `acme-tls/1` protocol, which should be
   safely handled by the TLS layer before any application code is invoked.
3. HTTP requests to `http://<domain>/.well-known/acme-challenge/<token>`,
   which could be problematic based on server design; if all requests,
   regardless of path, are treated the same and assumed to be trusted,
   this could result in Vault being used to make (invalid) requests.
   Ideally, any such server implementations should be updated to ignore
   such ACME validation requests or to block access originating from Vault
   to this service.

In all cases, no information about the response presented by the remote
server is returned to the ACME client.

When running Vault on multiple networks, note that Vault's ACME server
places no restrictions on the client requesting validation or on the
destination identifier being validated; a client could use an HTTP challenge
to force Vault to reach out to a server on a network the client could
otherwise not access.

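For a concrete view of the three request types listed in this section, the sketch below derives the target each challenge type would contact for a given identifier. The challenge-type labels (`dns-01`, `tls-alpn-01`, `http-01`) and port 443 for TLS-ALPN come from the ACME standards rather than from this page, and the domain and token values are hypothetical.

```python
def acme_validation_targets(domain: str, token: str) -> dict:
    """Map each challenge type to the target Vault would contact."""
    return {
        # DNS TXT lookup.
        "dns-01": f"_acme-challenge.{domain}",
        # TLS handshake offering the acme-tls/1 ALPN protocol.
        "tls-alpn-01": f"{domain}:443",
        # Plain HTTP GET on the well-known path.
        "http-01": f"http://{domain}/.well-known/acme-challenge/{token}",
    }


# Hypothetical identifier and challenge token.
targets = acme_validation_targets("internal.example.com", "abc123")
for challenge, target in targets.items():
    print(f"{challenge:12s} {target}")
```

Listing the targets this way makes it easier to decide which outbound connections from Vault should be permitted or blocked at the network level.
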
### ACME and Client Counting

In Vault 1.14, ACME contributes differently to usage metrics than other
interactions with the PKI Secrets Engine. Due to its use of unauthenticated
requests (which do not generate Vault tokens), it would not be counted in
the traditional [activity log APIs](/vault/api-docs/system/internal-counters#activity-export).
Instead, certificates issued via ACME will be counted via their unique
certificate identifiers (the combination of CN, DNS SANs, and IP SANs).
These will create a stable identifier that will be consistent across
renewals, other ACME clients, mounts, and namespaces, contributing to
the activity log presently as a non-entity token attributed to the first
mount which created that request.

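To illustrate why a combination of CN, DNS SANs, and IP SANs stays stable across renewals, the sketch below derives a digest from the normalized, sorted identifiers. This is not Vault's actual derivation, which this page does not describe; the normalization rules and the use of SHA-256 are assumptions made purely for illustration.

```python
import hashlib


def certificate_identifier(common_name, dns_sans=(), ip_sans=()):
    """Illustrative stable identifier built from CN, DNS SANs, and IP SANs."""
    parts = ["cn:" + common_name.lower()]
    # Sorting and de-duplicating makes the result independent of input order.
    parts += ["dns:" + name for name in sorted({n.lower() for n in dns_sans})]
    parts += ["ip:" + addr for addr in sorted(set(ip_sans))]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


first = certificate_identifier(
    "web.example.com", ["web.example.com", "www.example.com"], ["10.0.0.5"]
)
# A renewal for the same names, listed in a different order.
renewal = certificate_identifier(
    "web.example.com", ["www.example.com", "web.example.com"], ["10.0.0.5"]
)
other = certificate_identifier("api.example.com", ["api.example.com"])

print(first == renewal, first == other)
```

Under this scheme a renewal for the same set of names maps to the same identifier, while a certificate for a different name set maps to a new one.
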
## Keep Certificate Lifetimes Short, For CRL's Sake

This secrets engine aligns with Vault's philosophy of short-lived secrets. As
such it is not expected that CRLs will grow large; the only place a private key
is ever returned is to the requesting client (this secrets engine does _not_
store generated private keys, except for CA certificates). In most cases, if the
key is lost, the certificate can simply be ignored, as it will expire shortly.

If a certificate must truly be revoked, the normal Vault revocation function can
be used, and any revocation action will cause the CRL to be regenerated. When
the CRL is regenerated, any expired certificates are removed from the CRL (and
any revoked, expired certificates are removed from secrets engine storage). This
is an expensive operation! Due to the structure of the CRL standard, Vault must
read **all** revoked certificates into memory in order to rebuild the CRL and
clients must fetch the regenerated CRL.

This secrets engine does not support multiple CRL endpoints with sliding date
windows; often such mechanisms will have the transition point a few days apart,
but this gets into the expected realm of the actual certificate validity periods
issued from this secrets engine. A good rule of thumb for this secrets engine
would be to simply not issue certificates with a validity period greater than
your maximum comfortable CRL lifetime. Alternately, you can control CRL caching
behavior on the client to ensure that checks happen more often.

Often multiple endpoints are used in case a single CRL endpoint is down so that
clients don't have to figure out what to do with a lack of response. Run Vault
in HA mode, and the CRL endpoint should be available even if a particular node
is down.

~> Note: Since Vault 1.11.0, with multiple issuers in the same mount point,
different issuers may have different CRLs (depending on subject and key
material). This means that Vault may need to regenerate multiple CRLs.
This is again a rationale for keeping TTLs short and avoiding revocation
if possible.

~> Note: Since Vault 1.12.0, we support two complementary revocation
mechanisms: Delta CRLs, which allow for rebuilds of smaller, incremental
additions to the last complete CRL, and OCSP, which allows responding to
revocation status requests for individual certificates. When coupled with
the new CRL auto-rebuild functionality, this means that the revoking step
isn't as costly (as the CRL isn't always rebuilt on each revocation),
outside of storage considerations. However, while the rebuild operation
still can be expensive with lots of certificates, it will be done on a
schedule rather than on demand.

### NotAfter Behavior on Leaf Certificates

In Vault 1.11.0, the PKI Secrets Engine has introduced a new
`leaf_not_after_behavior` [parameter on
issuers](/vault/api-docs/secret/pki#leaf_not_after_behavior).
This allows modification of the issuance behavior: should Vault `err`,
preventing issuance of a longer-lived leaf cert than issuer, silently
`truncate` to that of the issuer's `NotAfter` value, or `permit` longer
expirations.

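The three `leaf_not_after_behavior` values can be modeled as a small decision function. This is a simplified sketch of the behavior described above, not Vault's implementation; the function name and the example dates are hypothetical.

```python
from datetime import datetime, timezone


def resolve_leaf_not_after(requested, issuer_not_after, behavior="err"):
    """Return the NotAfter a leaf would receive under the given behavior."""
    if requested <= issuer_not_after:
        return requested  # within the issuer's lifetime: nothing to decide
    if behavior == "err":
        raise ValueError("requested NotAfter exceeds the issuer's NotAfter")
    if behavior == "truncate":
        return issuer_not_after
    if behavior == "permit":
        return requested
    raise ValueError(f"unknown behavior: {behavior!r}")


issuer_end = datetime(2026, 1, 1, tzinfo=timezone.utc)
too_long = datetime(2026, 6, 1, tzinfo=timezone.utc)

truncated = resolve_leaf_not_after(too_long, issuer_end, "truncate")
permitted = resolve_leaf_not_after(too_long, issuer_end, "permit")
print(truncated.date(), permitted.date())
```

With `err`, the same request would raise instead of returning a date, which is the safest default when leaf lifetimes exceeding the issuer's are unexpected.
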
It is strongly suggested to use `err` or `truncate` for intermediates;
`permit` is only useful for root certificates, as an intermediate's NotAfter
expiration is checked when validating presented chains.

In combination with a cascading expiration with longer lived roots (perhaps
on the range of 2-10 years), shorter lived intermediates (perhaps on the
range of 6 months to 2 years), and short-lived leaf certificates (on the
range of 30 to 90 days), and the [rotation strategies discussed in other
sections](/vault/docs/secrets/pki/rotation-primitives), this should keep the
CRLs adequately small.

### Cluster Performance and Quantity of Leaf Certificates

As mentioned above, keeping TTLs short (or using `no_store=true`) and avoiding
leases is important for a healthy cluster. However, it is important to note
this is a scale problem: 10-1000 long-lived, stored certificates are probably
fine, but 50k-100k become a problem and 500k+ stored, unexpired certificates
can negatively impact even large Vault clusters--even with short TTLs!

However, once these certificates are expired, a [tidy operation](/vault/api-docs/secret/pki#tidy)
will clean up CRLs and Vault cluster storage.

Note that organizational risk assessments for certificate compromise might
mean certain certificate types should always be issued with `no_store=false`;
even short-lived broad wildcard certificates (say, `*.example.com`) might be
important enough to have precise control over revocation. However, an internal
service with a well-scoped certificate (say, `service.example.com`) might be
of low enough risk to issue a 90-day TTL with `no_store=true`, preventing
the need for revocation in the unlikely case of compromise.

Having a shorter TTL decreases the likelihood of needing to revoke a cert
(but cannot prevent it entirely) and decreases the impact of any such
compromise.

~> Note: As of Vault 1.12, the PKI Secret Engine's [Bring-Your-Own-Cert
(BYOC)](/vault/api-docs/secret/pki#revoke-certificate)
functionality allows revocation of certificates not previously stored
(e.g., issued via a role with `no_store=true`). This means that setting
`no_store=true` _is now_ safe to be used globally, regardless of importance
of issued certificates (and their likelihood for revocation).

## You must configure issuing/CRL/OCSP information _in advance_

This secrets engine serves CRLs from a predictable location, but it is not
possible for the secrets engine to know where it is running. Therefore, you must
configure desired URLs for the issuing certificate, CRL distribution points, and
OCSP servers manually using the `config/urls` endpoint. It is supported to have
more than one of each of these by passing in the multiple URLs as a
comma-separated string parameter.

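A minimal sketch of assembling a `config/urls` body with more than one URL per field, joined into comma-separated strings as described above. The hostnames are hypothetical, and the field names (`issuing_certificates`, `crl_distribution_points`, `ocsp_servers`) are assumed to match the API documentation for your Vault version.

```python
def urls_config(issuing, crls, ocsp):
    """Join each list of URLs into the comma-separated form described above."""
    return {
        "issuing_certificates": ",".join(issuing),
        "crl_distribution_points": ",".join(crls),
        "ocsp_servers": ",".join(ocsp),
    }


# Hypothetical addresses for two front ends of the same mount.
config = urls_config(
    issuing=[
        "http://vault-a.example.com/v1/pki/ca",
        "http://vault-b.example.com/v1/pki/ca",
    ],
    crls=[
        "http://vault-a.example.com/v1/pki/crl",
        "http://vault-b.example.com/v1/pki/crl",
    ],
    ocsp=["http://vault-a.example.com/v1/pki/ocsp"],
)

for field, value in config.items():
    print(f"{field}={value}")
```
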
~> Note: when using Vault Enterprise's Performance Replication features with a
PKI Secrets Engine mount, each cluster will have its own CRL; this means
each cluster's unique CRL address should be included in the [AIA
information](https://datatracker.ietf.org/doc/html/rfc5280#section-5.2.7)
field separately, or the CRLs should be consolidated and served outside of
Vault.

~> Note: When using multiple issuers in the same mount, it is suggested to use
the per-issuer AIA fields rather than the global (`/config/urls`) variant.
This is for correctness: these fields are used for chain building and
automatic CRL detection in certain applications. If they point to the wrong
issuer's information, these applications may break.

## Distribution of CRLs and OCSP

Both CRLs and OCSP allow interrogating revocation status of certificates. Both
of these methods include internal security and authenticity (both CRLs and
OCSP responses are signed by the issuing CA within Vault). This means both are
fine to distribute over non-secure and non-authenticated channels, such as
HTTP.

## Automate CRL Building and Tidying

Since Vault 1.12, the PKI Secrets Engine supports automated CRL rebuilding
(including optional Delta CRLs which can be built more frequently than
complete CRLs) via the `/config/crl` endpoint. Additionally, tidying of
revoked and expired certificates can be configured automatically via the
`/config/auto-tidy` endpoint. Both of these should be enabled to ensure
compatibility with the wider PKIX ecosystem and performance of the cluster.

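As a rough sketch of what enabling both features might look like, the bodies below target the `/config/crl` and `/config/auto-tidy` endpoints named above. The field names and duration values are assumptions drawn from the PKI API as best understood here; verify them against the API documentation for your Vault version before use.

```python
import json

# Assumed fields for /config/crl: scheduled rebuilds plus Delta CRLs.
crl_config = {
    "auto_rebuild": True,
    "enable_delta": True,
}

# Assumed fields for /config/auto-tidy: periodically remove expired and
# revoked certificates from storage. Durations are illustrative values.
auto_tidy_config = {
    "enabled": True,
    "tidy_cert_store": True,
    "tidy_revoked_certs": True,
    "interval_duration": "12h",
    "safety_buffer": "72h",
}

requests_to_send = {
    "config/crl": crl_config,
    "config/auto-tidy": auto_tidy_config,
}

for endpoint, body in requests_to_send.items():
    print(endpoint, json.dumps(body))
```
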
## Spectrum of Revocation Support

Starting with Vault 1.13, the PKI secrets engine has the ability to support a
spectrum of cluster sizes and certificate revocation quantities.

For users with few revocations or who want a unified view and have the
inter-cluster bandwidth to support it, we recommend turning on auto
rebuilding of CRLs, cross-cluster revocation queues, and cross-cluster CRLs.
This allows all consumers of the CRLs to have the most accurate picture of
revocations, regardless of which cluster they talk to.

If the unified CRL becomes too big for the underlying storage mechanism or
for a single host to build, we recommend relying on OCSP instead of CRLs.
These have much smaller storage entries, and the CRL `disabled` flag is
independent of `unified_crls`, allowing unified OCSP to remain.

However, when cross-cluster traffic becomes too high (or if CRLs are still
necessary in addition to OCSP), we recommend sharding the CRL between
different clusters. This has been the default behavior of Vault, but with
the introduction of per-cluster, templated AIA information, the leaf
certificate's Authority Information Access (AIA) info will point directly
to the cluster which issued it, allowing the correct CRL for this cert to
be identified by the application. This more correctly mimics the behavior
of [Let's Encrypt's CRL sharding](https://letsencrypt.org/2022/09/07/new-life-for-crls.html).

This sharding behavior can also be used for OCSP, if the cross-cluster
traffic for revocation entries becomes too high.

For users who wish to manage revocation manually, using the audit logs to
track certificate issuance would allow an external system to identify which
certificates were issued. These can be manually tracked for revocation, and
a [custom CRL can be built](/vault/api-docs/secret/pki#combine-crls-from-the-same-issuer)
using externally tracked revocations. This would allow usage of roles set to
`no_store=true`, so Vault is strictly used as an issuing authority and isn't
storing any certificates, issued or revoked. For the highest of revocation
volumes, this could be the best option.

Notably, this last approach can be used to create either unified or sharded
externally stored CRLs. If a single external unified CRL becomes
unreasonably large, each cluster's certificates could have AIA info point
to an externally stored and maintained, sharded CRL. However,
Vault has no mechanism to sign OCSP requests at this time.

### What Are Cross-Cluster CRLs?

Vault Enterprise supports a clustering mode called [Performance
Replication](/vault/docs/enterprise/replication#performance-replication). In
a replicated PKI Secrets Engine mount, issuer and role information is synced
between the Performance Primary and all Performance Secondary clusters.
However, each Performance Secondary cluster has its own local storage of
issued certificates and revocations which is not synced. In Vault versions
before 1.13, this meant that each of these clusters had its own CRL and
OCSP data, and any revocation request needed to be processed on the
cluster that issued the certificate (or BYOC revocation had to be used).

Starting with Vault 1.13, we've added [two
features](/vault/api-docs/secret/pki#read-crl-configuration) to Vault
Enterprise to help manage this setup more correctly and easily: revocation
request queues (`cross_cluster_revocation=true` in `config/crl`) and unified
revocation entries (`unified_crl=true` in `config/crl`).

The former allows operators (revoking by serial number) to request a
certificate be revoked regardless of which cluster it was issued on. For
example, if a request goes into the Performance Primary, but it didn't
issue the certificate, it'll write a cross-cluster revocation request,
and mark the results as pending. If another cluster already has this
certificate in storage, it will revoke it and confirm the revocation back
to the main cluster. An operator can [list pending
revocations](/vault/api-docs/secret/pki#list-revocation-requests) to see
the status of these requests. To clean up invalid requests (e.g., if the
cluster which had that certificate disappeared, if that certificate was
issued with `no_store=true` on the role, or if it was an invalid serial
number), an operator can [use tidy](/vault/api-docs/secret/pki#tidy) with
`tidy_revocation_queue=true`, optionally shortening
`revocation_queue_safety_buffer` to remove them more quickly.

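The queue cleanup described above could look like the following sketch. The
helper name, the mount path, and the `24h` buffer are assumptions chosen for
illustration, not recommended values.

```shell
# Hypothetical helper: tidy stale cross-cluster revocation requests.
#   $1 = PKI mount path (assumed default "pki")
#   $2 = safety buffer before a pending request is removed (assumed "24h")
tidy_revocation_queue() {
  mount="${1:-pki}"
  buffer="${2:-24h}"
  vault write "${mount}/tidy" \
    tidy_revocation_queue=true \
    revocation_queue_safety_buffer="${buffer}"
}

# Usage: tidy_revocation_queue pki 12h
```
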
The latter allows all clusters to have a unified view of revocations,
that is, to have access to a list of revocations performed by other clusters.
While the configuration parameter includes `crl` in its name, this
applies to [both CRLs](/vault/api-docs/secret/pki#read-issuer-crl) and the
[OCSP responder](/vault/api-docs/secret/pki#ocsp-request). When this
revocation replication occurs, if any cluster considers a cert revoked when
another doesn't (e.g., via BYOC revocation of a `no_store=false` certificate),
all clusters will now consider it revoked, assuming it hasn't expired. Notably,
the active node of the primary cluster will be used to rebuild the CRL; as
this can grow large if many clusters have lots of revoked certs, an operator
might need to disable CRL building (`disable=true` in `config/crl`) or
increase the [storage size](/vault/docs/configuration/storage/raft#max_entry_size).

As an aside, all new cross-cluster writes (from Performance Secondary up to
the Performance Primary) are performed synchronously. This gives the caller
confidence that the request actually went through, at the expense of slightly
higher overhead when revoking certificates. When a node loses its gRPC
connection (e.g., during leadership election or being otherwise unable to
contact the active primary), errors will occur, though the local portion of the
write (if any) will still succeed. Because a cross-cluster revocation request
has no local write, the operation will need to be retried. However, if writing
a cross-cluster revocation entry fails for a cert that existed locally, the
revocation will eventually be synced across clusters when the connection
comes back.

## Issuer Subjects and CRLs

As noted on several [GitHub issues](https://github.com/hashicorp/vault/issues/10176),
Go's x509 library has an opinionated parsing and structuring mechanism for
certificate Subjects. Issuers created within Vault are fine, but
externally created CA certificates may not be parsed
correctly throughout all parts of the PKI. In particular, CRLs embed a
(modified) copy of the issuer name. This can be avoided by using OCSP to
track revocation, but note that performance characteristics are different
between OCSP and CRLs.

~> Note: As of Go 1.20 and Vault 1.13, Go correctly formats the CRL's issuer
name and this notice [does not apply](https://github.com/golang/go/commit/a367981b4c8e3ae955eca9cc597d9622201155f3).

## Automate Leaf Certificate Renewal

To manage certificates for services at scale, it is best to automate the
certificate renewal as much as possible. Vault Agent [has support for
automatically renewing requested certificates](/vault/docs/agent-and-proxy/agent/template#certificates)
based on the `validTo` field. Other solutions might involve using
[cert-manager](https://cert-manager.io/) in Kubernetes or OpenShift, backed
by the Vault CA.

## Safe Minimums

Since its inception, this secrets engine has enforced SHA256 for signature
hashes rather than SHA1. As of 0.5.1, a minimum of 2048 bits for RSA keys is
also enforced. Software that can handle SHA256 signatures should also be able to
handle 2048-bit keys, and 1024-bit keys are considered unsafe and are disallowed
in the Internet PKI.

## Token Lifetimes and Revocation

When a token expires, it revokes all leases associated with it. This means that
long-lived CA certs need correspondingly long-lived tokens, something that is
easy to forget. Starting with 0.6, root and intermediate CA certs no longer have
associated leases, to prevent unintended revocation when not using a token with
a long enough lifetime. To revoke these certificates, use the `pki/revoke`
endpoint.

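A minimal sketch of revoking a CA certificate through the `pki/revoke`
endpoint mentioned above. The helper name and the serial number shown in the
usage line are hypothetical.

```shell
# Hypothetical helper: revoke a certificate by serial number.
#   $1 = PKI mount path (assumed default "pki")
#   $2 = serial number of the certificate, as reported by Vault
revoke_by_serial() {
  mount="${1:-pki}"
  serial="$2"
  vault write "${mount}/revoke" serial_number="${serial}"
}

# Usage (made-up serial): revoke_by_serial pki 39:dd:2e:90:b7:23:1f:8d
```
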
## Safe Usage of Roles

The Vault PKI Secrets Engine supports many options to limit issuance via
[Roles](/vault/api-docs/secret/pki#create-update-role).
Roles must be constructed carefully so that they grant no more
permissions than necessary. Additionally, roles should generally
do _one_ thing; multiple narrow roles are preferable to overly permissive
roles that allow arbitrary issuance (e.g., `allow_any_name` should generally
be used sparingly, if at all).

- `allow_any_name` should generally be set to `false`; this is the default.
- `allow_localhost` should generally be set to `false` for production
  services, unless listening on `localhost` is expected.
- Unless necessary, `allow_wildcard_certificates` should generally be set to
  `false`. This is **not** the default due to backwards compatibility
  concerns.
  - This is especially necessary when `allow_subdomains` or `allow_glob_domains`
    are enabled.
- `enforce_hostnames` should generally be enabled for TLS services; this is
  the default.
- `allow_ip_sans` should generally be set to `false` (but defaults to `true`),
  unless IP address certificates are explicitly required.
- When using short TTLs (< 30 days) or with high issuance volume, it is
  generally recommended to set `no_store` to `true` (defaults to `false`).
  This prevents revocation but allows higher throughput as Vault no longer
  needs to store every issued certificate. This is discussed more in the
  [Replicated Datasets](#replicated-datasets) section below.
- Do not use roles with root certificates (`issuer_ref`). Root certificates
  should generally only issue intermediates (see the section on [CA hierarchy
  above](#use-a-ca-hierarchy)), which doesn't rely on roles.
- Limit `key_usage` and `ext_key_usage`; don't attempt to allow all usages
  for all purposes. Generally the default values are useful for client and
  server TLS authentication.

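The recommendations above can be combined into a single narrowly scoped role,
sketched below. The helper name, role name, issuer name, domain, and TTL are
all assumptions for illustration; the parameters themselves are the role
options discussed in the list.

```shell
# Hypothetical helper: create a restrictive TLS server role.
#   $1 = PKI mount path (assumed default "pki")
#   $2 = role name (assumed default "web-server")
#   $3 = name or ID of an INTERMEDIATE issuer (never a root)
create_tls_server_role() {
  mount="${1:-pki}"
  role="${2:-web-server}"
  issuer="$3"
  vault write "${mount}/roles/${role}" \
    issuer_ref="${issuer}" \
    allowed_domains="example.com" \
    allow_subdomains=true \
    allow_any_name=false \
    allow_localhost=false \
    allow_wildcard_certificates=false \
    allow_ip_sans=false \
    enforce_hostnames=true \
    server_flag=true \
    client_flag=false \
    max_ttl=720h
}

# Usage: create_tls_server_role pki web-server my-intermediate
```
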
## Telemetry

Beyond Vault's default telemetry around request processing, PKI exposes count and
duration metrics for the issue, sign, sign-verbatim, and revoke calls. The
metric keys take the form `mount-path,operation,[failure]` with labels for
namespace and role name.

Note that these metrics are per-node and thus would need to be aggregated across
nodes and clusters.

## Auditing

Because Vault HMACs audit string keys by default, it is necessary to tune
PKI secrets mounts to get an accurate view of issuance that is occurring under
this mount.

~> Note: Depending on usage of Vault, CRLs (and rarely, CA chains) can grow to
be rather large. We don't recommend un-HMACing the `crl` field for this
reason, but note that the recommendations below suggest un-HMACing the
`certificate` response parameter, in which the CRL can be served via
the `/pki/cert/crl` API endpoint. Additionally, `http_raw_body` can
be used to return the CRL in both PEM and raw binary DER form, so we
suggest leaving that field HMACed to avoid corrupting the log format.<br /><br />
If this is done with only a [syslog](/vault/docs/audit/syslog) audit device,
Vault can deny requests (with an opaque `500 Internal Error` message)
after the action has been performed on the server, because it was
unable to log the message.<br /><br />
The suggested workaround is to either leave the `certificate` and `crl`
response fields HMACed and/or to also enable the [`file`](/vault/docs/audit/file)
audit log type.

Some suggested keys to un-HMAC for requests are as follows:

- `csr` - the requested CSR to sign,
- `certificate` - the requested self-signed certificate to re-sign or
  when importing issuers,
- Various issuance-related overriding parameters, such as:
  - `issuer_ref` - the issuer requested to sign this certificate,
  - `common_name` - the requested common name,
  - `alt_names` - alternative requested DNS-type SANs for this certificate,
  - `other_sans` - other (non-DNS, non-Email, non-IP, non-URI) requested SANs for this certificate,
  - `ip_sans` - requested IP-type SANs for this certificate,
  - `uri_sans` - requested URI-type SANs for this certificate,
  - `ttl` - requested time-to-live of this certificate,
  - `not_after` - requested expiration date of this certificate,
  - `serial_number` - the subject's requested serial number,
  - `key_type` - the requested key type,
  - `private_key_format` - the requested key format, which is also
    used for the public certificate format,
- Various role- or issuer-related generation parameters, such as:
  - `managed_key_name` - when creating an issuer, the requested managed
    key name,
  - `managed_key_id` - when creating an issuer, the requested managed
    key identifier,
  - `ou` - the subject's organizational unit,
  - `organization` - the subject's organization,
  - `country` - the subject's country code,
  - `locality` - the subject's locality,
  - `province` - the subject's province,
  - `street_address` - the subject's street address,
  - `postal_code` - the subject's postal code,
  - `permitted_dns_domains` - permitted DNS domains,
  - `policy_identifiers` - the requested policy identifiers when creating a role, and
  - `ext_key_usage_oids` - the extended key usage OIDs for the requested certificate.

Some suggested keys to un-HMAC for responses are as follows:

- `certificate` - the certificate that was issued,
- `issuing_ca` - the certificate of the CA which issued the requested
  certificate,
- `serial_number` - the serial number of the certificate that was issued,
- `error` - to show errors associated with the request, and
- `ca_chain` - optional due to noise; the full CA chain of the issuer of
  the requested certificate.

~> Note: This list of parameters to un-HMAC is provided as a suggestion and
may not be exhaustive.

We suggest **NOT** un-HMACing the following keys, due to their sensitive
nature:

- `private_key` - this response parameter contains the private keys
  generated by Vault during issuance, and
- `pem_bundle` - this request parameter is only used on the issuer-import
  paths and may contain sensitive private key material.

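One way to apply a subset of these suggestions is to tune the mount with the
`-audit-non-hmac-request-keys` and `-audit-non-hmac-response-keys` flags of
`vault secrets tune`. The sketch below is illustrative: the helper name and
mount path are assumptions, and only a few of the keys listed above are
included. Note that it deliberately leaves `private_key`, `pem_bundle`, and
`crl` HMACed.

```shell
# Hypothetical helper: un-HMAC a selection of PKI audit keys on one mount.
#   $1 = PKI mount path (assumed default "pki")
tune_pki_audit_keys() {
  mount="${1:-pki}"
  vault secrets tune \
    -audit-non-hmac-request-keys=csr \
    -audit-non-hmac-request-keys=issuer_ref \
    -audit-non-hmac-request-keys=common_name \
    -audit-non-hmac-request-keys=alt_names \
    -audit-non-hmac-request-keys=ttl \
    -audit-non-hmac-response-keys=certificate \
    -audit-non-hmac-response-keys=issuing_ca \
    -audit-non-hmac-response-keys=serial_number \
    -audit-non-hmac-response-keys=error \
    "${mount}/"
}

# Usage: tune_pki_audit_keys pki
```
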
## Role-Based Access

Vault supports [path-based ACL Policies](/vault/tutorials/getting-started/getting-started-policies)
for limiting access to various paths within Vault.

The following is a condensed example reference for ACLing the PKI Secrets
Engine. These are just suggestions; other personas and policy approaches
may also be valid.

We suggest the following personas:

- *Operator*; a privileged user who manages the health of the PKI
  subsystem; manages issuers and key material.
- *Agent*; a semi-privileged user that manages roles and handles
  revocation on behalf of an operator; may also handle delegated
  issuance. This may also be called an *administrator* or *role
  manager*.
- *Advanced*; potentially a power-user or service that has access to
  additional issuance APIs.
- *Requester*; a low-level user or service that simply requests certificates.
- *Unauthed*; any arbitrary user or service that lacks a Vault token.

For these personas, we suggest the following ACLs, in condensed, tabular form:

| Path | Operations | Operator | Agent | Advanced | Requester | Unauthed |
| :--- | :--------- | :------- | :---- | :------- | :-------- | :------- |
| `/ca(/pem)?` | Read | Yes | Yes | Yes | Yes | Yes |
| `/ca_chain` | Read | Yes | Yes | Yes | Yes | Yes |
| `/crl(/pem)?` | Read | Yes | Yes | Yes | Yes | Yes |
| `/crl/delta(/pem)?` | Read | Yes | Yes | Yes | Yes | Yes |
| `/cert/:serial(/raw(/pem)?)?` | Read | Yes | Yes | Yes | Yes | Yes |
| `/issuers` | List | Yes | Yes | Yes | Yes | Yes |
| `/issuer/:issuer_ref/(json¦der¦pem)` | Read | Yes | Yes | Yes | Yes | Yes |
| `/issuer/:issuer_ref/crl(/der¦/pem)?` | Read | Yes | Yes | Yes | Yes | Yes |
| `/issuer/:issuer_ref/crl/delta(/der¦/pem)?` | Read | Yes | Yes | Yes | Yes | Yes |
| `/ocsp/<request>` | Read | Yes | Yes | Yes | Yes | Yes |
| `/ocsp` | Write | Yes | Yes | Yes | Yes | Yes |
| `/certs` | List | Yes | Yes | Yes | Yes | |
| `/revoke-with-key` | Write | Yes | Yes | Yes | Yes | |
| `/roles` | List | Yes | Yes | Yes | Yes | |
| `/roles/:role` | Read | Yes | Yes | Yes | Yes | |
| `/(issue¦sign)/:role` | Write | Yes | Yes | Yes | Yes | |
| `/issuer/:issuer_ref/(issue¦sign)/:role` | Write | Yes | Yes | Yes | | |
| `/config/auto-tidy` | Read | Yes | Yes | | | |
| `/config/ca` | Read | Yes | Yes | | | |
| `/config/crl` | Read | Yes | Yes | | | |
| `/config/issuers` | Read | Yes | Yes | | | |
| `/crl/rotate` | Read | Yes | Yes | | | |
| `/crl/rotate-delta` | Read | Yes | Yes | | | |
| `/roles/:role` | Write | Yes | Yes | | | |
| `/issuer/:issuer_ref` | Read | Yes | Yes | | | |
| `/sign-verbatim(/:role)?` | Write | Yes | Yes | | | |
| `/issuer/:issuer_ref/sign-verbatim(/:role)?` | Write | Yes | Yes | | | |
| `/revoke` | Write | Yes | Yes | | | |
| `/tidy` | Write | Yes | Yes | | | |
| `/tidy-cancel` | Write | Yes | Yes | | | |
| `/tidy-status` | Read | Yes | Yes | | | |
| `/config/auto-tidy` | Write | Yes | | | | |
| `/config/ca` | Write | Yes | | | | |
| `/config/crl` | Write | Yes | | | | |
| `/config/issuers` | Write | Yes | | | | |
| `/config/keys` | Read, Write | Yes | | | | |
| `/config/urls` | Read, Write | Yes | | | | |
| `/issuer/:issuer_ref` | Write | Yes | | | | |
| `/issuer/:issuer_ref/revoke` | Write | Yes | | | | |
| `/issuer/:issuer_ref/sign-intermediate` | Write | Yes | | | | |
| `/issuer/:issuer_ref/sign-self-issued` | Write | Yes | | | | |
| `/issuers/generate/+/+` | Write | Yes | | | | |
| `/issuers/import/+` | Write | Yes | | | | |
| `/intermediate/generate/+` | Write | Yes | | | | |
| `/intermediate/cross-sign` | Write | Yes | | | | |
| `/intermediate/set-signed` | Write | Yes | | | | |
| `/keys` | List | Yes | | | | |
| `/key/:key_ref` | Read, Write | Yes | | | | |
| `/keys/generate/+` | Write | Yes | | | | |
| `/keys/import` | Write | Yes | | | | |
| `/root/generate/+` | Write | Yes | | | | |
| `/root/sign-intermediate` | Write | Yes | | | | |
| `/root/sign-self-issued` | Write | Yes | | | | |
| `/root/rotate/+` | Write | Yes | | | | |
| `/root/replace` | Write | Yes | | | | |

~> Note: With managed keys, operators might need access to [read the mount
point's tunable data](/vault/api-docs/system/mounts) (Read on `/sys/mounts`) and
may need access [to use or manage managed keys](/vault/api-docs/system/managed-keys).

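As an illustration of how one persona from the table might translate into an
ACL policy, the sketch below covers the authenticated paths of the *Requester*
column. The policy name `pki-requester`, the helper name, and the mount path
are assumptions; paths marked "Yes" for *Unauthed* need no policy and are
omitted.

```shell
# Hypothetical helper: write a policy for the Requester persona.
#   $1 = PKI mount path (assumed default "pki")
write_requester_policy() {
  mount="${1:-pki}"
  # "-" makes `vault policy write` read the policy body from stdin.
  vault policy write pki-requester - <<EOF
# Request certificates through any role on the default issuer.
path "${mount}/issue/*" { capabilities = ["create", "update"] }
path "${mount}/sign/*"  { capabilities = ["create", "update"] }

# Discover roles and list issued certificates.
path "${mount}/roles"   { capabilities = ["list"] }
path "${mount}/roles/*" { capabilities = ["read"] }
path "${mount}/certs"   { capabilities = ["list"] }

# Revoke a certificate by proving possession of its private key.
path "${mount}/revoke-with-key" { capabilities = ["create", "update"] }
EOF
}

# Usage: write_requester_policy pki
```
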
## Replicated Datasets

When operating with [Performance Secondary](/vault/docs/enterprise/replication#architecture)
clusters, certain datasets are maintained across all clusters, while others are kept
within a given cluster for performance and scalability reasons.

The following table shows, by data type, which datasets cross the cluster boundary.
For data types that do not cross a cluster boundary, read requests for that data need to be
sent to the cluster on which the data was generated.

| Data Set                 | Replicated Across Clusters |
|--------------------------|----------------------------|
| Issuers & Keys           | Yes                        |
| Roles                    | Yes                        |
| CRL Config               | Yes                        |
| URL Config               | Yes                        |
| Issuer Config            | Yes                        |
| Key Config               | Yes                        |
| CRL                      | No                         |
| Revoked Certificates     | No                         |
| Leaf/Issued Certificates | No                         |

The main effect is that, within the PKI secrets engine, leaf certificates
issued with `no_store` set to `false` are stored local to the cluster that issued them.
This allows the active nodes of both the primary and [Performance Secondary](/vault/docs/enterprise/replication#architecture)
clusters to issue certificates for greater scalability. As a
result, these certificates and any revocations are visible only on the issuing
cluster. This additionally means each cluster has its own set of CRLs, distinct
from other clusters. These CRLs should either be unified into a single CRL for
distribution from a single URI, or server operators should know to fetch all
CRLs from all clusters.

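For the second option, a consumer could fetch the CRL from every cluster in
turn. The sketch below assumes hypothetical cluster addresses and output file
names, and uses the unauthenticated `crl/pem` path under the mount.

```shell
# Hypothetical helper: download the PEM CRL from each cluster.
#   $1     = PKI mount path
#   $2...  = base addresses of every cluster (hypothetical values)
fetch_all_crls() {
  mount="$1"
  shift
  i=0
  for addr in "$@"; do
    i=$((i + 1))
    # Save each cluster's CRL to its own numbered file.
    curl --silent --show-error --fail \
      --output "crl-${i}.pem" \
      "${addr}/v1/${mount}/crl/pem"
  done
}

# Usage: fetch_all_crls pki https://vault-a.example.com:8200 https://vault-b.example.com:8200
```
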
## Cluster Scalability

Most non-introspection operations in the PKI secrets engine require a write to
storage, and so are forwarded to the cluster's active node for execution.
This table outlines which operations can be executed on performance standby nodes
and thus scale horizontally across all nodes within a cluster.

| Path                          | Operations           |
|-------------------------------|----------------------|
| ca[/pem]                      | Read                 |
| cert/<em>serial-number</em>   | Read                 |
| cert/ca_chain                 | Read                 |
| config/crl                    | Read                 |
| certs                         | List                 |
| ca_chain                      | Read                 |
| crl[/pem]                     | Read                 |
| issue                         | Update <sup>\*</sup> |
| revoke/<em>serial-number</em> | Read                 |
| sign                          | Update <sup>\*</sup> |
| sign-verbatim                 | Update <sup>\*</sup> |

\* Only if the corresponding role has `no_store` set to true and `generate_lease`
set to false. If `generate_lease` is true the lease creation will be forwarded to
the active node; if `no_store` is false the entire request will be forwarded to
the active node.

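A role that satisfies the footnote above could be sketched as follows. The
helper name, mount path, domain, and TTL are assumptions; in practice the role
would also carry the other restrictions appropriate to your environment.

```shell
# Hypothetical helper: create a role whose issue/sign requests can be served
# by performance standby nodes (no certificate storage, no lease).
#   $1 = PKI mount path (assumed default "pki")
#   $2 = role name
create_standby_friendly_role() {
  mount="${1:-pki}"
  role="$2"
  vault write "${mount}/roles/${role}" \
    allowed_domains="example.com" \
    allow_subdomains=true \
    no_store=true \
    generate_lease=false \
    ttl=24h
}

# Usage: create_standby_friendly_role pki short-lived
```
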
## PSS Support

Go lacks support for PSS certificates, keys, and CSRs using the `rsaPSS` OID
(`1.2.840.113549.1.1.10`). It requires all RSA certificates, keys, and CSRs
to use the alternative `rsaEncryption` OID (`1.2.840.113549.1.1.1`).

When using OpenSSL to generate CAs or CSRs from PKCS8-encoded PSS keys, the
resulting CAs and CSRs will have the `rsaPSS` OID. Go and Vault will reject
them. Instead, use OpenSSL to generate or convert to a PKCS#1v1.5 private
key file and use this to generate the CSR. Vault will, depending on the role
and the signing mechanism, still use a PSS signature despite the
`rsaEncryption` OID on the request as the SubjectPublicKeyInfo and
SignatureAlgorithm fields are orthogonal. When creating an external CA and
importing it into Vault, ensure that the `rsaEncryption` OID is present on
the SubjectPublicKeyInfo field even if the SignatureAlgorithm is PSS-based.

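As a rough sketch of the OpenSSL workflow above, a plain RSA key can be
generated and used for the CSR, and the resulting public key algorithm can be
inspected before submitting the CSR to Vault. The helper name, file names, and
subject are assumptions for illustration.

```shell
# Hypothetical helper: create an RSA key and CSR, then show the CSR's public
# key algorithm so it can be checked for rsaEncryption (not rsaPSS).
#   $1 = output key file, $2 = output CSR file, $3 = subject (e.g. "/CN=...")
generate_rsa_csr() {
  key="$1"
  csr="$2"
  subject="$3"
  # A key from `openssl genrsa` carries the rsaEncryption OID.
  openssl genrsa -out "${key}" 2048
  openssl req -new -key "${key}" -subj "${subject}" -out "${csr}"
  # Print the "Public Key Algorithm" line of the CSR for inspection.
  openssl req -in "${csr}" -noout -text | grep 'Public Key Algorithm'
}

# Usage: generate_rsa_csr int-ca.key int-ca.csr "/CN=Example Intermediate CA"
```
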
These certificates generated by Go (with `rsaEncryption` OID but PSS-based
signatures) are otherwise compatible with the fully PSS-based certificates.
OpenSSL and NSS support parsing and verifying chains using this type of
certificate. Note that some TLS implementations may not support these types
of certificates if they do not support `rsa_pss_rsae_*` signature schemes.
Additionally, some implementations allow rsaPSS OID certificates to contain
restrictions on signature parameters allowed by this certificate, but Go and
Vault do not support adding such restrictions.

At this time Go lacks support for signing CSRs with the PSS signature
algorithm. If using a managed key that requires an RSA PSS algorithm (such as GCP or
a PKCS#11 HSM) as a backing for an intermediate CA key, attempting to generate
a CSR (via `pki/intermediate/generate/kms`) will fail signature verification.
In this case, the CSR will need to be generated outside of Vault and the
signed final certificate can be imported into the mount.

Go additionally lacks support for creating OCSP responses with the PSS
signature algorithm. Vault will automatically downgrade issuers with
PSS-based revocation signature algorithms to PKCS#1v1.5, but note that
certain KMS devices (like HSMs and GCP) may not support this with the
same key. As a result, the OCSP responder may fail to sign responses,
returning an internal error.

## Issuer Storage Migration Issues

When Vault migrates to the new multi-issuer storage layout on releases prior
to 1.11.6, 1.12.2, and 1.13, and storage write errors occur during the mount
initialization and storage migration process, the default issuer _may_ not
have the correct `ca_chain` value and may only have the self-reference. These
write errors most commonly manifest in logs as a message like
`failed to persist issuer ... chain to disk: <cause>` and indicate that Vault
was not stable at the time of migration. Note that this only occurs when more
than one issuer exists within the mount (such as an intermediate with root).

To fix this manually (until a new version of Vault automatically rebuilds the
issuer chain), a rebuild of the chains can be performed:

```
curl -X PATCH -H "Content-Type: application/merge-patch+json" -H "X-Vault-Request: true" -H "X-Vault-Token: $(vault print token)" -d '{"manual_chain":"self"}' https://.../issuer/default
curl -X PATCH -H "Content-Type: application/merge-patch+json" -H "X-Vault-Request: true" -H "X-Vault-Token: $(vault print token)" -d '{"manual_chain":""}' https://.../issuer/default
```

This temporarily sets the manual chain on the default issuer to a self-chain
only, before reverting it back to automatic chain building. This triggers a
refresh of the `ca_chain` field on the issuer, and can be verified with:

```
vault read pki/issuer/default
```

## Tutorial

Refer to the [Build Your Own Certificate Authority (CA)](/vault/tutorials/secrets-management/pki-engine)
guide for a step-by-step tutorial.

Have a look at the [PKI Secrets Engine with Managed Keys](/vault/tutorials/enterprise/managed-key-pki)
tutorial for more about how to use externally managed keys with PKI.

## API

The PKI secrets engine has a full HTTP API. Please see the
[PKI secrets engine API](/vault/api-docs/secret/pki) for more
details.