* Add Vault APIS to create, list, delete ACME EAB keys
- Add Vault authenticated APIs to create, list and delete ACME
EAB keys.
- Add supporting tests for all new apis
* Add require_eab to acme configuration
* Add EAB support to ACME
* Add EAB support to ACME
* PR feedback 1
- Address missing err return within DeleteEab
- Move verifyEabPayload to acme_jws.go no code changes in this PR
- Update error message returned for error on account storage with EAB.
* PR feedback 2
- Verify JWK signature payload after signature verification
* Introduce an ACME eab_policy in configuration
- Instead of a boolean on/off for require_eab, introduce named policies for ACME behaviour enforcing eab.
- The default policy of always-required, will force new accounts to have an EAB, and all operations in the future, will make sure the account has an EAB associated with it.
- Two other policies, not-required will allow any anonymous users to use ACME within PKI and 'new-account-required' will enforce new accounts going forward to require an EAB, but existing accounts will still be allowed to use ACME if they don't have an EAB associated with the account.
- Having 'always-required' as a policy, will override the environment variable to disable public acme as well.
* Add missing go-docs to new tests.
* Add valid eab_policy values in error message.
* Structure of ACME Tidy.
* The tidy endpoints/call information.
* Counts for status plumbing.
* Update typo calls, add information saving date of account creation.
* Missed some field locations.
* Set-up of Tidy command.
* Proper tidy function; lock to work with
* Remove order safety buffer.
* Missed a field.
* Read lock for account creation; Write lock for tidy (account deletion)
* Type issues fixed.
* fix range operator.
* Fix path_tidy read.
* Add fields to auto-tidy config.
* Add (and standardize) Tidy Config Response
* Test pass, consistent fields
* Changes from PR-Reviews.
* Update test to updated default due to PR-Review.
* Handle caching of ACME config
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add DNS resolvers to ACME configuration
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add custom DNS resolver to challenge verification
This required plumbing through the config, reloading it when necessary,
and creating a custom net.Resolver instance.
Not immediately clear is how we'd go about building a custom DNS
validation mechanism that supported multiple resolvers. Likely we'd need
to rely on meikg/dns and handle the resolution separately for each
container and use a custom Dialer that assumes the address is already
pre-resolved.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Improvements to Docker harness
- Expose additional service information, allowing callers to figure out
both the local address and the network-specific address of the
service container, and
- Allow modifying permissions on uploaded container files.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add infrastructure to run Bind9 in a container for tests
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Validate DNS-01 challenge works
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
---------
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add ACME revocation handlers
This refactors path_revoke to expose Proof of Possession verification,
which is reused by ACME to allow methods 1 and 2:
1. Revocation of a certificate issued by the account, using account
signature as sufficient proof.
2. Revocation of a certificate via proving possession of its private
key, using this private key to create the JWS signature.
We do not support the third mechanism, completing challenges equivalent
to those on the existing certificate and then performing a revocation
under an account which didn't issue the certificate but which did solve
those challenges.
We additionally create another map account->cert->order, allowing us to
quickly look up if a cert was issued by this particular account. Note
that the inverse lookup of cert->(account, order) lookup isn't yet
possible due to Vault's storage structure.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Update ACME pkiext tests to revoke certs
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add auth handler checks
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Address review feedback
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
---------
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* WIP: Implement ACME CSR signing and certificate retrieval
* Add some validations within the ACME finalize API
- Validate that the CSR we were given matches the DNS names
and IP addresses within the order
- Validate that the CSR does not share the same public as the
account
* Gate ACME finalize order validating all authorizations are in valid state
* Add infrastructure for warnings on CRL rebuilds
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add warning on issuer missing KU for CRL Signing
When an entire issuer equivalency class is missing CRL signing usage
(but otherwise has key material present), we should add a warning so
operators can either correct this issuer or create an equivalent version
with KU specified.
Resolves: https://github.com/hashicorp/vault/issues/20137
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests for issuer warnings
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Fix return order of CRL builders
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
---------
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Allow creating storageContext with timeout
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add challenge validation engine to ACME
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Initialize the ACME challenge validation engine
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Trigger challenge validation on endpoint submission
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Fix GetKeyThumbprint to use raw base64
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Point at localhost for testing
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add cleanup of validation engine
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
---------
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
- Add a helper function that can accept the final API path along with
the pattern function for an ACME api definition and generate the
various flavors for the given API
* Implement ACME new-order API
- This is a very rough draft for the new order ACME API
* Add ACME order list API
* Implement ACME Get order API
* Misc order related fixes
- Filter authorizations in GetOrders for valid
- Validate notBefore and notAfter dates make sense
- Add <order>/cert URL path to order response if set to valid
* Return account status within err authorized, if the account key verified
* Distinguish POST-as-GET from POST-with-empty-body
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add ACME authorization, identifier, and challenge types
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add ability to load and save authorizations
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add ACME authorizations path handling
This supports two methods: a fetch handler over the authorization, to
expose the underlying challenges, and a deactivate handler to revoke
the authorization and mark its challenges invalid.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add ACME challenge path handling
These paths kick off processing and validation of the challenge by the
ACME client.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
---------
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Rework ACME workflow test to leverage Golang's ACME client library
- Instead of testing manually, leverage the Golang ACME library
to test against our implementation from the unit tests.
* Add tests for new-account and misc fixes
- Set and return the account status for registration
- Add handlers for the account/ api/updates
- Switch acme/ to cluster local storage
- Disable terms of service checks for now as we don't set the url
* PR feedback
- Implement account deactivation
- Create separate account update handler, to not mix account creation
logic
- Add kid field to account update definition
- Add support to update contact details on an existing account
* Squash pki/acme package down to pki folder
Without refactoring most of PKI to export the storage layer, which we
were initially hesitant about, it would be nearly impossible to have the
ACME layer handle its own storage while being in the acme/ subpackage
under the pki package.
Thus, merge the two packages together again.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Properly format errors for missing parameters
When missing required ACME request parameters, don't return Vault-level
errors, but drop into the PKI package to return properly-formatted ACME
error messages.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Error type clarifications
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Fix GetOk with type conversion calls
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
---------
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Identify whether JWKs existed or were created, set KIDs
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Reclassify ErrAccountDoesNotExist as 400 per spec
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add additional stub methods for ACME accounts
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Start adding ACME newAccount handlers
This handler supports two pieces of functionality:
1. Searching for whether an existing account already exists.
2. Creating a new account.
One side effect of our JWS parsing logic is that we needed a way to
differentiate between whether a JWK existed on disk from an account or
if it was specified in the request. This technically means we're
potentially responding to certain requests with positive results (e.g.,
key search based on kid) versus erring earlier like other
implementations do.
No account storage has been done as part of this commit.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Unify path fields handling, fix newAccount method
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
---------
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Telemetry Metrics Configuration.
* Err Shadowing Fix (woah, semgrep is cool).
* Fix TestBackend_RevokePlusTidy_Intermediate
* Add Changelog.
* Fix memory leak. Code cleanup as suggested by Steve.
* Turn off metrics by default, breaking-change.
* Show on tidy-status before start-up.
* Fix tests
* make fmt
* Add emit metrics to periodicFunc
* Test not delivering unavailable metrics + fix.
* Better error message.
* Fixing the false-error bug.
* make fmt.
* Try to fix race issue, remove confusing comments.
* Switch metric counter variables to an atomic.Uint32
- Switch the metric counter variables to an atomic variable type
so that we are forced to properly load/store values to it
* Fix race-issue better by trying until the metric is sunk.
* make fmt.
* empty commit to retrigger non-race tests that all pass locally
---------
Co-authored-by: Steve Clark <steven.clark@hashicorp.com>
* PKI Unified CRL/OCSP apis should be ent only
- Do not enable any of the unified crl/ocsp related apis on OSS.
* Rollback refactoring of pathFetchCRLViaCertPath
- As pointed out in the PR, this method isn't actually being used at
the moment with the <serial> handler, pathFetchValid, matching
everything under the cert/XXXX path.
* Fix schema for ent/oss diff
- Define the OSS vs ENT urls we want to see within the schema
definition even if they aren't really going to be used in the end.
* Allow unification of revocations on other clusters
If a BYOC revocation occurred on cluster A, while the cert was initially
issued and stored on cluster B, we need to use the invalidation on the
unified entry to detect this: the revocation queues only work for
non-PoP, non-BYOC serial only revocations and thus this BYOC would be
immediately accepted on cluster A. By checking all other incoming
revocations for duplicates on a given cluster, we can ensure that
unified revocation is consistent across clusters.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Use time-of-use locking for global revocation processing
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
---------
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Unified revocation migration code
- Add a periodic function that will list the local revocations
and if any are missing from the unified revocation area will
force a write to the unified revocation folder/remote instance.
* PR Feedback
- Do not transfer expired certificates to unified space from local
- Move new periodic code into a periodic.go file
- Add a flag so we only run this stuff once if all is good, with
a force flag if we encounter errors or if unified_crl is toggled
on
* PR feedback take 2
* Add unified CRL config storage helpers
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add support to build unified CRLs
This allows us to build unified versions of both the complete and delta
CRLs. This mostly involved creating a new variant of the
unified-specific CRL builder, fetching certs from each cluster's storage
space.
Unlike OCSP, here we do not unify the node's local storage with the
cross-cluster storage: this node is the active of the performance
primary, so writes to unified storage happen exactly the same as
writes to cluster-local storage, meaning the two are always in
sync. Other performance secondaries do not rebuild the CRL, and hence
the out-of-sync avoidance that we'd like to solve with the OCSP
responder is not necessary to solve here.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add ability to fetch unified CRLs
This adds to the path-fetch APIs the ability to return the unified CRLs.
We update the If-Modified-Since infrastructure to support querying the
unified CRL specific data and fetchCertBySerial to support all unified
variants. This works for both the default/global fetch APIs and the
issuer-specific fetch APIs.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Rebuild CRLs on unified status changes
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Handle rebuilding CRLs due to either changing
This allows detecting if the Delta CRL needs to be rebuilt because
either the local or the unified CRL needs to be rebuilt. We never
trigger rebuilding the unified delta on a non-primary cluster.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Ensure serials aren't added to unified CRL twice
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Write delta WAL entries for unified CRLs
When we'd ordinarily write delta WALs for local CRLs, we also need to
populate the cross-cluster delta WAL. This could cause revocation to
appear to fail if the two clusters are disconnected, but notably regular
cross-cluster revocation would also fail.
Notably, this commit also changes us to not write Delta WALs when Delta
CRLs is disabled (versus previously doing it when auto rebuild is
enabled in case Delta CRLs were later asked for), and instead,
triggering rebuilding a complete CRL so we don't need up-to-date Delta
WAL info.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Update IMS test for forced CRL rebuilds
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Move comment about perf-primary only invalidation
Also remove noisy debug log.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Remove more noisy log statements during queue
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Skip revocation entries from our current cluster
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add locking and comment about tidying revoke queue
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Switch to time.Since for tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Refactor tidyStatuses into path_tidy.go
Leaving these in backend.go often causes us to miss adding useful values
to tidyStatus when we add a new config parameter.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Track the number of deleted revocation request
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Allow tidy to remove confirmed revocation requests
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add missing field to tidy test
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add global, cross-cluster revocation queue to PKI
This adds a global, cross-cluster replicated revocation queue, allowing
operators to revoke certificates by serial number across any cluster. We
don't support revoking with private key (PoP) in the initial
implementation.
In particular, building on the PBPWF work, we add a special storage
location for handling non-local revocations which gets replicated up to
the active, primary cluster node and back down to all secondary PR
clusters. These then check the pending revocation entry and revoke the
serial locally if it exists, writing a cross-cluster confirmation entry.
Listing capabilities are present under pki/certs/revocation-queue,
allowing operators to see which certs are present. However, a future
improvement to the tidy subsystem will allow automatic cleanup of stale
entries.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Allow tidying revocation queue entries
No manual operator control of revocation queue entries are allowed.
However, entries are stored with their request time, allowing tidy to,
after a suitable safety buffer, remove these unconfirmed and presumably
invalid requests.
Notably, when a cluster goes offline, it will be unable to process
cross-cluster revocations for certificates it holds. If tidy runs,
potentially valid revocations may be removed. However, it is up to the
administrator to ensure the tidy window is sufficiently long that any
required maintenance is done (or, prior to maintenance when an issue is
first noticed, tidy is temporarily disabled).
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Only allow enabling global revocation queue on Vault Enterprise
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Use a locking queue to handle revocation requests
This queue attempts to guarantee that PKI's invalidateFunc won't have
to wait long to execute: by locking only around access to the queue
proper, and internally using a list, we minimize the time spent locked,
waiting for queue accesses.
Previously, we held a lock during tidy and processing that would've
prevented us from processing invalidateFunc calls.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* use_global_queue->cross_cluster_revocation
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Grab revocation storage lock when processing queue
We need to grab the storage lock as we'll actively be revoking new
certificates in the revocation queue. This ensures nobody else is
competing for storage access, across periodic funcs, new revocations,
and tidy operations.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Fix expected tidy status test
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Allow probing RollbackManager directly in tests
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Address review feedback on revocationQueue
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add more cancel checks, fix starting manual tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Refactor CRL building into separate functions
This will allow us to add the ability to add and build a unified CRL
across all clusters, reusing logic that is common to both, but letting
each have their own certificate lists.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Rename localCRLConfigEntry->internalCRLConfigEntry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Rename Delta WALs to Local Delta WALs
This adds clarity that we'll have a separate local and remote Delta CRL
and WALs for each.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Allow tidy to backup legacy CA bundles
With the new tidy_move_legacy_ca_bundle option, we'll use tidy to move
the legacy CA bundle from /config/ca_bundle to /config/ca_bundle.bak.
This does two things:
1. Removes ca_bundle from the hot-path of initialization after initial
migration has completed. Because this entry is seal wrapped, this
may result in performance improvements.
2. Allows recovery of this value in the event of some other failure
with migration.
Notably, this cannot occur during migration in the unlikely (and largely
unsupported) case that the operator immediately downgrades to Vault
<1.11.x. Thus, we reuse issuer_safety_buffer; while potentially long,
tidy can always be run manually with a shorter buffer (and only this
flag) to manually move the bundle if necessary.
In the event of needing to recover or undo this operation, it is
sufficient to use sys/raw to read the backed up value and subsequently
write it to its old path (/config/ca_bundle).
The new entry remains seal wrapped, but otherwise isn't used within the
code and so has better performance characteristics.
Performing a fat deletion (DELETE /root) will again remove the backup
like the old legacy bundle, preserving its wipe characteristics.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add documentation about new tidy parameter
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests for migration scenarios
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Clean up time comparisons
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
A lot of places took a (context, backend, request) tuple, ignoring the
request proper and only using it for its storage. This (modified) tuple
is exactly the set of elements in the shared storage context, so we
should be using that instead of manually passing all three elements
around.
This simplifies a few places where we'd generate a storage context at
the request level and then split it apart only to recreate it again
later (e.g., CRL building).
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Allow templating of cluster-local AIA URIs
This adds a new configuration path, /config/cluster, which retains
cluster-local configuration. By extending /config/urls and its issuer
counterpart to include an enable_templating parameter, we can allow
operators to correctly identify the particular cluster a cert was
issued on, and tie its AIA information to this (cluster, issuer) pair
dynamically.
Notably, this does not solve all usage issues around AIA URIs: the CRL
and OCSP responder remain local, meaning that some merge capability is
required prior to passing it to other systems if they use CRL files and
must validate requests with certs from any arbitrary PR cluster.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add documentation about templated AIAs
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* AIA URIs -> AIA URLs
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* issuer.AIAURIs might be nil
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Allow non-nil response to config/urls
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Always validate URLs on config update
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Ensure URLs lack templating parameters
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Review feedback
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* New PKI API to generate and sign a CRL based on input data
- Add a new PKI API that allows an end-user to feed in all the
information required to generate and sign a CRL by a given issuer.
- This is pretty powerful API allowing an escape hatch for 3rd parties
to craft customized CRLs with extensions based on their individual
needs
* Add api-docs and error if reserved extension is provided as input
* Fix copy/paste error in Object Identifier constants
* Return nil on errors instead of partially filled slices
* Add cl
* Add new PKI api to combine and sign different CRLs from the same issuer
- Add a new PKI api /issuer/<issuer ref>/resign-crls that will allow
combining and signing different CRLs that were signed by the same
issuer.
- This allows external actors to combine CRLs into a single CRL across
different Vault clusters that share the CA certificate and key material
such as performance replica clusters and the primary cluster
* Update API docs
* PR Feedback - Delta CRL rename
* Update to latest version of main
* PR Feedback - Get rid of the new caEntry struct
* Address PR feedback in api-docs and PEM encoded response
* Add automatic tidy of expired issuers
To aid PKI users like Consul, which periodically rotate intermediates,
and provided a little more consistency with older versions of Vault
which would silently (and dangerously!) replace the configured CA on
root/intermediate generation, we introduce an automatic tidy of expired
issuers.
This includes a longer safety buffer (1 year) and logging of the
relevant issuer information prior to deletion (certificate contents, key
ID, and issuer ID/name) to allow admins to recover this value if
desired, or perform further cleanup of keys.
From my PoV, removal of the issuer is thus a relatively safe operation
compared to keys (which I do not feel comfortable removing) as they can
always be re-imported if desired. Additionally, this is an opt-in tidy
operation, not enabled by default. Lastly, most major performance
penalties comes with lots of issuers within the mount, not as much
large numbers of keys (as only new issuer creation/import operations are
affected, unlike LIST /issuers which is a public, unauthenticated
endpoint).
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add test for tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add docs on tidy of issuers
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Restructure logging
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add missing fields to expected tidy output
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Also remove one duplicate error masked by return.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add new API to PKI to list revoked certificates
- A new API that will return the list of serial numbers of
revoked certificates on the local cluster.
* Add cl
* PR feedback
Basics of Cert-Count Telemetry, changelog, "best attempt" slice to capture (and test for) duplicates, Move sorting of possibleDoubleCountedRevokedSerials to after compare of entries. Add values to counter when still initializing.
Set lists to nil after use, Fix atomic2 import, Delay reporting metrics until after deduplication has completed,
The test works now, Move string slice to helper function; Add backendUUID to gauge name.
* Don't race for CRL rebuilding capability check
Core has recently seen some data races during SystemView/replication
updates between them and the PKI subsystem. This is because this
SystemView access occurs outside of a request (during invalidation
handling) and thus the proper lock isn't held.
Because replication status cannot change within the lifetime of a plugin
(and instead, if a node switches replication status, the entire plugin
instance will be torn down and recreated), it is safe to cache this
once, at plugin startup, and use it throughout its lifetime.
Thus, we replace this SystemView access with a stored boolean variable
computed ahead of time.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Update builtin/logical/pki/backend.go
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add path to manually rebuild delta CRLs
The crl/rotate-delta path behaves like crl/rotate, triggering a
cluster-local rebuild of just the delta CRL. This is useful for when
delta CRLs are enabled with a longer-than-desired auto-rebuild period
after some high-profile revocations occur.
In the event delta CRLs are not enabled, this becomes a no-op.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests for Delta CRL rebuilding
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Update documentation about Delta CRLs
Also fixes a omission in the If-Modified-Since docs to mention that the
response header should probably also be passed through.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Allow tidy operations to be cancelled
When tidy operations take a long time to execute (and especially when
executing them automatically), having the ability to cancel them becomes
useful to reduce strain on Vault clusters (and let them be rescheduled
at a later time).
To this end, we add the /tidy-cancel write endpoint.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add missing auto-tidy synopsis / description
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add a pause duration between tidying certificates
By setting pause_duration, operators can have a little control over the
resource utilization of a tidy operation. While the list of certificates
remain in memory throughout the entire operation, a pause is added
between processing certificates and the revocation lock is released.
This allows other operations to occur during this gap and potentially
allows the tidy operation to consume less resources per unit of time
(due to the sleep -- though obviously consumes the same resources over
the time of the operation).
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests for cancellation, pause
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add API docs on pause_duration, /tidy-cancel
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add lock releasing around tidy pause
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Reset cancel guard, return errors
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add ability to perform automatic tidy operations
This enables the PKI secrets engine to allow tidy to be started
periodically by the engine itself, avoiding the need for interaction.
This operation is disabled by default (to avoid load on clusters which
don't need tidy to be run) but can be enabled.
In particular, a default tidy configuration is written (via
/config/auto-tidy) which mirrors the options passed to /tidy. Two
additional parameters, enabled and interval, are accepted, allowing
auto-tidy to be enabled or disabled and controlling the interval
(between successful tidy runs) to attempt auto-tidy.
Notably, a manual execution of tidy will delay additional auto-tidy
operations. Status is reported via the existing /tidy-status endpoint.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add documentation on auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests for auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Prevent race during parallel testing
We modified the RollbackManager's execution window to allow more
faithful testing of the periodicFunc. However, the TestAutoRebuild and
the new TestAutoTidy would then race against each other for modifying
the period and creating their clusters (before resetting to the old
value).
This changeset adds a lock around this, preventing the races.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Use tidyStatusLock to gate lastTidy time
This prevents a data race between the periodic func and the execution of
the running tidy.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add read lock around tidyStatus gauges
When reading from tidyStatus for computing gauges, since the underlying
values aren't atomics, we really should be gating these with a read lock
around the status access.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Issuer renames should invalidate CRL cache times
When an issuer is renamed (or rather, two issuers' names are swapped in
quick succession), this is akin to the earlier identified default issuer
update condition. So, when any issuer is updated, go ahead and trigger
the invalidation logic.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Fix handling of delta CRL If-Modified-Since
The If-Modified-Since PR was proposed prior to the Delta CRL changes and
thus didn't take it into account. This follow-up commit fixes that,
addressing If-Modified-Since semantics for delta CRL fetching and
ensuring an accurate number is stored.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* honor header if-modified-since if present
* pathGetIssuerCRL first version
* check if modified since for CA endpoints
* fix date comparison for CA endpoints
* suggested changes and refactoring
* add writeIssuer to updateDefaultIssuerId and fix error
* Move methods out of storage.go into util.go
For the most part, these take a SC as param, but aren't directly storage
relevant operations. Move them out of storage.go as a result.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Use UTC timezone for storage
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Rework path_fetch for better if-modified-since handling
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Invalidate all issuers, CRLs on default write
When the default is updated, access under earlier timestamps will not
work as we're unclear if the timestamp is for this issuer or a previous
issuer. Thus, we need to invalidate the CRL and both issuers involved
(previous, next) by updating their LastModifiedTimes.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests for If-Modified-Since
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Correctly invalidate default issuer changes
When the default issuer changes, we'll have to mark the invalidation on
PR secondary clusters, so they know to update their CRL mapping as well.
The swapped issuers will have an updated modification time (which will
eventually replicate down and thus be correct), but the CRL modification
time is cluster-local information and thus won't be replicated.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* make fmt
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Refactor sendNotModifiedResponseIfNecessary
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add documentation on if-modified-since
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Co-authored-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Allow generation of up-to-date delta CRLs
While switching to periodic rebuilds of CRLs alleviates the constant
rebuild pressure on Vault during times of high revocation, the CRL
proper becomes stale. One response to this is to switch to OCSP, but not
every system has support for this. Additionally, OCSP usually requires
connectivity and isn't used to augment a pre-distributed CRL (and is
instead used independently).
By generating delta CRLs containing only new revocations, an existing
CRL can be supplemented with newer revocations without requiring Vault
to rebuild all complete CRLs. Admins can periodically fetch the delta
CRL and add it to the existing CRL and applications should be able to
support using serials from both.
Because delta CRLs are emptied when the next complete CRL is rebuilt, it
is important that applications fetch the delta CRL and correlate it to
their complete CRL; if their complete CRL is older than the delta CRL's
extension number, applications MUST fetch the newer complete CRL to
ensure they have a correct combination.
This modifies the revocation process and adds several new configuration
options, controlling whether Delta CRLs are enabled and when we'll
rebuild it.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests for delta CRLs
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add documentation on delta CRLs
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Address review feedback: fix several bugs
Thanks Steve!
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Correctly invoke periodic func on active nodes
We need to ensure we read the updated config (in case of OCSP request
handling on standby nodes), but otherwise want to avoid CRL/DeltaCRL
re-building.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Refactor tidy steps into two separate helpers
This refactors the tidy go routine into two separate helpers, making it
clear where the boundaries of each are: variables are passed into these
method and concerns are separated. As more operations are rolled into
tidy, we can continue adding more helpers as appropriate. Additionally,
as we move to make auto-tidy occur, we can use these as points to hook
into periodic tidying.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Refactor revInfo checking to helper
This allows us to validate whether or not a revInfo entry contains a
presently valid issuer, from the existing mapping. Coupled with the
changeset to identify the issuer on revocation, we can begin adding
capabilities to tidy to update this association, decreasing CRL build
time and increasing the performance of OCSP.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Refactor issuer fetching for revocation purposes
Revocation needs to gracefully handle using the old legacy cert bundle,
so fetching issuers (and parsing them) needs to be done slightly
differently than other places. Refactor this from revokeCert into a
common helper that can be used by tidy.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Allow tidy to associate revoked certs, issuers
When revoking a certificate, we need to associate the issuer that signed
its certificate back to the revInfo entry. Historically this was
performed during CRL building (and still remains so), but when running
without CRL building and with only OCSP, performance will degrade as the
issuer needs to be found each time.
Instead, allow the tidy operation to take over this role, allowing us to
increase the performance of OCSP and CRL in this scenario, by decoupling
issuer identification from CRL building in the ideal case.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests for tidy updates
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add documentation on new tidy parameter, metrics
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Refactor tidy config into shared struct
Finish adding metrics, status messages about new tidy operation.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Previously we used the global backend-set crlLifetime as a default
value. However, this was refactored into a new defaultCrlConfig instead,
which we should reply with when the CRL configuration has not been set
yet. In particular, the 72h default expiry (and new 12h auto-rebuild
grace period) was added and made explicit.
This fixes the broken UI test.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>