* Fix for PKI.TestStandby_Operations test to work in ENT
- Remove wait call to testhelpers.WaitForActiveNodeAndStandbys and
leverage testhelpers.WaitForStandbyNode instead.
* Use InmemBackendSetup for a proper HA backend in ENT
* Forward PKI revocation requests received by standby nodes to active node
- A refactoring that occurred in 1.13 timeframe removed what was
considered a specific check for standby nodes that wasn't required
as a writes should be returning ErrReadOnly.
- That sadly exposed a long standing bug where the errors from the
storage layer were not being properly wrapped, hiding the ErrReadOnly
coming from a write and failing the request.
* Add cl
* Add test for basic PKI operations against standby nodes
* Add support for importing RSA-PSS keys in Transit
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
---------
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* plugin/auth: enable multiplexing
- the plugin will be multiplexed when run as an external plugin
by vault versions that support secrets/auth plugin multiplexing (> 1.12)
- we continue to set the TLSProviderFunc to maintain backwards
compatibility with vault versions that don't support AutoMTLS (< 1.12)
* enable multiplexing for secrets engines
* add changelog
* revert call to ServeMultiplex for pki and transit
* Revert "revert call to ServeMultiplex for pki and transit"
This reverts commit 755be28d14b4c4c4d884d3cf4d2ec003dda579b9.
* Telemetry Metrics Configuration.
* Err Shadowing Fix (woah, semgrep is cool).
* Fix TestBackend_RevokePlusTidy_Intermediate
* Add Changelog.
* Fix memory leak. Code cleanup as suggested by Steve.
* Turn off metrics by default, breaking-change.
* Show on tidy-status before start-up.
* Fix tests
* make fmt
* Add emit metrics to periodicFunc
* Test not delivering unavailable metrics + fix.
* Better error message.
* Fixing the false-error bug.
* make fmt.
* Try to fix race issue, remove confusing comments.
* Switch metric counter variables to an atomic.Uint32
- Switch the metric counter variables to an atomic variable type
so that we are forced to properly load/store values to it
* Fix race-issue better by trying until the metric is sunk.
* make fmt.
* empty commit to retrigger non-race tests that all pass locally
---------
Co-authored-by: Steve Clark <steven.clark@hashicorp.com>
- This fix was incorrect as now the tests and program are double
URL encoding the OCSP GET requests, so the base64 + characters
when using Vault proper are becoming space characters.
* Use the unified CRL on legacy CRL paths if UnifiedCRLOnExistingPaths is set
- If the crl configuration option unified_crl_on_existing_paths is set
to true along with the unified_crl feature, provide the unified crl
on the existing CRL paths.
- Added some test helpers to help debugging, they are being used by
the ENT test that validates this feature.
* Rename method to shouldLocalPathsUseUnified
* Use UTC for leaf exceeding CA's notAfter
When generating a leaf which exceeds the CA's validity period, Vault's
error message was confusing as the leaf would use the server's time
zone, but the CA's notAfter date would use UTC. This could cause
user confusion as the leaf's expiry might look before the latter, due
to using different time zones. E.g.:
> cannot satisfy request, as TTL would result in notAfter
> 2023-03-06T16:41:09.757694-08:00 that is beyond the expiration of
> the CA certificate at 2023-03-07T00:29:52Z
Consistently use UTC for this instead.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
---------
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Apply URL encoding/unencoding to OCSP Get requests
- Missed this during development and sadly the unit tests were written
at a level that did not expose this issue originally, there are
certain combinations of issuer cert + serial that lead to base64
data containing a '/' which will lead to the OCSP handler not getting
the full parameter.
- Do as the spec says, this should be treated as url-encoded data.
* Add cl
* Add higher level PKI OCSP GET/POST tests
* Rename PKI ocsp files to path_ocsp to follow naming conventions
* make fmt
* Add ability to clean up host keys for dynamic keys
This adds a new endpoint, tidy/dynamic-keys that removes any stale host
keys still present on the mount. This does not clean up any pending
dynamic key leases and will not remove these keys from systems with
authorized hosts entries created by Vault.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add documentation
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
---------
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* PKI Unified CRL/OCSP apis should be ent only
- Do not enable any of the unified crl/ocsp related apis on OSS.
* Rollback refactoring of pathFetchCRLViaCertPath
- As pointed out in the PR, this method isn't actually being used at
the moment with the <serial> handler, pathFetchValid, matching
everything under the cert/XXXX path.
* Fix schema for ent/oss diff
- Define the OSS vs ENT urls we want to see within the schema
definition even if they aren't really going to be used in the end.
* Move some test helper stuff from the vault package to a new helper/testhelpers/corehelpers package. Consolidate on a single "noop audit" implementation.
* Remove dynamic keys from SSH Secrets Engine
This removes the functionality of Vault creating keys and adding them to
the authorized keys file on hosts.
This functionality has been deprecated since Vault version 0.7.2.
The preferred alternative is to use the SSH CA method, which also allows
key generation but places limits on TTL and doesn't require Vault reach
out to provision each key on the specified host, making it much more
secure.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Remove dynamic ssh references from documentation
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Remove dynamic key secret type entirely
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Clarify changelog language
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add removal notice to the website
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
---------
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Address pki::TestAutoRebuild flakiness
- Wait for a CRL change before progressing to the next step after
we change configuration. Prior to this we would be racing against
the CRL reloading from the configuration change.
* Read total cert counts with atomic.LoadUint32(...)
When generating the tidy status, we read the values of two backend
atomics, b.certCount and b.revokedCertCount, without using the atomic
load operation. This resulted in a data race when the status was read
at the same time as an on-going tidy operation:
WARNING: DATA RACE
Write at 0x00c00c77680c by goroutine 90522:
sync/atomic.AddInt32()
/usr/local/go/src/runtime/race_amd64.s:281 +0xb
sync/atomic.AddUint32()
<autogenerated>:1 +0x1a
github.com/hashicorp/vault/builtin/logical/pki.(*backend).tidyStatusIncRevokedCertCount()
/home/circleci/go/src/github.com/hashicorp/vault/builtin/logical/pki/path_tidy.go:1236 +0x107
github.com/hashicorp/vault/builtin/logical/pki.(*backend).doTidyRevocationStore()
/home/circleci/go/src/github.com/hashicorp/vault/builtin/logical/pki/path_tidy.go:525 +0x1404
github.com/hashicorp/vault/builtin/logical/pki.(*backend).startTidyOperation.func1.1()
/home/circleci/go/src/github.com/hashicorp/vault/builtin/logical/pki/path_tidy.go:290 +0x1a4
github.com/hashicorp/vault/builtin/logical/pki.(*backend).startTidyOperation.func1()
/home/circleci/go/src/github.com/hashicorp/vault/builtin/logical/pki/path_tidy.go:342 +0x278
Previous read at 0x00c00c77680c by goroutine 90528:
reflect.Value.Uint()
/usr/local/go/src/reflect/value.go:2584 +0x195
encoding/json.uintEncoder()
/usr/local/go/src/encoding/json/encode.go:562 +0x45
encoding/json.ptrEncoder.encode()
/usr/local/go/src/encoding/json/encode.go:944 +0x3c2
encoding/json.ptrEncoder.encode-fm()
<autogenerated>:1 +0x90
encoding/json.(*encodeState).reflectValue()
/usr/local/go/src/encoding/json/encode.go:359 +0x88
encoding/json.interfaceEncoder()
/usr/local/go/src/encoding/json/encode.go:715 +0x17b
encoding/json.mapEncoder.encode()
/usr/local/go/src/encoding/json/encode.go:813 +0x854
... more stack trace pointing into JSON encoding and http
handler...
In particular, because the tidy status was directly reading the uint
value without resorting to the atomic side, the JSON serialization could
race with a later atomic update.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Also use atomic load in tests
Because no tidy operation is running here, it should be safe to read the
pointed value directly, but use the safer atomic.Load for consistency.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
---------
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
- This has been done to help diagnose errors in the future so that
we get the callers in the trace's when we fail and not just the
helper's trace output.
* Allow unification of revocations on other clusters
If a BYOC revocation occurred on cluster A, while the cert was initially
issued and stored on cluster B, we need to use the invalidation on the
unified entry to detect this: the revocation queues only work for
non-PoP, non-BYOC serial only revocations and thus this BYOC would be
immediately accepted on cluster A. By checking all other incoming
revocations for duplicates on a given cluster, we can ensure that
unified revocation is consistent across clusters.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Use time-of-use locking for global revocation processing
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
---------
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Unified revocation migration code
- Add a periodic function that will list the local revocations
and if any are missing from the unified revocation area will
force a write to the unified revocation folder/remote instance.
* PR Feedback
- Do not transfer expired certificates to unified space from local
- Move new periodic code into a periodic.go file
- Add a flag so we only run this stuff once if all is good, with
a force flag if we encounter errors or if unified_crl is toggled
on
* PR feedback take 2
- Return a detailed reponse within the list api that an end-user can
use to determine what clusters revoked the certificate on from the
pki/certs/unified-revoked LIST api.
- Return colon delimited serial numbers from the certs/revocation-queue
LIST api
* Add new tidy operation for cross revoked certs
This operation allows tidying of the cross-cluster revocation storage.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Fix missing cancels, status values
Previous additions to tidy didn't have enough cancel operations and left
out some new values from the status operation.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Clarify error on due to unsupported EC key bits
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Remove documentation about unsupported EC/224
Resolves: #18843
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
- I missed this in the original review, that we were storing the
unified-crl in a cluster-local storage area so none of the other
hosts would receive it.
- Discovered while writing unit tests, the main cluster had the unified
crl but the other clusters would return an empty response
* The fields.
* UserID set, add to certificate
* Changelog.
* Fix test (set default).
* Add UserID constant to certutil, revert extension changes
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add user_ids as field for leaf signing
Presumably, this isn't necessary for CAs, given that CAs probably don't
have a user ID corresponding to them.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Support setting multiple user_ids in Subject
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Allow any User ID with sign-verbatim
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests for User IDs in PKI
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add docs about user_ids, allowed_user_ids
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Co-authored-by: Alexander Scheel <alex.scheel@hashicorp.com>
- It does not make sense to allow operators to enable the cross-cluster
revocation features on local mounts as they will never have a
corresponding mount on the other cluster.
* Add unified CRL config storage helpers
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add support to build unified CRLs
This allows us to build unified versions of both the complete and delta
CRLs. This mostly involved creating a new variant of the
unified-specific CRL builder, fetching certs from each cluster's storage
space.
Unlike OCSP, here we do not unify the node's local storage with the
cross-cluster storage: this node is the active of the performance
primary, so writes to unified storage happen exactly the same as
writes to cluster-local storage, meaning the two are always in
sync. Other performance secondaries do not rebuild the CRL, and hence
the out-of-sync avoidance that we'd like to solve with the OCSP
responder is not necessary to solve here.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add ability to fetch unified CRLs
This adds to the path-fetch APIs the ability to return the unified CRLs.
We update the If-Modified-Since infrastructure to support querying the
unified CRL specific data and fetchCertBySerial to support all unified
variants. This works for both the default/global fetch APIs and the
issuer-specific fetch APIs.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Rebuild CRLs on unified status changes
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Handle rebuilding CRLs due to either changing
This allows detecting if the Delta CRL needs to be rebuilt because
either the local or the unified CRL needs to be rebuilt. We never
trigger rebuilding the unified delta on a non-primary cluster.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Ensure serials aren't added to unified CRL twice
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Write delta WAL entries for unified CRLs
When we'd ordinarily write delta WALs for local CRLs, we also need to
populate the cross-cluster delta WAL. This could cause revocation to
appear to fail if the two clusters are disconnected, but notably regular
cross-cluster revocation would also fail.
Notably, this commit also changes us to not write Delta WALs when Delta
CRLs is disabled (versus previously doing it when auto rebuild is
enabled in case Delta CRLs were later asked for), and instead,
triggering rebuilding a complete CRL so we don't need up-to-date Delta
WAL info.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Update IMS test for forced CRL rebuilds
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Move comment about perf-primary only invalidation
Also remove noisy debug log.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Remove more noisy log statements during queue
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Skip revocation entries from our current cluster
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add locking and comment about tidying revoke queue
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Switch to time.Since for tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Refactor tidyStatuses into path_tidy.go
Leaving these in backend.go often causes us to miss adding useful values
to tidyStatus when we add a new config parameter.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Track the number of deleted revocation request
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Allow tidy to remove confirmed revocation requests
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add missing field to tidy test
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add global, cross-cluster revocation queue to PKI
This adds a global, cross-cluster replicated revocation queue, allowing
operators to revoke certificates by serial number across any cluster. We
don't support revoking with private key (PoP) in the initial
implementation.
In particular, building on the PBPWF work, we add a special storage
location for handling non-local revocations which gets replicated up to
the active, primary cluster node and back down to all secondary PR
clusters. These then check the pending revocation entry and revoke the
serial locally if it exists, writing a cross-cluster confirmation entry.
Listing capabilities are present under pki/certs/revocation-queue,
allowing operators to see which certs are present. However, a future
improvement to the tidy subsystem will allow automatic cleanup of stale
entries.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Allow tidying revocation queue entries
No manual operator control of revocation queue entries are allowed.
However, entries are stored with their request time, allowing tidy to,
after a suitable safety buffer, remove these unconfirmed and presumably
invalid requests.
Notably, when a cluster goes offline, it will be unable to process
cross-cluster revocations for certificates it holds. If tidy runs,
potentially valid revocations may be removed. However, it is up to the
administrator to ensure the tidy window is sufficiently long that any
required maintenance is done (or, prior to maintenance when an issue is
first noticed, tidy is temporarily disabled).
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Only allow enabling global revocation queue on Vault Enterprise
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Use a locking queue to handle revocation requests
This queue attempts to guarantee that PKI's invalidateFunc won't have
to wait long to execute: by locking only around access to the queue
proper, and internally using a list, we minimize the time spent locked,
waiting for queue accesses.
Previously, we held a lock during tidy and processing that would've
prevented us from processing invalidateFunc calls.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* use_global_queue->cross_cluster_revocation
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Grab revocation storage lock when processing queue
We need to grab the storage lock as we'll actively be revoking new
certificates in the revocation queue. This ensures nobody else is
competing for storage access, across periodic funcs, new revocations,
and tidy operations.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Fix expected tidy status test
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Allow probing RollbackManager directly in tests
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Address review feedback on revocationQueue
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add more cancel checks, fix starting manual tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Refactor CRL building into separate functions
This will allow us to add the ability to add and build a unified CRL
across all clusters, reusing logic that is common to both, but letting
each have their own certificate lists.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Rename localCRLConfigEntry->internalCRLConfigEntry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Rename Delta WALs to Local Delta WALs
This adds clarity that we'll have a separate local and remote Delta CRL
and WALs for each.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Rename revokeCert variable to identify serial number formatting
* Refactor out lease specific behavior out of revokeCert
- Isolate the specific behavior regarding revoking lease specific
certificates outside of the revokeCert function and into the only
caller that leveraged used it.
- This allows us to simplify revokeCert a little bit and keeps the
function purely about revoking a certificate
* Within revokeCert short circuit the already revoked use-case
- Make the function a little easier to process by exiting early
if the certificate has already been revoked.
* Do not load certificates from storage multiple times during revocation
- Isolate the loading of a certificate and parsing of a certificate
into a single attempt, either when provided the certificate for BYOC
revocation or strictly from storage for the other revocation types.
* With BYOC write certificate entry using dashes not the legacy colon char