open-vault/builtin/logical/pki/path_tidy.go


package pki

import (
	"context"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"sync/atomic"
	"time"

	"github.com/armon/go-metrics"
	"github.com/hashicorp/go-hclog"
	"github.com/hashicorp/vault/sdk/framework"
	"github.com/hashicorp/vault/sdk/helper/consts"
	"github.com/hashicorp/vault/sdk/logical"
)

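var tidyCancelledError = errors.New("tidy operation cancelled")

// tidyStatusState tracks the lifecycle of a tidy operation.
type tidyStatusState int

// The constants after the first implicitly repeat the typed iota
// expression, so every value is a tidyStatusState rather than an untyped int.
const (
	tidyStatusInactive tidyStatusState = iota
	tidyStatusStarted
	tidyStatusFinished
	tidyStatusError
	tidyStatusCancelling
	tidyStatusCancelled
)
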
type tidyStatus struct {
	// Parameters used to initiate the operation
	safetyBuffer       int
	issuerSafetyBuffer int
	tidyCertStore      bool
	tidyRevokedCerts   bool
	tidyRevokedAssocs  bool
	tidyExpiredIssuers bool
	tidyBackupBundle   bool
	pauseDuration      string

	// Status
	state                   tidyStatusState
	err                     error
	timeStarted             time.Time
	timeFinished            time.Time
	message                 string
	certStoreDeletedCount   uint
	revokedCertDeletedCount uint
	missingIssuerCertCount  uint
	revQueueDeletedCount    uint
}
type tidyConfig struct {
	Enabled            bool          `json:"enabled"`
	Interval           time.Duration `json:"interval_duration"`
	CertStore          bool          `json:"tidy_cert_store"`
	RevokedCerts       bool          `json:"tidy_revoked_certs"`
	IssuerAssocs       bool          `json:"tidy_revoked_cert_issuer_associations"`
	ExpiredIssuers     bool          `json:"tidy_expired_issuers"`
	BackupBundle       bool          `json:"tidy_move_legacy_ca_bundle"`
	SafetyBuffer       time.Duration `json:"safety_buffer"`
	IssuerSafetyBuffer time.Duration `json:"issuer_safety_buffer"`
	PauseDuration      time.Duration `json:"pause_duration"`
	RevocationQueue    bool          `json:"tidy_revocation_queue"`
	QueueSafetyBuffer  time.Duration `json:"revocation_queue_safety_buffer"`
}
var defaultTidyConfig = tidyConfig{
Add automatic tidy of expired issuers (#17823) * Add automatic tidy of expired issuers To aid PKI users like Consul, which periodically rotate intermediates, and provided a little more consistency with older versions of Vault which would silently (and dangerously!) replace the configured CA on root/intermediate generation, we introduce an automatic tidy of expired issuers. This includes a longer safety buffer (1 year) and logging of the relevant issuer information prior to deletion (certificate contents, key ID, and issuer ID/name) to allow admins to recover this value if desired, or perform further cleanup of keys. From my PoV, removal of the issuer is thus a relatively safe operation compared to keys (which I do not feel comfortable removing) as they can always be re-imported if desired. Additionally, this is an opt-in tidy operation, not enabled by default. Lastly, most major performance penalties comes with lots of issuers within the mount, not as much large numbers of keys (as only new issuer creation/import operations are affected, unlike LIST /issuers which is a public, unauthenticated endpoint). Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add changelog entry Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add test for tidy Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add docs on tidy of issuers Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Restructure logging Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add missing fields to expected tidy output Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
2022-11-10 15:53:26 +00:00
Enabled: false,
Interval: 12 * time.Hour,
CertStore: false,
RevokedCerts: false,
IssuerAssocs: false,
ExpiredIssuers: false,
Allow tidy to backup legacy CA bundles (#18645) * Allow tidy to backup legacy CA bundles With the new tidy_move_legacy_ca_bundle option, we'll use tidy to move the legacy CA bundle from /config/ca_bundle to /config/ca_bundle.bak. This does two things: 1. Removes ca_bundle from the hot-path of initialization after initial migration has completed. Because this entry is seal wrapped, this may result in performance improvements. 2. Allows recovery of this value in the event of some other failure with migration. Notably, this cannot occur during migration in the unlikely (and largely unsupported) case that the operator immediately downgrades to Vault <1.11.x. Thus, we reuse issuer_safety_buffer; while potentially long, tidy can always be run manually with a shorter buffer (and only this flag) to manually move the bundle if necessary. In the event of needing to recover or undo this operation, it is sufficient to use sys/raw to read the backed up value and subsequently write it to its old path (/config/ca_bundle). The new entry remains seal wrapped, but otherwise isn't used within the code and so has better performance characteristics. Performing a fat deletion (DELETE /root) will again remove the backup like the old legacy bundle, preserving its wipe characteristics. Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add changelog Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add documentation about new tidy parameter Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add tests for migration scenarios Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Clean up time comparisons Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
2023-01-11 17:12:53 +00:00
BackupBundle: false,
Add automatic tidy of expired issuers (#17823) * Add automatic tidy of expired issuers To aid PKI users like Consul, which periodically rotate intermediates, and provided a little more consistency with older versions of Vault which would silently (and dangerously!) replace the configured CA on root/intermediate generation, we introduce an automatic tidy of expired issuers. This includes a longer safety buffer (1 year) and logging of the relevant issuer information prior to deletion (certificate contents, key ID, and issuer ID/name) to allow admins to recover this value if desired, or perform further cleanup of keys. From my PoV, removal of the issuer is thus a relatively safe operation compared to keys (which I do not feel comfortable removing) as they can always be re-imported if desired. Additionally, this is an opt-in tidy operation, not enabled by default. Lastly, most major performance penalties comes with lots of issuers within the mount, not as much large numbers of keys (as only new issuer creation/import operations are affected, unlike LIST /issuers which is a public, unauthenticated endpoint). Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add changelog entry Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add test for tidy Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add docs on tidy of issuers Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Restructure logging Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add missing fields to expected tidy output Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
2022-11-10 15:53:26 +00:00
SafetyBuffer: 72 * time.Hour,
IssuerSafetyBuffer: 365 * 24 * time.Hour,
PauseDuration: 0 * time.Second,
	RevocationQueue:    false,
	QueueSafetyBuffer:  48 * time.Hour,
}
// pathTidy defines the tidy endpoint; a write starts a manual tidy operation.
func pathTidy(b *backend) *framework.Path {
	return &framework.Path{
		Pattern: "tidy$",
		Fields:  addTidyFields(map[string]*framework.FieldSchema{}),
		Operations: map[logical.Operation]framework.OperationHandler{
			logical.UpdateOperation: &framework.PathOperation{
				Callback:                  b.pathTidyWrite,
				ForwardPerformanceStandby: true,
			},
		},
		HelpSynopsis:    pathTidyHelpSyn,
		HelpDescription: pathTidyHelpDesc,
	}
}
// pathTidyCancel defines the tidy-cancel endpoint; a write cancels a running tidy operation.
func pathTidyCancel(b *backend) *framework.Path {
	return &framework.Path{
		Pattern: "tidy-cancel$",
		Operations: map[logical.Operation]framework.OperationHandler{
			logical.UpdateOperation: &framework.PathOperation{
				Callback:                  b.pathTidyCancelWrite,
				ForwardPerformanceStandby: true,
			},
		},
		HelpSynopsis:    pathTidyCancelHelpSyn,
		HelpDescription: pathTidyCancelHelpDesc,
	}
}
// pathTidyStatus defines the tidy-status endpoint; a read reports the status of the tidy operation.
func pathTidyStatus(b *backend) *framework.Path {
	return &framework.Path{
		Pattern: "tidy-status$",
		Operations: map[logical.Operation]framework.OperationHandler{
			logical.ReadOperation: &framework.PathOperation{
				Callback:                  b.pathTidyStatusRead,
				ForwardPerformanceStandby: true,
			},
		},
		HelpSynopsis:    pathTidyStatusHelpSyn,
		HelpDescription: pathTidyStatusHelpDesc,
	}
}
// pathConfigAutoTidy defines the config/auto-tidy endpoint for reading and
// updating the automatic tidy configuration.
func pathConfigAutoTidy(b *backend) *framework.Path {
	return &framework.Path{
		Pattern: "config/auto-tidy",
		Fields: addTidyFields(map[string]*framework.FieldSchema{
			"enabled": {
				Type:        framework.TypeBool,
				Description: `Set to true to enable automatic tidy operations.`,
			},
			"interval_duration": {
				Type:        framework.TypeDurationSecond,
				Description: `Interval at which to run an auto-tidy operation. This is the time between tidy invocations (after one finishes to the start of the next). Running a manual tidy will reset this duration.`,
				Default:     int(defaultTidyConfig.Interval / time.Second), // TypeDurationSecond currently requires the default to be an int.
			},
		}),
		Operations: map[logical.Operation]framework.OperationHandler{
			logical.ReadOperation: &framework.PathOperation{
				Callback: b.pathConfigAutoTidyRead,
			},
			logical.UpdateOperation: &framework.PathOperation{
				Callback: b.pathConfigAutoTidyWrite,
				// Read more about why these flags are set in backend.go.
				ForwardPerformanceStandby:   true,
				ForwardPerformanceSecondary: true,
			},
		},
		HelpSynopsis:    pathConfigAutoTidySyn,
		HelpDescription: pathConfigAutoTidyDesc,
	}
}
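// pathTidyWrite handles writes to the tidy endpoint: it reads the tidy options
// from the request and validates them.
func (b *backend) pathTidyWrite(ctx context.Context, req *logical.Request, d *framework.FieldData) (*logical.Response, error) {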
	safetyBuffer := d.Get("safety_buffer").(int)
	tidyCertStore := d.Get("tidy_cert_store").(bool)
	tidyRevokedCerts := d.Get("tidy_revoked_certs").(bool) || d.Get("tidy_revocation_list").(bool)
	tidyRevokedAssocs := d.Get("tidy_revoked_cert_issuer_associations").(bool)
	tidyExpiredIssuers := d.Get("tidy_expired_issuers").(bool)
	tidyBackupBundle := d.Get("tidy_move_legacy_ca_bundle").(bool)
	issuerSafetyBuffer := d.Get("issuer_safety_buffer").(int)
	pauseDurationStr := d.Get("pause_duration").(string)
	pauseDuration := 0 * time.Second
	tidyRevocationQueue := d.Get("tidy_revocation_queue").(bool)
	queueSafetyBuffer := d.Get("revocation_queue_safety_buffer").(int)
	if safetyBuffer < 1 {
		return logical.ErrorResponse("safety_buffer must be greater than zero"), nil
	}
	if issuerSafetyBuffer < 1 {
		return logical.ErrorResponse("issuer_safety_buffer must be greater than zero"), nil
	}
// The queue safety buffer is given in whole seconds and must be at least one.
if queueSafetyBuffer < 1 {
return logical.ErrorResponse("revocation_queue_safety_buffer must be greater than zero"), nil
}
if pauseDurationStr != "" {
var err error
pauseDuration, err = time.ParseDuration(pauseDurationStr)
if err != nil {
return logical.ErrorResponse(fmt.Sprintf("Error parsing pause_duration: %v", err)), nil
}
if pauseDuration < (0 * time.Second) {
return logical.ErrorResponse("received invalid, negative pause_duration"), nil
}
}
bufferDuration := time.Duration(safetyBuffer) * time.Second
issuerBufferDuration := time.Duration(issuerSafetyBuffer) * time.Second
queueSafetyBufferDuration := time.Duration(queueSafetyBuffer) * time.Second
// Manual run with constructed configuration.
config := &tidyConfig{
Enabled: true,
Interval: 0 * time.Second,
CertStore: tidyCertStore,
RevokedCerts: tidyRevokedCerts,
IssuerAssocs: tidyRevokedAssocs,
ExpiredIssuers: tidyExpiredIssuers,
BackupBundle: tidyBackupBundle,
SafetyBuffer: bufferDuration,
IssuerSafetyBuffer: issuerBufferDuration,
PauseDuration: pauseDuration,
RevocationQueue: tidyRevocationQueue,
QueueSafetyBuffer: queueSafetyBufferDuration,
}
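// Allow only one tidy operation at a time: atomically flip the guard from
// 0 to 1, and return a warning if another run already holds it.
if !atomic.CompareAndSwapUint32(b.tidyCASGuard, 0, 1) {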
resp := &logical.Response{}
resp.AddWarning("Tidy operation already in progress.")
return resp, nil
}
// Tests using the framework can clobber the request's storage, so make a
// locally scoped req to hold a reference to it.
req = &logical.Request{
Storage: req.Storage,
}
// Mark the last tidy operation as relatively recent, to ensure we don't
// try to trigger the periodic function.
b.tidyStatusLock.Lock()
b.lastTidy = time.Now()
b.tidyStatusLock.Unlock()
// Kick off the actual tidy.
b.startTidyOperation(req, config)
resp := &logical.Response{}
if !tidyCertStore && !tidyRevokedCerts && !tidyRevokedAssocs && !tidyExpiredIssuers && !tidyBackupBundle && !tidyRevocationQueue {
resp.AddWarning("No targets to tidy; specify tidy_cert_store=true or tidy_revoked_certs=true or tidy_revoked_cert_issuer_associations=true or tidy_expired_issuers=true or tidy_move_legacy_ca_bundle=true or tidy_revocation_queue=true to start a tidy operation.")
	} else {
		resp.AddWarning("Tidy operation successfully started. Any information from the operation will be printed to Vault's server logs.")
	}
	if tidyRevocationQueue {
		isNotPerfPrimary := b.System().ReplicationState().HasState(consts.ReplicationDRSecondary|consts.ReplicationPerformanceStandby) ||
			(!b.System().LocalMount() && b.System().ReplicationState().HasState(consts.ReplicationPerformanceSecondary))
		if isNotPerfPrimary {
			resp.AddWarning("tidy_revocation_queue=true can only be set on the active node of the primary cluster unless a local mount is used; this option has been ignored.")
		}
	}
	return logical.RespondWithStatusCode(resp, req, http.StatusAccepted)
}

func (b *backend) startTidyOperation(req *logical.Request, config *tidyConfig) {
	go func() {
		atomic.StoreUint32(b.tidyCancelCAS, 0)
		defer atomic.StoreUint32(b.tidyCASGuard, 0)
		b.tidyStatusStart(config)
		// Don't cancel when the original client request goes away.
		ctx := context.Background()

		logger := b.Logger().Named("tidy")

		doTidy := func() error {
			if config.CertStore {
				if err := b.doTidyCertStore(ctx, req, logger, config); err != nil {
					return err
				}
			}
			// Check for cancel before continuing.
			if atomic.CompareAndSwapUint32(b.tidyCancelCAS, 1, 0) {
				return tidyCancelledError
			}
			if config.RevokedCerts || config.IssuerAssocs {
				if err := b.doTidyRevocationStore(ctx, req, logger, config); err != nil {
					return err
				}
			}
			// Check for cancel before continuing.
			if atomic.CompareAndSwapUint32(b.tidyCancelCAS, 1, 0) {
				return tidyCancelledError
			}
Add automatic tidy of expired issuers (#17823) * Add automatic tidy of expired issuers To aid PKI users like Consul, which periodically rotate intermediates, and provided a little more consistency with older versions of Vault which would silently (and dangerously!) replace the configured CA on root/intermediate generation, we introduce an automatic tidy of expired issuers. This includes a longer safety buffer (1 year) and logging of the relevant issuer information prior to deletion (certificate contents, key ID, and issuer ID/name) to allow admins to recover this value if desired, or perform further cleanup of keys. From my PoV, removal of the issuer is thus a relatively safe operation compared to keys (which I do not feel comfortable removing) as they can always be re-imported if desired. Additionally, this is an opt-in tidy operation, not enabled by default. Lastly, most major performance penalties comes with lots of issuers within the mount, not as much large numbers of keys (as only new issuer creation/import operations are affected, unlike LIST /issuers which is a public, unauthenticated endpoint). Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add changelog entry Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add test for tidy Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add docs on tidy of issuers Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Restructure logging Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add missing fields to expected tidy output Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
2022-11-10 15:53:26 +00:00
			if config.ExpiredIssuers {
				if err := b.doTidyExpiredIssuers(ctx, req, logger, config); err != nil {
					return err
				}
			}
// Check for cancel before continuing.
if atomic.CompareAndSwapUint32(b.tidyCancelCAS, 1, 0) {
return tidyCancelledError
}
if config.BackupBundle {
if err := b.doTidyMoveCABundle(ctx, req, logger, config); err != nil {
return err
}
}
// Check for cancel before continuing.
if atomic.CompareAndSwapUint32(b.tidyCancelCAS, 1, 0) {
return tidyCancelledError
}
if config.RevocationQueue {
if err := b.doTidyRevocationQueue(ctx, req, logger, config); err != nil {
return err
}
}
return nil
}
if err := doTidy(); err != nil {
logger.Error("error running tidy", "error", err)
b.tidyStatusStop(err)
} else {
b.tidyStatusStop(nil)
// Since the tidy operation finished without an error, we don't
// really want to start another tidy right away (if the interval
// is too short). So mark the last tidy as now.
b.tidyStatusLock.Lock()
b.lastTidy = time.Now()
b.tidyStatusLock.Unlock()
}
}()
}
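// doTidyCertStore walks the "certs/" storage prefix and deletes entries that
// are nil, have no value, or expired more than config.SafetyBuffer ago.
func (b *backend) doTidyCertStore(ctx context.Context, req *logical.Request, logger hclog.Logger, config *tidyConfig) error {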
serials, err := req.Storage.List(ctx, "certs/")
if err != nil {
return fmt.Errorf("error fetching list of certs: %w", err)
}
serialCount := len(serials)
metrics.SetGauge([]string{"secrets", "pki", "tidy", "cert_store_total_entries"}, float32(serialCount))
for i, serial := range serials {
b.tidyStatusMessage(fmt.Sprintf("Tidying certificate store: checking entry %d of %d", i, serialCount))
metrics.SetGauge([]string{"secrets", "pki", "tidy", "cert_store_current_entry"}, float32(i))
// Check for cancel before continuing.
if atomic.CompareAndSwapUint32(b.tidyCancelCAS, 1, 0) {
return tidyCancelledError
}
// Check for pause duration to reduce resource consumption.
if config.PauseDuration > 0 {
time.Sleep(config.PauseDuration)
}
certEntry, err := req.Storage.Get(ctx, "certs/"+serial)
if err != nil {
return fmt.Errorf("error fetching certificate %q: %w", serial, err)
}
if certEntry == nil {
logger.Warn("certificate entry is nil; tidying up since it is no longer useful for any server operations", "serial", serial)
if err := req.Storage.Delete(ctx, "certs/"+serial); err != nil {
return fmt.Errorf("error deleting nil entry with serial %s: %w", serial, err)
}
b.tidyStatusIncCertStoreCount()
continue
}
if len(certEntry.Value) == 0 {
logger.Warn("certificate entry has no value; tidying up since it is no longer useful for any server operations", "serial", serial)
if err := req.Storage.Delete(ctx, "certs/"+serial); err != nil {
return fmt.Errorf("error deleting entry with nil value with serial %s: %w", serial, err)
}
b.tidyStatusIncCertStoreCount()
continue
}
cert, err := x509.ParseCertificate(certEntry.Value)
if err != nil {
return fmt.Errorf("unable to parse stored certificate with serial %q: %w", serial, err)
}
if time.Since(cert.NotAfter) > config.SafetyBuffer {
if err := req.Storage.Delete(ctx, "certs/"+serial); err != nil {
return fmt.Errorf("error deleting serial %q from storage: %w", serial, err)
}
b.tidyStatusIncCertStoreCount()
}
}
b.tidyStatusLock.RLock()
metrics.SetGauge([]string{"secrets", "pki", "tidy", "cert_store_total_entries_remaining"}, float32(uint(serialCount)-b.tidyStatus.certStoreDeletedCount))
b.tidyStatusLock.RUnlock()
return nil
}
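// doTidyRevocationStore walks the entries under revoked/, deleting empty
// entries, optionally re-associating revoked certificates with their
// issuers, and removing entries whose certificates have expired beyond the
// safety buffer. If any expired entries were removed and automatic CRL
// rebuilds are disabled, the CRL is rebuilt before returning.
func (b *backend) doTidyRevocationStore(ctx context.Context, req *logical.Request, logger hclog.Logger, config *tidyConfig) error {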
b.revokeStorageLock.Lock()
defer b.revokeStorageLock.Unlock()
// Fetch and parse our issuers so we can associate them if necessary.
sc := b.makeStorageContext(ctx, req.Storage)
issuerIDCertMap, err := fetchIssuerMapForRevocationChecking(sc)
if err != nil {
return err
}
rebuildCRL := false
revokedSerials, err := req.Storage.List(ctx, "revoked/")
if err != nil {
return fmt.Errorf("error fetching list of revoked certs: %w", err)
}
revokedSerialsCount := len(revokedSerials)
metrics.SetGauge([]string{"secrets", "pki", "tidy", "revoked_cert_total_entries"}, float32(revokedSerialsCount))
fixedIssuers := 0
var revInfo revocationInfo
for i, serial := range revokedSerials {
b.tidyStatusMessage(fmt.Sprintf("Tidying revoked certificates: checking certificate %d of %d", i, len(revokedSerials)))
metrics.SetGauge([]string{"secrets", "pki", "tidy", "revoked_cert_current_entry"}, float32(i))
// Check for cancel before continuing.
if atomic.CompareAndSwapUint32(b.tidyCancelCAS, 1, 0) {
return tidyCancelledError
}
// Pause between certificates if configured, releasing the revocation
// lock while sleeping so other operations can make progress.
if config.PauseDuration > 0 {
b.revokeStorageLock.Unlock()
time.Sleep(config.PauseDuration)
b.revokeStorageLock.Lock()
}
revokedEntry, err := req.Storage.Get(ctx, "revoked/"+serial)
if err != nil {
return fmt.Errorf("unable to fetch revoked cert with serial %q: %w", serial, err)
}
if revokedEntry == nil {
logger.Warn("revoked entry is nil; tidying up since it is no longer useful for any server operations", "serial", serial)
if err := req.Storage.Delete(ctx, "revoked/"+serial); err != nil {
return fmt.Errorf("error deleting nil revoked entry with serial %s: %w", serial, err)
}
b.tidyStatusIncRevokedCertCount()
continue
}
if len(revokedEntry.Value) == 0 {
logger.Warn("revoked entry has nil value; tidying up since it is no longer useful for any server operations", "serial", serial)
if err := req.Storage.Delete(ctx, "revoked/"+serial); err != nil {
return fmt.Errorf("error deleting revoked entry with nil value with serial %s: %w", serial, err)
}
b.tidyStatusIncRevokedCertCount()
continue
}
err = revokedEntry.DecodeJSON(&revInfo)
if err != nil {
return fmt.Errorf("error decoding revocation entry for serial %q: %w", serial, err)
}
revokedCert, err := x509.ParseCertificate(revInfo.CertificateBytes)
if err != nil {
return fmt.Errorf("unable to parse stored revoked certificate with serial %q: %w", serial, err)
}
// Tidy operations over revoked certs should execute prior to
// tidyRevokedCerts as that may remove the entry. If that happens,
// we won't persist the revInfo changes (as it was deleted instead).
var storeCert bool
if config.IssuerAssocs {
if !isRevInfoIssuerValid(&revInfo, issuerIDCertMap) {
b.tidyStatusIncMissingIssuerCertCount()
revInfo.CertificateIssuer = issuerID("")
storeCert = true
if associateRevokedCertWithIsssuer(&revInfo, revokedCert, issuerIDCertMap) {
fixedIssuers++
}
}
}
if config.RevokedCerts {
// Only remove the entries from revoked/ and certs/ once the
// certificate is past its NotAfter value by more than the
// safety buffer. This is because we use the information in
// revoked/ to build the CRL and the information in certs/
// for lookup.
if time.Since(revokedCert.NotAfter) > config.SafetyBuffer {
if err := req.Storage.Delete(ctx, "revoked/"+serial); err != nil {
return fmt.Errorf("error deleting serial %q from revoked list: %w", serial, err)
}
if err := req.Storage.Delete(ctx, "certs/"+serial); err != nil {
return fmt.Errorf("error deleting serial %q from store when tidying revoked: %w", serial, err)
}
rebuildCRL = true
storeCert = false
b.tidyStatusIncRevokedCertCount()
}
}
// If the entry wasn't removed but was otherwise modified,
// go ahead and write it back out.
if storeCert {
revokedEntry, err = logical.StorageEntryJSON("revoked/"+serial, revInfo)
if err != nil {
return fmt.Errorf("error building entry to persist changes to serial %v from revoked list: %w", serial, err)
}
err = req.Storage.Put(ctx, revokedEntry)
if err != nil {
return fmt.Errorf("error persisting changes to serial %v from revoked list: %w", serial, err)
}
}
}
b.tidyStatusLock.RLock()
metrics.SetGauge([]string{"secrets", "pki", "tidy", "revoked_cert_total_entries_remaining"}, float32(uint(revokedSerialsCount)-b.tidyStatus.revokedCertDeletedCount))
metrics.SetGauge([]string{"secrets", "pki", "tidy", "revoked_cert_entries_incorrect_issuers"}, float32(b.tidyStatus.missingIssuerCertCount))
metrics.SetGauge([]string{"secrets", "pki", "tidy", "revoked_cert_entries_fixed_issuers"}, float32(fixedIssuers))
b.tidyStatusLock.RUnlock()
if rebuildCRL {
// Expired certificates aren't generally an important reason
// to trigger a CRL rebuild. Check whether automatic CRL
// rebuilds have been enabled and, if so, defer the rebuild
// to the next automatic run.
config, err := sc.getRevocationConfig()
if err != nil {
return err
}
if !config.AutoRebuild {
if err := b.crlBuilder.rebuild(sc, false); err != nil {
return err
}
}
}
return nil
}
func (b *backend) doTidyExpiredIssuers(ctx context.Context, req *logical.Request, logger hclog.Logger, config *tidyConfig) error {
if b.System().ReplicationState().HasState(consts.ReplicationDRSecondary|consts.ReplicationPerformanceStandby) ||
(!b.System().LocalMount() && b.System().ReplicationState().HasState(consts.ReplicationPerformanceSecondary)) {
b.Logger().Debug("skipping expired issuer tidy as we're not on the primary or secondary with a local mount")
return nil
}
// Short-circuit to avoid having to deal with the legacy mounts. While we
// could handle this case and remove these issuers, it's somewhat
// unexpected behavior and we'd prefer to finish the migration first.
if b.useLegacyBundleCaStorage() {
return nil
}
b.issuersLock.Lock()
defer b.issuersLock.Unlock()
// Fetch and parse our issuers so we have their expiration date.
sc := b.makeStorageContext(ctx, req.Storage)
issuerIDCertMap, err := fetchIssuerMapForRevocationChecking(sc)
if err != nil {
return err
}
// Fetch the issuer config to find the default; we don't want to remove
// the current active issuer automatically.
iConfig, err := sc.getIssuersConfig()
if err != nil {
return err
}
// We want certificates which have expired before this date by a given
// safety buffer.
rebuildChainsAndCRL := false
for issuer, cert := range issuerIDCertMap {
if time.Since(cert.NotAfter) <= config.IssuerSafetyBuffer {
continue
}
entry, err := sc.fetchIssuerById(issuer)
if err != nil {
// Propagate the fetch failure rather than silently ending the tidy.
return err
}
// This issuer's certificate has expired. We explicitly persist the
// key, but log both the certificate and the keyId to the
// informational logs so an admin can recover the removed cert if
// necessary or remove the key (and know which cert it belonged to),
// if desired.
msg := "[Tidy on mount: %v] Issuer %v has expired by %v and is being removed."
idAndName := fmt.Sprintf("[id:%v/name:%v]", entry.ID, entry.Name)
msg = fmt.Sprintf(msg, b.backendUUID, idAndName, config.IssuerSafetyBuffer)
// Before we log, check if we're the default. While this is late, and
// after we read it from storage, we have more info here to tell the
// user that their default has expired AND has passed the safety
// buffer.
if iConfig.DefaultIssuerId == issuer {
msg = "[Tidy on mount: %v] Issuer %v has expired and would be removed via tidy, but won't be, as it is currently the default issuer."
msg = fmt.Sprintf(msg, b.backendUUID, idAndName)
b.Logger().Warn(msg)
continue
}
// Log the above message.
b.Logger().Info(msg, "serial_number", entry.SerialNumber, "key_id", entry.KeyID, "certificate", entry.Certificate)
wasDefault, err := sc.deleteIssuer(issuer)
if err != nil {
b.Logger().Error(fmt.Sprintf("failed to remove %v: %v", idAndName, err))
return err
}
if wasDefault {
b.Logger().Warn(fmt.Sprintf("expired issuer %v was default; it is strongly encouraged to choose a new default issuer for backwards compatibility", idAndName))
}
rebuildChainsAndCRL = true
}
if rebuildChainsAndCRL {
// When issuers are removed, there's a chance chains change as a
// result; rebuild them.
if err := sc.rebuildIssuersChains(nil); err != nil {
return err
}
// Removal of issuers is generally a good reason to rebuild the CRL,
// even if auto-rebuild is enabled.
b.revokeStorageLock.Lock()
defer b.revokeStorageLock.Unlock()
if err := b.crlBuilder.rebuild(sc, false); err != nil {
return err
}
}
return nil
}
func (b *backend) doTidyMoveCABundle(ctx context.Context, req *logical.Request, logger hclog.Logger, config *tidyConfig) error {
if b.System().ReplicationState().HasState(consts.ReplicationDRSecondary|consts.ReplicationPerformanceStandby) ||
(!b.System().LocalMount() && b.System().ReplicationState().HasState(consts.ReplicationPerformanceSecondary)) {
b.Logger().Debug("skipping moving the legacy CA bundle as we're not on the primary or secondary with a local mount")
return nil
}
// Short-circuit to avoid moving the legacy bundle from under a legacy
// mount.
if b.useLegacyBundleCaStorage() {
return nil
}
// If we've already run, exit.
_, bundle, err := getLegacyCertBundle(ctx, req.Storage)
if err != nil {
return fmt.Errorf("failed to fetch the legacy CA bundle: %w", err)
}
if bundle == nil {
b.Logger().Debug("No legacy CA bundle available; nothing to do.")
return nil
}
log, err := getLegacyBundleMigrationLog(ctx, req.Storage)
if err != nil {
return fmt.Errorf("failed to fetch the legacy bundle migration log: %w", err)
}
if log == nil {
// err is nil at this point, so there is nothing to wrap with %w.
return errors.New("refusing to tidy with an empty legacy migration log but present CA bundle")
}
if time.Since(log.Created) <= config.IssuerSafetyBuffer {
b.Logger().Debug("Migration was created too recently to remove the legacy bundle; refusing to move legacy CA bundle to backup location.")
return nil
}
// Do the write before the delete.
entry, err := logical.StorageEntryJSON(legacyCertBundleBackupPath, bundle)
if err != nil {
return fmt.Errorf("failed to create new backup storage entry: %w", err)
}
err = req.Storage.Put(ctx, entry)
if err != nil {
return fmt.Errorf("failed to write new backup legacy CA bundle: %w", err)
}
err = req.Storage.Delete(ctx, legacyCertBundlePath)
if err != nil {
return fmt.Errorf("failed to remove old legacy CA bundle path: %w", err)
}
b.Logger().Info("legacy CA bundle successfully moved to backup location")
return nil
}
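// doTidyMoveCABundle writes the backup entry before deleting the original, so
// a failure between the two steps never loses the bundle. A minimal sketch of
// that ordering follows, using a hypothetical in-memory store (memStore) as a
// stand-in for logical.Storage; the type, its failPut switch, and moveEntry
// are illustrative names, not part of this backend.

```go
package main

import (
	"errors"
	"fmt"
)

// memStore is a hypothetical in-memory stand-in for a storage backend.
// failPut simulates a write failure so the ordering can be exercised.
type memStore struct {
	data    map[string]string
	failPut bool
}

func (m *memStore) Put(key, value string) error {
	if m.failPut {
		return errors.New("simulated put failure")
	}
	m.data[key] = value
	return nil
}

func (m *memStore) Delete(key string) error {
	delete(m.data, key)
	return nil
}

// moveEntry copies the value at from to to, then removes from. The write
// happens first: if it fails, the source entry is left untouched.
func moveEntry(s *memStore, from, to string) error {
	value, ok := s.data[from]
	if !ok {
		// Nothing to move; treat as already done.
		return nil
	}
	if err := s.Put(to, value); err != nil {
		return fmt.Errorf("failed to write backup entry: %w", err)
	}
	if err := s.Delete(from); err != nil {
		return fmt.Errorf("failed to remove old entry: %w", err)
	}
	return nil
}
```

// If the delete fails after a successful write, both copies exist and a later
// run can retry the move; the reverse order could leave neither.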
func (b *backend) doTidyRevocationQueue(ctx context.Context, req *logical.Request, logger hclog.Logger, config *tidyConfig) error {
if b.System().ReplicationState().HasState(consts.ReplicationDRSecondary|consts.ReplicationPerformanceStandby) ||
(!b.System().LocalMount() && b.System().ReplicationState().HasState(consts.ReplicationPerformanceSecondary)) {
b.Logger().Debug("skipping cross-cluster revocation queue tidy as we're not on the primary or secondary with a local mount")
return nil
}
sc := b.makeStorageContext(ctx, req.Storage)
clusters, err := sc.Storage.List(sc.Context, crossRevocationPrefix)
if err != nil {
return fmt.Errorf("failed to list cross-cluster revocation queue participating clusters: %w", err)
}
// Grab locks as we're potentially modifying revocation-related storage.
b.revokeStorageLock.Lock()
defer b.revokeStorageLock.Unlock()
for cIndex, cluster := range clusters {
if cluster[len(cluster)-1] == '/' {
cluster = cluster[0 : len(cluster)-1]
}
cPath := crossRevocationPrefix + cluster + "/"
serials, err := sc.Storage.List(sc.Context, cPath)
if err != nil {
return fmt.Errorf("failed to list cross-cluster revocation queue entries for cluster %v (%v): %w", cluster, cIndex, err)
}
for _, serial := range serials {
// Check for pause duration to reduce resource consumption.
if config.PauseDuration > (0 * time.Second) {
b.revokeStorageLock.Unlock()
time.Sleep(config.PauseDuration)
b.revokeStorageLock.Lock()
}
			// Confirmation entries _should_ be handled by this cluster's
			// processRevocationQueue(...) invocation; if not, when the plugin
			// reloads, maybeGatherQueueForFirstProcess(...) will remove all
			// stale confirmation requests. However, we don't want to force an
			// operator to reload their in-use plugin, so allow tidy to also
			// clean up confirmation values without reloading.
			if serial[len(serial)-1] == '/' {
				// Check if we have a confirmed entry.
				confirmedPath := cPath + serial + "confirmed"
				removalEntry, err := sc.Storage.Get(sc.Context, confirmedPath)
				if err != nil {
					return fmt.Errorf("error reading revocation confirmation (%v) during tidy: %w", confirmedPath, err)
				}
				if removalEntry == nil {
					continue
				}
				// Remove potential revocation requests from all clusters.
				for _, subCluster := range clusters {
					if subCluster[len(subCluster)-1] == '/' {
						subCluster = subCluster[0 : len(subCluster)-1]
					}

					reqPath := subCluster + "/" + serial[0:len(serial)-1]
					if err := sc.Storage.Delete(sc.Context, reqPath); err != nil {
						return fmt.Errorf("failed to remove confirmed revocation request on candidate cluster (%v): %w", reqPath, err)
					}
				}

				// Then delete the confirmation.
				if err := sc.Storage.Delete(sc.Context, confirmedPath); err != nil {
					return fmt.Errorf("failed to remove confirmed revocation confirmation (%v): %w", confirmedPath, err)
				}

				// No need to handle a revocation request at this path: it can't
				// still exist on this cluster after we deleted it above.
				continue
			}

			ePath := cPath + serial
			entry, err := sc.Storage.Get(sc.Context, ePath)
			if err != nil {
				return fmt.Errorf("error reading revocation request (%v) to tidy: %w", ePath, err)
			}
			if entry == nil || entry.Value == nil {
				continue
			}

			var revRequest revocationRequest
			if err := entry.DecodeJSON(&revRequest); err != nil {
				return fmt.Errorf("error reading revocation request (%v) to tidy: %w", ePath, err)
			}

			// Skip requests that are still inside the safety buffer; only
			// requests older than the buffer are removed below.
			if time.Since(revRequest.RequestedAt) <= config.QueueSafetyBuffer {
				continue
			}

			// Safe to remove this entry.
			if err := sc.Storage.Delete(sc.Context, ePath); err != nil {
				return fmt.Errorf("error deleting revocation request (%v): %w", ePath, err)
			}

			// Assumption: there should never be a need to remove this from
			// the processing queue on this node. We're on the active primary,
			// so our writes don't cause invalidations. This means we'd have
			// to have slated it for deletion very quickly after it'd been
			// sent (i.e., inside of the 1-minute boundary that periodicFunc
			// executes at). While this is possible, because we grab the
			// revocationStorageLock above, we can't execute interleaved
			// with that periodicFunc, so the periodicFunc would've had to
			// have finished before we actually did this deletion (or it
			// would've ignored this serial because our deletion would've
			// happened prior to it reading the storage entry). Thus we should
			// be safe to ignore the revocation queue removal here.
			b.tidyStatusIncRevQueueCount()
		}
	}
Allow tidy to backup legacy CA bundles (#18645) * Allow tidy to backup legacy CA bundles With the new tidy_move_legacy_ca_bundle option, we'll use tidy to move the legacy CA bundle from /config/ca_bundle to /config/ca_bundle.bak. This does two things: 1. Removes ca_bundle from the hot-path of initialization after initial migration has completed. Because this entry is seal wrapped, this may result in performance improvements. 2. Allows recovery of this value in the event of some other failure with migration. Notably, this cannot occur during migration in the unlikely (and largely unsupported) case that the operator immediately downgrades to Vault <1.11.x. Thus, we reuse issuer_safety_buffer; while potentially long, tidy can always be run manually with a shorter buffer (and only this flag) to manually move the bundle if necessary. In the event of needing to recover or undo this operation, it is sufficient to use sys/raw to read the backed up value and subsequently write it to its old path (/config/ca_bundle). The new entry remains seal wrapped, but otherwise isn't used within the code and so has better performance characteristics. Performing a fat deletion (DELETE /root) will again remove the backup like the old legacy bundle, preserving its wipe characteristics. Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add changelog Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add documentation about new tidy parameter Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add tests for migration scenarios Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Clean up time comparisons Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
2023-01-11 17:12:53 +00:00
	return nil
}
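
The two decisions the loop above makes per serial can be isolated as small pure functions: whether a request has outlived the safety buffer, and how a per-cluster request path is assembled from listing entries that may end in a slash. The following is only a sketch under stated assumptions: `isStale` and `requestPath` are hypothetical helpers that do not exist in the backend, the `revocationRequest` type here carries just the one field the loop reads, and the storage path in `main` is an invented example.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// revocationRequest mirrors only the field the tidy loop reads. The real
// struct in the PKI backend may carry more fields (assumption).
type revocationRequest struct {
	RequestedAt time.Time
}

// isStale reports whether a request is older than the safety buffer and is
// therefore a candidate for removal. Hypothetical helper for illustration.
func isStale(req revocationRequest, now time.Time, safetyBuffer time.Duration) bool {
	return now.Sub(req.RequestedAt) > safetyBuffer
}

// requestPath joins a cluster prefix and a serial, dropping at most one
// trailing slash from each, as a storage listing may return directory-like
// entries with a trailing "/". Hypothetical helper for illustration.
func requestPath(clusterPath, serial string) string {
	return strings.TrimSuffix(clusterPath, "/") + "/" + strings.TrimSuffix(serial, "/")
}

func main() {
	now := time.Now()
	buffer := 48 * time.Hour // example value, not the backend default

	fresh := revocationRequest{RequestedAt: now.Add(-1 * time.Hour)}
	old := revocationRequest{RequestedAt: now.Add(-72 * time.Hour)}

	fmt.Println("fresh is stale:", isStale(fresh, now, buffer))
	fmt.Println("old is stale:  ", isStale(old, now, buffer))

	// The prefix below is made up for the example.
	fmt.Println(requestPath("example-queue/cluster-a/", "11-22-33/"))
}
```

Passing `now` explicitly, rather than calling `time.Since` inside the helper, keeps the check deterministic in tests.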
func (b *backend) pathTidyCancelWrite(ctx context.Context, req *logical.Request, d *framework.FieldData) (*logical.Response, error) {
	if atomic.LoadUint32(b.tidyCASGuard) == 0 {
		resp := &logical.Response{}
		resp.AddWarning("Tidy operation cannot be cancelled as none is currently running.")
		return resp, nil
	}

	// Grab the status lock before writing the cancel atomic. This lets us
	// update the status correctly as well, avoiding writing it if we're not
	// presently running.
	//
	// Unlock needs to occur prior to calling read.
	b.tidyStatusLock.Lock()
	if b.tidyStatus.state == tidyStatusStarted || atomic.LoadUint32(b.tidyCASGuard) == 1 {
		if atomic.CompareAndSwapUint32(b.tidyCancelCAS, 0, 1) {
			b.tidyStatus.state = tidyStatusCancelling
		}
	}
	b.tidyStatusLock.Unlock()

	return b.pathTidyStatusRead(ctx, req, d)
}
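
The cancel handler above combines an atomic "running" guard, a compare-and-swap on a cancel flag, and a mutex-protected status field. A minimal standalone sketch of that pattern follows. The `tidyTracker` type, its field names, and the `requestCancel` and `shouldStop` methods are all hypothetical stand-ins invented for this example; the real backend uses its own fields and status type.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type tidyState int

const (
	stateInactive tidyState = iota
	stateStarted
	stateCancelling
)

// tidyTracker is a hypothetical stand-in for the backend fields the handler
// touches: a running guard, a cancel flag, and a lock-protected status.
type tidyTracker struct {
	running   uint32 // 1 while a tidy operation is in progress
	cancelReq uint32 // 1 once cancellation has been requested

	mu    sync.Mutex
	state tidyState
}

// requestCancel flags a running operation for cancellation. It returns true
// only for the call that actually flipped the cancel flag.
func (t *tidyTracker) requestCancel() bool {
	// Fast path: nothing is running, so there is nothing to cancel.
	if atomic.LoadUint32(&t.running) == 0 {
		return false
	}

	// Hold the status lock so the flag and the status change together.
	t.mu.Lock()
	defer t.mu.Unlock()

	if t.state == stateStarted || atomic.LoadUint32(&t.running) == 1 {
		if atomic.CompareAndSwapUint32(&t.cancelReq, 0, 1) {
			t.state = stateCancelling
			return true
		}
	}
	return false
}

// shouldStop is what a worker loop would poll between units of work.
func (t *tidyTracker) shouldStop() bool {
	return atomic.LoadUint32(&t.cancelReq) == 1
}

func main() {
	tracker := &tidyTracker{}
	fmt.Println("cancel while idle:", tracker.requestCancel())

	// Simulate an operation starting.
	atomic.StoreUint32(&tracker.running, 1)
	tracker.mu.Lock()
	tracker.state = stateStarted
	tracker.mu.Unlock()

	fmt.Println("first cancel:", tracker.requestCancel())
	fmt.Println("second cancel:", tracker.requestCancel())
	fmt.Println("worker should stop:", tracker.shouldStop())
}
```

The compare-and-swap makes repeated cancel requests idempotent: only the first one changes the status, and later ones fall through without touching it.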
Allow Multiple Issuers in PKI Secret Engine Mounts - PKI Pod (#15277) * Starter PKI CA Storage API (#14796) * Simple starting PKI storage api for CA rotation * Add key and issuer storage apis * Add listKeys and listIssuers storage implementations * Add simple keys and issuers configuration storage api methods * Handle resolving key, issuer references The API context will usually have a user-specified reference to the key. This is either the literal string "default" to select the default key, an identifier of the key, or a slug name for the key. Here, we wish to resolve this reference to an actual identifier that can be understood by storage. Also adds the missing Name field to keys. Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add method to fetch an issuer's cert bundle This adds a method to construct a certutil.CertBundle from the specified issuer identifier, optionally loading its corresponding key for signing. Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Refactor certutil PrivateKey PEM handling This refactors the parsing of PrivateKeys from PEM blobs into shared methods (ParsePEMKey, ParseDERKey) that can be reused by the existing Bundle parsing logic (ParsePEMBundle) or independently in the new issuers/key-based PKI storage code. Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add importKey, importCert to PKI storage importKey is generally preferable to the low-level writeKey for adding new entries. This takes only the contents of the private key (as a string -- so a PEM bundle or a managed key handle) and checks if it already exists in the storage. If it does, it returns the existing key instance. Otherwise, we create a new one. In the process, we detect any issuers using this key and link them back to the new key entry. The same holds for importCert over importKey, with the note that keys are not modified when importing certificates. 
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add tests for importing issuers, keys This adds tests for importing keys and issuers into the new storage layout, ensuring that identifiers are correctly inferred and linked. Note that directly writing entries to storage (writeKey/writeissuer) will take KeyID links from the parent entry and should not be used for import; only existing entries should be updated with this info. Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Implement PKI storage migration. - Hook into the backend::initialize function, calling the migration on a primary only. - Migrate an existing certificate bundle to the new issuers and key layout * Make fetchCAInfo aware of new storage layout This allows fetchCAInfo to fetch a specified issuer, via a reference parameter provided by the user. We pass that into the storage layer and have it return a cert bundle for us. Finally, we need to validate that it truly has the key desired. Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Begin /issuers API endpoints This implements the fetch operations around issuers in the PKI Secrets Engine. We implement the following operations: - LIST /issuers - returns a list of known issuers' IDs and names. - GET /issuer/:ref - returns a JSON blob with information about this issuer. - POST /issuer/:ref - allows configuring information about issuers, presently just its name. - DELETE /issuer/:ref - allows deleting the specified issuer. - GET /issuer/:ref/{der,pem} - returns a raw API response with just the DER (or PEM) of the issuer's certificate. Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add import to PKI Issuers API This adds the two core import code paths to the API: /issuers/import/cert and /issuers/import/bundle. The former differs from the latter in that the latter allows the import of keys. 
This allows operators to restrict importing of keys to privileged roles, while allowing more operators permission to import additional certificates (not used for signing, but instead for path/chain building). Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add /issuer/:ref/sign-intermediate endpoint This endpoint allows existing issuers to be used to sign intermediate CA certificates. In the process, we've updated the existing /root/sign-intermediate endpoint to be equivalent to a call to /issuer/default/sign-intermediate. Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add /issuer/:ref/sign-self-issued endpoint This endpoint allows existing issuers to be used to sign self-signed certificates. In the process, we've updated the existing /root/sign-self-issued endpoint to be equivalent to a call to /issuer/default/sign-self-issued. Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Add /issuer/:ref/sign-verbatim endpoint This endpoint allows existing issuers to be used to directly sign CSRs. In the process, we've updated the existing /sign-verbatim endpoint to be equivalent to a call to /issuer/:ref/sign-verbatim. Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Allow configuration of default issuers Using the new updateDefaultIssuerId(...) from the storage migration PR allows for easy implementation of configuring the default issuer. We restrict callers from setting blank defaults and setting default to default. Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> * Fix fetching default issuers After setting a default issuer, one should be able to use the old /ca, /ca_chain, and /cert/{ca,ca_chain} endpoints to fetch the default issuer (and its chain). Update the fetchCertBySerial helper to no longer support fetching the ca and prefer fetchCAInfo for that instead (as we've already updated that to support fetching the new issuer location). 
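// pathTidyStatusRead reports the status of the tidy operation. It holds the
// read lock on tidyStatusLock while copying the status into the response.
func (b *backend) pathTidyStatusRead(_ context.Context, _ *logical.Request, _ *framework.FieldData) (*logical.Response, error) {
	b.tidyStatusLock.RLock()
	defer b.tidyStatusLock.RUnlock()
	resp := &logical.Response{
		Data: map[string]interface{}{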
			"safety_buffer": nil,
			"issuer_safety_buffer": nil,
			"tidy_cert_store": nil,
			"tidy_revoked_certs": nil,
			"tidy_revoked_cert_issuer_associations": nil,
			"tidy_expired_issuers": nil,
			"tidy_move_legacy_ca_bundle": nil,
			"pause_duration": nil,
			"state": "Inactive",
			"error": nil,
			"time_started": nil,
			"time_finished": nil,
			"message": nil,
			"cert_store_deleted_count": nil,
			"revoked_cert_deleted_count": nil,
			"missing_issuer_cert_count": nil,
			"current_cert_store_count": nil,
			"current_revoked_cert_count": nil,
			"revocation_queue_deleted_count": nil,
		},
	}
	// While the status is inactive, return the placeholder values as-is.
	if b.tidyStatus.state == tidyStatusInactive {
		return resp, nil
	}
	resp.Data["safety_buffer"] = b.tidyStatus.safetyBuffer
	resp.Data["issuer_safety_buffer"] = b.tidyStatus.issuerSafetyBuffer
	resp.Data["tidy_cert_store"] = b.tidyStatus.tidyCertStore
	resp.Data["tidy_revoked_certs"] = b.tidyStatus.tidyRevokedCerts
	resp.Data["tidy_revoked_cert_issuer_associations"] = b.tidyStatus.tidyRevokedAssocs
	resp.Data["tidy_expired_issuers"] = b.tidyStatus.tidyExpiredIssuers
	resp.Data["tidy_move_legacy_ca_bundle"] = b.tidyStatus.tidyBackupBundle
	resp.Data["pause_duration"] = b.tidyStatus.pauseDuration
	resp.Data["time_started"] = b.tidyStatus.timeStarted
	resp.Data["message"] = b.tidyStatus.message
	resp.Data["cert_store_deleted_count"] = b.tidyStatus.certStoreDeletedCount
	resp.Data["revoked_cert_deleted_count"] = b.tidyStatus.revokedCertDeletedCount
	resp.Data["missing_issuer_cert_count"] = b.tidyStatus.missingIssuerCertCount
	resp.Data["revocation_queue_deleted_count"] = b.tidyStatus.revQueueDeletedCount
	switch b.tidyStatus.state {
	case tidyStatusStarted:
		resp.Data["state"] = "Running"
	case tidyStatusFinished:
		resp.Data["state"] = "Finished"
		resp.Data["time_finished"] = b.tidyStatus.timeFinished
		resp.Data["message"] = nil
	case tidyStatusError:
		resp.Data["state"] = "Error"
		resp.Data["time_finished"] = b.tidyStatus.timeFinished
		resp.Data["error"] = b.tidyStatus.err.Error()
		// Don't clear the message so that it serves as a hint about when
		// the error occurred.
	case tidyStatusCancelling:
		resp.Data["state"] = "Cancelling"
	case tidyStatusCancelled:
		resp.Data["state"] = "Cancelled"
		resp.Data["time_finished"] = b.tidyStatus.timeFinished
	}

	resp.Data["current_cert_store_count"] = b.certCount
	resp.Data["current_revoked_cert_count"] = b.revokedCertCount

	if !b.certsCounted.Load() {
		resp.AddWarning("Certificates in storage are still being counted, current counts provided may be " +
			"inaccurate")
	}

	return resp, nil
}
func (b *backend) pathConfigAutoTidyRead(ctx context.Context, req *logical.Request, data *framework.FieldData) (*logical.Response, error) {
	sc := b.makeStorageContext(ctx, req.Storage)
	config, err := sc.getAutoTidyConfig()
	if err != nil {
		return nil, err
	}

	return &logical.Response{
		Data: map[string]interface{}{
			"enabled": config.Enabled,
			"interval_duration": int(config.Interval / time.Second),
			"tidy_cert_store": config.CertStore,
			"tidy_revoked_certs": config.RevokedCerts,
			"tidy_revoked_cert_issuer_associations": config.IssuerAssocs,
			"tidy_expired_issuers": config.ExpiredIssuers,
			"tidy_move_legacy_ca_bundle": config.BackupBundle,
			"safety_buffer": int(config.SafetyBuffer / time.Second),
			"issuer_safety_buffer": int(config.IssuerSafetyBuffer / time.Second),
			"pause_duration": config.PauseDuration.String(),
			"tidy_revocation_queue": config.RevocationQueue,
			"revocation_queue_safety_buffer": int(config.QueueSafetyBuffer / time.Second),
		},
	}, nil
}

func (b *backend) pathConfigAutoTidyWrite(ctx context.Context, req *logical.Request, d *framework.FieldData) (*logical.Response, error) {
	sc := b.makeStorageContext(ctx, req.Storage)
	config, err := sc.getAutoTidyConfig()
	if err != nil {
		return nil, err
	}

	if enabledRaw, ok := d.GetOk("enabled"); ok {
		config.Enabled = enabledRaw.(bool)
	}

	if intervalRaw, ok := d.GetOk("interval_duration"); ok {
		config.Interval = time.Duration(intervalRaw.(int)) * time.Second
		if config.Interval < 0 {
			return logical.ErrorResponse(fmt.Sprintf("given interval_duration must be greater than or equal to zero seconds; got: %v", intervalRaw)), nil
		}
	}

	if certStoreRaw, ok := d.GetOk("tidy_cert_store"); ok {
		config.CertStore = certStoreRaw.(bool)
	}

	if revokedCertsRaw, ok := d.GetOk("tidy_revoked_certs"); ok {
		config.RevokedCerts = revokedCertsRaw.(bool)
	}

	if issuerAssocRaw, ok := d.GetOk("tidy_revoked_cert_issuer_associations"); ok {
		config.IssuerAssocs = issuerAssocRaw.(bool)
	}

	if safetyBufferRaw, ok := d.GetOk("safety_buffer"); ok {
		config.SafetyBuffer = time.Duration(safetyBufferRaw.(int)) * time.Second
		if config.SafetyBuffer < 1*time.Second {
			return logical.ErrorResponse(fmt.Sprintf("given safety_buffer must be at least one second; got: %v", safetyBufferRaw)), nil
}
}
if pauseDurationRaw, ok := d.GetOk("pause_duration"); ok {
config.PauseDuration, err = time.ParseDuration(pauseDurationRaw.(string))
if err != nil {
return logical.ErrorResponse(fmt.Sprintf("unable to parse given pause_duration: %v", err)), nil
}
if config.PauseDuration < 0 {
return logical.ErrorResponse("received invalid, negative pause_duration"), nil
}
}
if expiredIssuers, ok := d.GetOk("tidy_expired_issuers"); ok {
config.ExpiredIssuers = expiredIssuers.(bool)
}
if issuerSafetyBufferRaw, ok := d.GetOk("issuer_safety_buffer"); ok {
config.IssuerSafetyBuffer = time.Duration(issuerSafetyBufferRaw.(int)) * time.Second
if config.IssuerSafetyBuffer < 1*time.Second {
return logical.ErrorResponse(fmt.Sprintf("given issuer_safety_buffer must be at least one second; got: %v", issuerSafetyBufferRaw)), nil
}
}
if backupBundle, ok := d.GetOk("tidy_move_legacy_ca_bundle"); ok {
config.BackupBundle = backupBundle.(bool)
}
if revocationQueueRaw, ok := d.GetOk("tidy_revocation_queue"); ok {
config.RevocationQueue = revocationQueueRaw.(bool)
}
if queueSafetyBufferRaw, ok := d.GetOk("revocation_queue_safety_buffer"); ok {
config.QueueSafetyBuffer = time.Duration(queueSafetyBufferRaw.(int)) * time.Second
if config.QueueSafetyBuffer < 1*time.Second {
return logical.ErrorResponse(fmt.Sprintf("given revocation_queue_safety_buffer must be at least one second; got: %v", queueSafetyBufferRaw)), nil
}
}
if config.Enabled && !(config.CertStore || config.RevokedCerts || config.IssuerAssocs || config.ExpiredIssuers || config.BackupBundle || config.RevocationQueue) {
return logical.ErrorResponse("Auto-tidy enabled but no tidy operations were requested. Enable at least one tidy operation to be run (tidy_cert_store / tidy_revoked_certs / tidy_revoked_cert_issuer_associations / tidy_expired_issuers / tidy_move_legacy_ca_bundle / tidy_revocation_queue)."), nil
}
if err := sc.writeAutoTidyConfig(config); err != nil {
return nil, err
}
return &logical.Response{
Data: map[string]interface{}{
"enabled": config.Enabled,
"interval_duration": int(config.Interval / time.Second),
"tidy_cert_store": config.CertStore,
"tidy_revoked_certs": config.RevokedCerts,
"tidy_revoked_cert_issuer_associations": config.IssuerAssocs,
"tidy_expired_issuers": config.ExpiredIssuers,
"tidy_move_legacy_ca_bundle": config.BackupBundle,
"safety_buffer": int(config.SafetyBuffer / time.Second),
"issuer_safety_buffer": int(config.IssuerSafetyBuffer / time.Second),
"pause_duration": config.PauseDuration.String(),
"tidy_revocation_queue": config.RevocationQueue,
"revocation_queue_safety_buffer": int(config.QueueSafetyBuffer / time.Second),
},
}, nil
}
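
// tidyStatusStart records the beginning of a tidy operation: it replaces
// b.tidyStatus with a fresh status built from config (under tidyStatusLock)
// and publishes the start time as a gauge.
func (b *backend) tidyStatusStart(config *tidyConfig) {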
b.tidyStatusLock.Lock()
defer b.tidyStatusLock.Unlock()
b.tidyStatus = &tidyStatus{
safetyBuffer: int(config.SafetyBuffer / time.Second),
issuerSafetyBuffer: int(config.IssuerSafetyBuffer / time.Second),
tidyCertStore: config.CertStore,
tidyRevokedCerts: config.RevokedCerts,
tidyRevokedAssocs: config.IssuerAssocs,
tidyExpiredIssuers: config.ExpiredIssuers,
tidyBackupBundle: config.BackupBundle,
pauseDuration: config.PauseDuration.String(),
state: tidyStatusStarted,
timeStarted: time.Now(),
}
metrics.SetGauge([]string{"secrets", "pki", "tidy", "start_time_epoch"}, float32(b.tidyStatus.timeStarted.Unix()))
}
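
// tidyStatusStop records the end of a tidy operation. The final state is
// finished when err is nil, cancelled when err is tidyCancelledError, and
// error otherwise; duration, deletion-count, and success/failure metrics are
// emitted before returning.
func (b *backend) tidyStatusStop(err error) {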
b.tidyStatusLock.Lock()
defer b.tidyStatusLock.Unlock()
b.tidyStatus.timeFinished = time.Now()
b.tidyStatus.err = err
if err == nil {
b.tidyStatus.state = tidyStatusFinished
} else if errors.Is(err, tidyCancelledError) {
b.tidyStatus.state = tidyStatusCancelled
} else {
b.tidyStatus.state = tidyStatusError
}
metrics.MeasureSince([]string{"secrets", "pki", "tidy", "duration"}, b.tidyStatus.timeStarted)
metrics.SetGauge([]string{"secrets", "pki", "tidy", "start_time_epoch"}, 0)
metrics.IncrCounter([]string{"secrets", "pki", "tidy", "cert_store_deleted_count"}, float32(b.tidyStatus.certStoreDeletedCount))
metrics.IncrCounter([]string{"secrets", "pki", "tidy", "revoked_cert_deleted_count"}, float32(b.tidyStatus.revokedCertDeletedCount))
if err != nil {
metrics.IncrCounter([]string{"secrets", "pki", "tidy", "failure"}, 1)
} else {
metrics.IncrCounter([]string{"secrets", "pki", "tidy", "success"}, 1)
}
}

// tidyStatusMessage sets the human-readable message on the current tidy
// status.
func (b *backend) tidyStatusMessage(msg string) {
b.tidyStatusLock.Lock()
defer b.tidyStatusLock.Unlock()
b.tidyStatus.message = msg
}
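
// tidyStatusIncCertStoreCount increments the count of certificates deleted
// from the cert store and decrements the reported total certificate count.
func (b *backend) tidyStatusIncCertStoreCount() {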
b.tidyStatusLock.Lock()
defer b.tidyStatusLock.Unlock()
b.tidyStatus.certStoreDeletedCount++
b.decrementTotalCertificatesCountReport()
}
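
// tidyStatusIncRevokedCertCount increments the count of revoked certificates
// deleted and decrements the reported total revoked certificate count.
func (b *backend) tidyStatusIncRevokedCertCount() {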
b.tidyStatusLock.Lock()
defer b.tidyStatusLock.Unlock()
b.tidyStatus.revokedCertDeletedCount++
b.decrementTotalRevokedCertificatesCountReport()
}
func (b *backend) tidyStatusIncMissingIssuerCertCount() {
	b.tidyStatusLock.Lock()
	defer b.tidyStatusLock.Unlock()

	b.tidyStatus.missingIssuerCertCount++
}

func (b *backend) tidyStatusIncRevQueueCount() {
	b.tidyStatusLock.Lock()
	defer b.tidyStatusLock.Unlock()

	b.tidyStatus.revQueueDeletedCount++
}
const pathTidyHelpSyn = `
Tidy up the backend by removing expired certificates, revocation information,
or both.
`
const pathTidyHelpDesc = `
This endpoint allows expired certificates and/or revocation information to be
removed from the backend, freeing up storage and shortening CRLs.
For safety, this function is a no-op if called without parameters; cleanup from
normal certificate storage must be enabled with 'tidy_cert_store' and cleanup
from revocation information must be enabled with 'tidy_revocation_list'.
The 'safety_buffer' parameter is useful to ensure that clock skew amongst your
hosts cannot lead to a certificate being removed from the CRL while it is still
considered valid by other hosts (for instance, if their clocks are a few
minutes behind). The 'safety_buffer' parameter can be an integer number of
seconds or a string duration like "72h".
All certificates and/or revocation information currently stored in the backend
will be checked when this endpoint is hit. For each entry held in certificate
storage or in revocation information, the expiration of the certificate is
checked; if the current time, minus the value of 'safety_buffer', is later than
the expiration, the entry is removed.
`
const pathTidyCancelHelpSyn = `
Cancels a currently running tidy operation.
`
const pathTidyCancelHelpDesc = `
This endpoint allows cancelling a currently running tidy operation.
Periodically throughout the invocation of tidy, we'll check if the operation
has been requested to be cancelled. If so, we'll stop the currently running
tidy operation.
`
const pathTidyStatusHelpSyn = `
Returns the status of the tidy operation.
`
const pathTidyStatusHelpDesc = `
This is a read only endpoint that returns information about the current tidy
operation, or the most recent if none is currently running.
The result includes the following fields:
* 'safety_buffer': the value of this parameter when initiating the tidy operation
* 'tidy_cert_store': the value of this parameter when initiating the tidy operation
* 'tidy_revoked_certs': the value of this parameter when initiating the tidy operation
* 'tidy_revoked_cert_issuer_associations': the value of this parameter when initiating the tidy operation
* 'state': one of "Inactive", "Running", "Finished", "Error"
* 'error': the error message, if the operation ran into an error
* 'time_started': the time the operation started
* 'time_finished': the time the operation finished
* 'message': One of "Tidying certificate store: checking entry N of TOTAL" or
"Tidying revoked certificates: checking certificate N of TOTAL"
* 'cert_store_deleted_count': The number of certificate storage entries deleted
* 'revoked_cert_deleted_count': The number of revoked certificate entries deleted
* 'missing_issuer_cert_count': The number of revoked certificates which were missing a valid issuer reference
* 'tidy_expired_issuers': the value of this parameter when initiating the tidy operation
* 'issuer_safety_buffer': the value of this parameter when initiating the tidy operation
* 'tidy_move_legacy_ca_bundle': the value of this parameter when initiating the tidy operation
`
const pathConfigAutoTidySyn = `
Modifies the current configuration for automatic tidy execution.
`
const pathConfigAutoTidyDesc = `
This endpoint accepts parameters to a tidy operation (see /tidy) that
will be used for automatic tidy execution. This takes two extra parameters,
enabled (to enable or disable auto-tidy) and interval_duration (which
controls the frequency of auto-tidy execution).
Once enabled, a tidy operation will be kicked off automatically, as if it
were executed with the posted configuration.
`