Add ability to perform automatic tidy operations (#16900)
* Add ability to perform automatic tidy operations
This enables the PKI secrets engine to allow tidy to be started
periodically by the engine itself, avoiding the need for interaction.
This operation is disabled by default (to avoid load on clusters which
don't need tidy to be run) but can be enabled.
In particular, a default tidy configuration is written (via
/config/auto-tidy) which mirrors the options passed to /tidy. Two
additional parameters, enabled and interval, are accepted, allowing
auto-tidy to be enabled or disabled and controlling the interval
(between successful tidy runs) to attempt auto-tidy.
Notably, a manual execution of tidy will delay additional auto-tidy
operations. Status is reported via the existing /tidy-status endpoint.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add documentation on auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests for auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Prevent race during parallel testing
We modified the RollbackManager's execution window to allow more
faithful testing of the periodicFunc. However, the TestAutoRebuild and
the new TestAutoTidy would then race against each other for modifying
the period and creating their clusters (before resetting to the old
value).
This changeset adds a lock around this, preventing the races.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Use tidyStatusLock to gate lastTidy time
This prevents a data race between the periodic func and the execution of
the running tidy.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add read lock around tidyStatus gauges
When reading from tidyStatus for computing gauges, since the underlying
values aren't atomics, we really should be gating these with a read lock
around the status access.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
2022-08-30 19:45:54 +00:00
|
|
|
package pki
|
|
|
|
|
|
|
|
import (
|
2022-11-02 17:06:04 +00:00
|
|
|
"encoding/json"
|
Add ability to perform automatic tidy operations (#16900)
* Add ability to perform automatic tidy operations
This enables the PKI secrets engine to allow tidy to be started
periodically by the engine itself, avoiding the need for interaction.
This operation is disabled by default (to avoid load on clusters which
don't need tidy to be run) but can be enabled.
In particular, a default tidy configuration is written (via
/config/auto-tidy) which mirrors the options passed to /tidy. Two
additional parameters, enabled and interval, are accepted, allowing
auto-tidy to be enabled or disabled and controlling the interval
(between successful tidy runs) to attempt auto-tidy.
Notably, a manual execution of tidy will delay additional auto-tidy
operations. Status is reported via the existing /tidy-status endpoint.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add documentation on auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests for auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Prevent race during parallel testing
We modified the RollbackManager's execution window to allow more
faithful testing of the periodicFunc. However, the TestAutoRebuild and
the new TestAutoTidy would then race against each other for modifying
the period and creating their clusters (before resetting to the old
value).
This changeset adds a lock around this, preventing the races.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Use tidyStatusLock to gate lastTidy time
This prevents a data race between the periodic func and the execution of
the running tidy.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add read lock around tidyStatus gauges
When reading from tidyStatus for computing gauges, since the underlying
values aren't atomics, we really should be gating these with a read lock
around the status access.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
2022-08-30 19:45:54 +00:00
|
|
|
"testing"
|
|
|
|
"time"
|
|
|
|
|
|
|
|
"github.com/hashicorp/vault/api"
|
|
|
|
vaulthttp "github.com/hashicorp/vault/http"
|
|
|
|
"github.com/hashicorp/vault/sdk/logical"
|
|
|
|
"github.com/hashicorp/vault/vault"
|
|
|
|
|
|
|
|
"github.com/stretchr/testify/require"
|
|
|
|
)
|
|
|
|
|
|
|
|
func TestAutoTidy(t *testing.T) {
|
|
|
|
t.Parallel()
|
|
|
|
|
|
|
|
// While we'd like to reduce this duration, we need to wait until
|
|
|
|
// the rollback manager timer ticks. With the new helper, we can
|
|
|
|
// modify the rollback manager timer period directly, allowing us
|
|
|
|
// to shorten the total test time significantly.
|
|
|
|
//
|
|
|
|
// We set the delta CRL time to ensure it executes prior to the
|
|
|
|
// main CRL rebuild, and the new CRL doesn't rebuild until after
|
|
|
|
// we're done.
|
|
|
|
newPeriod := 1 * time.Second
|
|
|
|
|
|
|
|
// This test requires the periodicFunc to trigger, which requires we stand
|
|
|
|
// up a full test cluster.
|
|
|
|
coreConfig := &vault.CoreConfig{
|
|
|
|
LogicalBackends: map[string]logical.Factory{
|
|
|
|
"pki": Factory,
|
|
|
|
},
|
|
|
|
// See notes below about usage of /sys/raw for reading cluster
|
|
|
|
// storage without barrier encryption.
|
2022-10-13 13:59:07 +00:00
|
|
|
EnableRaw: true,
|
|
|
|
RollbackPeriod: newPeriod,
|
Add ability to perform automatic tidy operations (#16900)
* Add ability to perform automatic tidy operations
This enables the PKI secrets engine to allow tidy to be started
periodically by the engine itself, avoiding the need for interaction.
This operation is disabled by default (to avoid load on clusters which
don't need tidy to be run) but can be enabled.
In particular, a default tidy configuration is written (via
/config/auto-tidy) which mirrors the options passed to /tidy. Two
additional parameters, enabled and interval, are accepted, allowing
auto-tidy to be enabled or disabled and controlling the interval
(between successful tidy runs) to attempt auto-tidy.
Notably, a manual execution of tidy will delay additional auto-tidy
operations. Status is reported via the existing /tidy-status endpoint.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add documentation on auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests for auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Prevent race during parallel testing
We modified the RollbackManager's execution window to allow more
faithful testing of the periodicFunc. However, the TestAutoRebuild and
the new TestAutoTidy would then race against each other for modifying
the period and creating their clusters (before resetting to the old
value).
This changeset adds a lock around this, preventing the races.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Use tidyStatusLock to gate lastTidy time
This prevents a data race between the periodic func and the execution of
the running tidy.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add read lock around tidyStatus gauges
When reading from tidyStatus for computing gauges, since the underlying
values aren't atomics, we really should be gating these with a read lock
around the status access.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
2022-08-30 19:45:54 +00:00
|
|
|
}
|
2022-10-13 13:59:07 +00:00
|
|
|
cluster := vault.NewTestCluster(t, coreConfig, &vault.TestClusterOptions{
|
Add ability to perform automatic tidy operations (#16900)
* Add ability to perform automatic tidy operations
This enables the PKI secrets engine to allow tidy to be started
periodically by the engine itself, avoiding the need for interaction.
This operation is disabled by default (to avoid load on clusters which
don't need tidy to be run) but can be enabled.
In particular, a default tidy configuration is written (via
/config/auto-tidy) which mirrors the options passed to /tidy. Two
additional parameters, enabled and interval, are accepted, allowing
auto-tidy to be enabled or disabled and controlling the interval
(between successful tidy runs) to attempt auto-tidy.
Notably, a manual execution of tidy will delay additional auto-tidy
operations. Status is reported via the existing /tidy-status endpoint.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add documentation on auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests for auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Prevent race during parallel testing
We modified the RollbackManager's execution window to allow more
faithful testing of the periodicFunc. However, the TestAutoRebuild and
the new TestAutoTidy would then race against each other for modifying
the period and creating their clusters (before resetting to the old
value).
This changeset adds a lock around this, preventing the races.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Use tidyStatusLock to gate lastTidy time
This prevents a data race between the periodic func and the execution of
the running tidy.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add read lock around tidyStatus gauges
When reading from tidyStatus for computing gauges, since the underlying
values aren't atomics, we really should be gating these with a read lock
around the status access.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
2022-08-30 19:45:54 +00:00
|
|
|
HandlerFunc: vaulthttp.Handler,
|
|
|
|
})
|
2022-10-13 13:59:07 +00:00
|
|
|
cluster.Start()
|
Add ability to perform automatic tidy operations (#16900)
* Add ability to perform automatic tidy operations
This enables the PKI secrets engine to allow tidy to be started
periodically by the engine itself, avoiding the need for interaction.
This operation is disabled by default (to avoid load on clusters which
don't need tidy to be run) but can be enabled.
In particular, a default tidy configuration is written (via
/config/auto-tidy) which mirrors the options passed to /tidy. Two
additional parameters, enabled and interval, are accepted, allowing
auto-tidy to be enabled or disabled and controlling the interval
(between successful tidy runs) to attempt auto-tidy.
Notably, a manual execution of tidy will delay additional auto-tidy
operations. Status is reported via the existing /tidy-status endpoint.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add documentation on auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests for auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Prevent race during parallel testing
We modified the RollbackManager's execution window to allow more
faithful testing of the periodicFunc. However, the TestAutoRebuild and
the new TestAutoTidy would then race against each other for modifying
the period and creating their clusters (before resetting to the old
value).
This changeset adds a lock around this, preventing the races.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Use tidyStatusLock to gate lastTidy time
This prevents a data race between the periodic func and the execution of
the running tidy.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add read lock around tidyStatus gauges
When reading from tidyStatus for computing gauges, since the underlying
values aren't atomics, we really should be gating these with a read lock
around the status access.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
2022-08-30 19:45:54 +00:00
|
|
|
defer cluster.Cleanup()
|
|
|
|
client := cluster.Cores[0].Client
|
|
|
|
|
|
|
|
// Mount PKI
|
|
|
|
err := client.Sys().Mount("pki", &api.MountInput{
|
|
|
|
Type: "pki",
|
|
|
|
Config: api.MountConfigInput{
|
|
|
|
DefaultLeaseTTL: "10m",
|
|
|
|
MaxLeaseTTL: "60m",
|
|
|
|
},
|
|
|
|
})
|
|
|
|
require.NoError(t, err)
|
|
|
|
|
|
|
|
// Generate root.
|
|
|
|
resp, err := client.Logical().Write("pki/root/generate/internal", map[string]interface{}{
|
|
|
|
"ttl": "40h",
|
|
|
|
"common_name": "Root X1",
|
|
|
|
"key_type": "ec",
|
|
|
|
})
|
|
|
|
require.NoError(t, err)
|
|
|
|
require.NotNil(t, resp)
|
|
|
|
require.NotEmpty(t, resp.Data)
|
|
|
|
require.NotEmpty(t, resp.Data["issuer_id"])
|
2022-11-02 17:06:04 +00:00
|
|
|
issuerId := resp.Data["issuer_id"]
|
Add ability to perform automatic tidy operations (#16900)
* Add ability to perform automatic tidy operations
This enables the PKI secrets engine to allow tidy to be started
periodically by the engine itself, avoiding the need for interaction.
This operation is disabled by default (to avoid load on clusters which
don't need tidy to be run) but can be enabled.
In particular, a default tidy configuration is written (via
/config/auto-tidy) which mirrors the options passed to /tidy. Two
additional parameters, enabled and interval, are accepted, allowing
auto-tidy to be enabled or disabled and controlling the interval
(between successful tidy runs) to attempt auto-tidy.
Notably, a manual execution of tidy will delay additional auto-tidy
operations. Status is reported via the existing /tidy-status endpoint.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add documentation on auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests for auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Prevent race during parallel testing
We modified the RollbackManager's execution window to allow more
faithful testing of the periodicFunc. However, the TestAutoRebuild and
the new TestAutoTidy would then race against each other for modifying
the period and creating their clusters (before resetting to the old
value).
This changeset adds a lock around this, preventing the races.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Use tidyStatusLock to gate lastTidy time
This prevents a data race between the periodic func and the execution of
the running tidy.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add read lock around tidyStatus gauges
When reading from tidyStatus for computing gauges, since the underlying
values aren't atomics, we really should be gating these with a read lock
around the status access.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
2022-08-30 19:45:54 +00:00
|
|
|
|
|
|
|
// Run tidy so status is not empty when we run it later...
|
|
|
|
_, err = client.Logical().Write("pki/tidy", map[string]interface{}{
|
|
|
|
"tidy_revoked_certs": true,
|
|
|
|
})
|
|
|
|
require.NoError(t, err)
|
|
|
|
|
|
|
|
// Setup a testing role.
|
|
|
|
_, err = client.Logical().Write("pki/roles/local-testing", map[string]interface{}{
|
|
|
|
"allow_any_name": true,
|
|
|
|
"enforce_hostnames": false,
|
|
|
|
"key_type": "ec",
|
|
|
|
})
|
|
|
|
require.NoError(t, err)
|
|
|
|
|
|
|
|
// Write the auto-tidy config.
|
|
|
|
_, err = client.Logical().Write("pki/config/auto-tidy", map[string]interface{}{
|
|
|
|
"enabled": true,
|
|
|
|
"interval_duration": "1s",
|
|
|
|
"tidy_cert_store": true,
|
|
|
|
"tidy_revoked_certs": true,
|
|
|
|
"safety_buffer": "1s",
|
|
|
|
})
|
|
|
|
require.NoError(t, err)
|
|
|
|
|
|
|
|
// Issue a cert and revoke it.
|
|
|
|
resp, err = client.Logical().Write("pki/issue/local-testing", map[string]interface{}{
|
|
|
|
"common_name": "example.com",
|
|
|
|
"ttl": "10s",
|
|
|
|
})
|
|
|
|
require.NoError(t, err)
|
|
|
|
require.NotNil(t, resp)
|
|
|
|
require.NotNil(t, resp.Data)
|
|
|
|
require.NotEmpty(t, resp.Data["serial_number"])
|
|
|
|
require.NotEmpty(t, resp.Data["certificate"])
|
|
|
|
leafSerial := resp.Data["serial_number"].(string)
|
|
|
|
leafCert := parseCert(t, resp.Data["certificate"].(string))
|
|
|
|
|
2022-11-02 17:06:04 +00:00
|
|
|
// Read cert before revoking
|
|
|
|
resp, err = client.Logical().Read("pki/cert/" + leafSerial)
|
|
|
|
require.NoError(t, err)
|
|
|
|
require.NotNil(t, resp)
|
|
|
|
require.NotNil(t, resp.Data)
|
|
|
|
require.NotEmpty(t, resp.Data["certificate"])
|
|
|
|
revocationTime, err := (resp.Data["revocation_time"].(json.Number)).Int64()
|
|
|
|
require.Equal(t, int64(0), revocationTime, "revocation time was not zero")
|
|
|
|
require.Empty(t, resp.Data["revocation_time_rfc3339"], "revocation_time_rfc3339 was not empty")
|
|
|
|
require.Empty(t, resp.Data["issuer_id"], "issuer_id was not empty")
|
|
|
|
|
Add ability to perform automatic tidy operations (#16900)
* Add ability to perform automatic tidy operations
This enables the PKI secrets engine to allow tidy to be started
periodically by the engine itself, avoiding the need for interaction.
This operation is disabled by default (to avoid load on clusters which
don't need tidy to be run) but can be enabled.
In particular, a default tidy configuration is written (via
/config/auto-tidy) which mirrors the options passed to /tidy. Two
additional parameters, enabled and interval, are accepted, allowing
auto-tidy to be enabled or disabled and controlling the interval
(between successful tidy runs) to attempt auto-tidy.
Notably, a manual execution of tidy will delay additional auto-tidy
operations. Status is reported via the existing /tidy-status endpoint.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add documentation on auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests for auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Prevent race during parallel testing
We modified the RollbackManager's execution window to allow more
faithful testing of the periodicFunc. However, the TestAutoRebuild and
the new TestAutoTidy would then race against each other for modifying
the period and creating their clusters (before resetting to the old
value).
This changeset adds a lock around this, preventing the races.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Use tidyStatusLock to gate lastTidy time
This prevents a data race between the periodic func and the execution of
the running tidy.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add read lock around tidyStatus gauges
When reading from tidyStatus for computing gauges, since the underlying
values aren't atomics, we really should be gating these with a read lock
around the status access.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
2022-08-30 19:45:54 +00:00
|
|
|
_, err = client.Logical().Write("pki/revoke", map[string]interface{}{
|
|
|
|
"serial_number": leafSerial,
|
|
|
|
})
|
|
|
|
require.NoError(t, err)
|
|
|
|
|
|
|
|
// Cert should still exist.
|
|
|
|
resp, err = client.Logical().Read("pki/cert/" + leafSerial)
|
|
|
|
require.NoError(t, err)
|
|
|
|
require.NotNil(t, resp)
|
|
|
|
require.NotNil(t, resp.Data)
|
|
|
|
require.NotEmpty(t, resp.Data["certificate"])
|
2022-11-02 17:06:04 +00:00
|
|
|
revocationTime, err = (resp.Data["revocation_time"].(json.Number)).Int64()
|
|
|
|
require.NoError(t, err, "failed converting %s to int", resp.Data["revocation_time"])
|
|
|
|
revTime := time.Unix(revocationTime, 0)
|
|
|
|
now := time.Now()
|
|
|
|
if !(now.After(revTime) && now.Add(-10*time.Minute).Before(revTime)) {
|
|
|
|
t.Fatalf("parsed revocation time not within the last 10 minutes current time: %s, revocation time: %s", now, revTime)
|
|
|
|
}
|
|
|
|
utcLoc, err := time.LoadLocation("UTC")
|
|
|
|
require.NoError(t, err, "failed to parse UTC location?")
|
|
|
|
|
|
|
|
rfc3339RevocationTime, err := time.Parse(time.RFC3339Nano, resp.Data["revocation_time_rfc3339"].(string))
|
|
|
|
require.NoError(t, err, "failed parsing revocation_time_rfc3339 field: %s", resp.Data["revocation_time_rfc3339"])
|
|
|
|
|
|
|
|
require.Equal(t, revTime.In(utcLoc), rfc3339RevocationTime.Truncate(time.Second),
|
|
|
|
"revocation times did not match revocation_time: %s, "+"rfc3339 time: %s", revTime, rfc3339RevocationTime)
|
|
|
|
require.Equal(t, issuerId, resp.Data["issuer_id"], "issuer_id on leaf cert did not match")
|
Add ability to perform automatic tidy operations (#16900)
* Add ability to perform automatic tidy operations
This enables the PKI secrets engine to allow tidy to be started
periodically by the engine itself, avoiding the need for interaction.
This operation is disabled by default (to avoid load on clusters which
don't need tidy to be run) but can be enabled.
In particular, a default tidy configuration is written (via
/config/auto-tidy) which mirrors the options passed to /tidy. Two
additional parameters, enabled and interval, are accepted, allowing
auto-tidy to be enabled or disabled and controlling the interval
(between successful tidy runs) to attempt auto-tidy.
Notably, a manual execution of tidy will delay additional auto-tidy
operations. Status is reported via the existing /tidy-status endpoint.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add documentation on auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests for auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Prevent race during parallel testing
We modified the RollbackManager's execution window to allow more
faithful testing of the periodicFunc. However, the TestAutoRebuild and
the new TestAutoTidy would then race against each other for modifying
the period and creating their clusters (before resetting to the old
value).
This changeset adds a lock around this, preventing the races.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Use tidyStatusLock to gate lastTidy time
This prevents a data race between the periodic func and the execution of
the running tidy.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add read lock around tidyStatus gauges
When reading from tidyStatus for computing gauges, since the underlying
values aren't atomics, we really should be gating these with a read lock
around the status access.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
2022-08-30 19:45:54 +00:00
|
|
|
|
|
|
|
// Wait for cert to expire and the safety buffer to elapse.
|
2022-08-31 20:25:14 +00:00
|
|
|
time.Sleep(time.Until(leafCert.NotAfter) + 3*time.Second)
|
Add ability to perform automatic tidy operations (#16900)
* Add ability to perform automatic tidy operations
This enables the PKI secrets engine to allow tidy to be started
periodically by the engine itself, avoiding the need for interaction.
This operation is disabled by default (to avoid load on clusters which
don't need tidy to be run) but can be enabled.
In particular, a default tidy configuration is written (via
/config/auto-tidy) which mirrors the options passed to /tidy. Two
additional parameters, enabled and interval, are accepted, allowing
auto-tidy to be enabled or disabled and controlling the interval
(between successful tidy runs) to attempt auto-tidy.
Notably, a manual execution of tidy will delay additional auto-tidy
operations. Status is reported via the existing /tidy-status endpoint.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add documentation on auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests for auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Prevent race during parallel testing
We modified the RollbackManager's execution window to allow more
faithful testing of the periodicFunc. However, the TestAutoRebuild and
the new TestAutoTidy would then race against each other for modifying
the period and creating their clusters (before resetting to the old
value).
This changeset adds a lock around this, preventing the races.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Use tidyStatusLock to gate lastTidy time
This prevents a data race between the periodic func and the execution of
the running tidy.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add read lock around tidyStatus gauges
When reading from tidyStatus for computing gauges, since the underlying
values aren't atomics, we really should be gating these with a read lock
around the status access.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
2022-08-30 19:45:54 +00:00
|
|
|
|
|
|
|
// Wait for auto-tidy to run afterwards.
|
|
|
|
var foundTidyRunning string
|
|
|
|
var foundTidyFinished bool
|
|
|
|
timeoutChan := time.After(120 * time.Second)
|
|
|
|
for {
|
|
|
|
if foundTidyRunning != "" && foundTidyFinished {
|
|
|
|
break
|
|
|
|
}
|
|
|
|
|
|
|
|
select {
|
|
|
|
case <-timeoutChan:
|
|
|
|
t.Fatalf("expected auto-tidy to run (%v) and finish (%v) before 120 seconds elapsed", foundTidyRunning, foundTidyFinished)
|
|
|
|
default:
|
|
|
|
time.Sleep(250 * time.Millisecond)
|
|
|
|
|
|
|
|
resp, err = client.Logical().Read("pki/tidy-status")
|
|
|
|
require.NoError(t, err)
|
|
|
|
require.NotNil(t, resp)
|
|
|
|
require.NotNil(t, resp.Data)
|
|
|
|
require.NotEmpty(t, resp.Data["state"])
|
|
|
|
require.NotEmpty(t, resp.Data["time_started"])
|
|
|
|
state := resp.Data["state"].(string)
|
|
|
|
started := resp.Data["time_started"].(string)
|
|
|
|
t.Logf("Resp: %v", resp.Data)
|
|
|
|
|
|
|
|
// We want the _next_ tidy run after the cert expires. This
|
|
|
|
// means if we're currently finished when we hit this the
|
|
|
|
// first time, we want to wait for the next run.
|
|
|
|
if foundTidyRunning == "" {
|
|
|
|
foundTidyRunning = started
|
|
|
|
} else if foundTidyRunning != started && !foundTidyFinished && state == "Finished" {
|
|
|
|
foundTidyFinished = true
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// Cert should no longer exist.
|
|
|
|
resp, err = client.Logical().Read("pki/cert/" + leafSerial)
|
|
|
|
require.Nil(t, err)
|
|
|
|
require.Nil(t, resp)
|
|
|
|
}
|
2022-08-31 18:36:12 +00:00
|
|
|
|
|
|
|
func TestTidyCancellation(t *testing.T) {
|
|
|
|
t.Parallel()
|
|
|
|
|
|
|
|
numLeaves := 100
|
|
|
|
|
2022-11-14 23:26:26 +00:00
|
|
|
b, s := CreateBackendWithStorage(t)
|
2022-08-31 18:36:12 +00:00
|
|
|
|
|
|
|
// Create a root, a role, and a bunch of leaves.
|
|
|
|
_, err := CBWrite(b, s, "root/generate/internal", map[string]interface{}{
|
|
|
|
"common_name": "root example.com",
|
|
|
|
"issuer_name": "root",
|
|
|
|
"ttl": "20m",
|
|
|
|
"key_type": "ec",
|
|
|
|
})
|
|
|
|
require.NoError(t, err)
|
|
|
|
_, err = CBWrite(b, s, "roles/local-testing", map[string]interface{}{
|
|
|
|
"allow_any_name": true,
|
|
|
|
"enforce_hostnames": false,
|
|
|
|
"key_type": "ec",
|
|
|
|
})
|
|
|
|
require.NoError(t, err)
|
|
|
|
for i := 0; i < numLeaves; i++ {
|
|
|
|
_, err = CBWrite(b, s, "issue/local-testing", map[string]interface{}{
|
|
|
|
"common_name": "testing",
|
|
|
|
"ttl": "1s",
|
|
|
|
})
|
|
|
|
require.NoError(t, err)
|
|
|
|
}
|
|
|
|
|
|
|
|
// Kick off a tidy operation (which runs in the background), but with
|
|
|
|
// a slow-ish pause between certificates.
|
|
|
|
_, err = CBWrite(b, s, "tidy", map[string]interface{}{
|
|
|
|
"tidy_cert_store": true,
|
|
|
|
"safety_buffer": "1s",
|
|
|
|
"pause_duration": "1s",
|
|
|
|
})
|
|
|
|
|
|
|
|
// If we wait six seconds, the operation should still be running. That's
|
|
|
|
// how we check that pause_duration works.
|
|
|
|
time.Sleep(3 * time.Second)
|
|
|
|
|
|
|
|
resp, err := CBRead(b, s, "tidy-status")
|
|
|
|
require.NoError(t, err)
|
|
|
|
require.NotNil(t, resp)
|
|
|
|
require.NotNil(t, resp.Data)
|
|
|
|
require.Equal(t, resp.Data["state"], "Running")
|
|
|
|
|
|
|
|
// If we now cancel the operation, the response should say Cancelling.
|
|
|
|
cancelResp, err := CBWrite(b, s, "tidy-cancel", map[string]interface{}{})
|
|
|
|
require.NoError(t, err)
|
|
|
|
require.NotNil(t, cancelResp)
|
|
|
|
require.NotNil(t, cancelResp.Data)
|
|
|
|
state := cancelResp.Data["state"].(string)
|
|
|
|
howMany := cancelResp.Data["cert_store_deleted_count"].(uint)
|
|
|
|
|
|
|
|
if state == "Cancelled" {
|
|
|
|
// Rest of the test can't run; log and exit.
|
|
|
|
t.Log("Went to cancel the operation but response was already cancelled")
|
|
|
|
return
|
|
|
|
}
|
|
|
|
|
|
|
|
require.Equal(t, state, "Cancelling")
|
|
|
|
|
|
|
|
// Wait a little longer, and ensure we only processed at most 2 more certs
|
|
|
|
// after the cancellation respon.
|
|
|
|
time.Sleep(3 * time.Second)
|
|
|
|
|
|
|
|
statusResp, err := CBRead(b, s, "tidy-status")
|
|
|
|
require.NoError(t, err)
|
|
|
|
require.NotNil(t, statusResp)
|
|
|
|
require.NotNil(t, statusResp.Data)
|
|
|
|
require.Equal(t, statusResp.Data["state"], "Cancelled")
|
|
|
|
nowMany := statusResp.Data["cert_store_deleted_count"].(uint)
|
|
|
|
if howMany+3 <= nowMany {
|
|
|
|
t.Fatalf("expected to only process at most 3 more certificates, but processed (%v >>> %v) certs", nowMany, howMany)
|
|
|
|
}
|
|
|
|
}
|
Add automatic tidy of expired issuers (#17823)
* Add automatic tidy of expired issuers
To aid PKI users like Consul, which periodically rotate intermediates,
and provided a little more consistency with older versions of Vault
which would silently (and dangerously!) replace the configured CA on
root/intermediate generation, we introduce an automatic tidy of expired
issuers.
This includes a longer safety buffer (1 year) and logging of the
relevant issuer information prior to deletion (certificate contents, key
ID, and issuer ID/name) to allow admins to recover this value if
desired, or perform further cleanup of keys.
From my PoV, removal of the issuer is thus a relatively safe operation
compared to keys (which I do not feel comfortable removing) as they can
always be re-imported if desired. Additionally, this is an opt-in tidy
operation, not enabled by default. Lastly, most major performance
penalties comes with lots of issuers within the mount, not as much
large numbers of keys (as only new issuer creation/import operations are
affected, unlike LIST /issuers which is a public, unauthenticated
endpoint).
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add test for tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add docs on tidy of issuers
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Restructure logging
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add missing fields to expected tidy output
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
2022-11-10 15:53:26 +00:00
|
|
|
|
|
|
|
func TestTidyIssuers(t *testing.T) {
|
|
|
|
t.Parallel()
|
|
|
|
|
2022-11-14 23:26:26 +00:00
|
|
|
b, s := CreateBackendWithStorage(t)
|
Add automatic tidy of expired issuers (#17823)
* Add automatic tidy of expired issuers
To aid PKI users like Consul, which periodically rotate intermediates,
and provided a little more consistency with older versions of Vault
which would silently (and dangerously!) replace the configured CA on
root/intermediate generation, we introduce an automatic tidy of expired
issuers.
This includes a longer safety buffer (1 year) and logging of the
relevant issuer information prior to deletion (certificate contents, key
ID, and issuer ID/name) to allow admins to recover this value if
desired, or perform further cleanup of keys.
From my PoV, removal of the issuer is thus a relatively safe operation
compared to keys (which I do not feel comfortable removing) as they can
always be re-imported if desired. Additionally, this is an opt-in tidy
operation, not enabled by default. Lastly, most major performance
penalties comes with lots of issuers within the mount, not as much
large numbers of keys (as only new issuer creation/import operations are
affected, unlike LIST /issuers which is a public, unauthenticated
endpoint).
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add test for tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add docs on tidy of issuers
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Restructure logging
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add missing fields to expected tidy output
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
2022-11-10 15:53:26 +00:00
|
|
|
|
|
|
|
// Create a root that expires quickly and one valid for longer.
|
|
|
|
_, err := CBWrite(b, s, "root/generate/internal", map[string]interface{}{
|
|
|
|
"common_name": "root1 example.com",
|
|
|
|
"issuer_name": "root-expired",
|
|
|
|
"ttl": "1s",
|
|
|
|
"key_type": "ec",
|
|
|
|
})
|
|
|
|
require.NoError(t, err)
|
|
|
|
|
|
|
|
_, err = CBWrite(b, s, "root/generate/internal", map[string]interface{}{
|
|
|
|
"common_name": "root2 example.com",
|
|
|
|
"issuer_name": "root-valid",
|
|
|
|
"ttl": "60m",
|
|
|
|
"key_type": "rsa",
|
|
|
|
})
|
|
|
|
require.NoError(t, err)
|
|
|
|
|
|
|
|
// Sleep long enough to expire the root.
|
|
|
|
time.Sleep(2 * time.Second)
|
|
|
|
|
|
|
|
// First tidy run shouldn't remove anything; too long of safety buffer.
|
|
|
|
_, err = CBWrite(b, s, "tidy", map[string]interface{}{
|
|
|
|
"tidy_expired_issuers": true,
|
|
|
|
"issuer_safety_buffer": "60m",
|
|
|
|
})
|
|
|
|
require.NoError(t, err)
|
|
|
|
|
|
|
|
// Wait for tidy to finish.
|
|
|
|
time.Sleep(2 * time.Second)
|
|
|
|
|
|
|
|
// Expired issuer should exist.
|
|
|
|
resp, err := CBRead(b, s, "issuer/root-expired")
|
|
|
|
requireSuccessNonNilResponse(t, resp, err, "expired should still be present")
|
|
|
|
resp, err = CBRead(b, s, "issuer/root-valid")
|
|
|
|
requireSuccessNonNilResponse(t, resp, err, "valid should still be present")
|
|
|
|
|
|
|
|
// Second tidy run with shorter safety buffer shouldn't remove the
|
|
|
|
// expired one, as it should be the default issuer.
|
|
|
|
_, err = CBWrite(b, s, "tidy", map[string]interface{}{
|
|
|
|
"tidy_expired_issuers": true,
|
|
|
|
"issuer_safety_buffer": "1s",
|
|
|
|
})
|
|
|
|
require.NoError(t, err)
|
|
|
|
|
|
|
|
// Wait for tidy to finish.
|
|
|
|
time.Sleep(2 * time.Second)
|
|
|
|
|
|
|
|
// Expired issuer should still exist.
|
|
|
|
resp, err = CBRead(b, s, "issuer/root-expired")
|
|
|
|
requireSuccessNonNilResponse(t, resp, err, "expired should still be present")
|
|
|
|
resp, err = CBRead(b, s, "issuer/root-valid")
|
|
|
|
requireSuccessNonNilResponse(t, resp, err, "valid should still be present")
|
|
|
|
|
|
|
|
// Update the default issuer.
|
|
|
|
_, err = CBWrite(b, s, "config/issuers", map[string]interface{}{
|
|
|
|
"default": "root-valid",
|
|
|
|
})
|
|
|
|
require.NoError(t, err)
|
|
|
|
|
|
|
|
// Third tidy run should remove the expired one.
|
|
|
|
_, err = CBWrite(b, s, "tidy", map[string]interface{}{
|
|
|
|
"tidy_expired_issuers": true,
|
|
|
|
"issuer_safety_buffer": "1s",
|
|
|
|
})
|
|
|
|
require.NoError(t, err)
|
|
|
|
|
|
|
|
// Wait for tidy to finish.
|
|
|
|
time.Sleep(2 * time.Second)
|
|
|
|
|
|
|
|
// Valid issuer should exist still; other should be removed.
|
|
|
|
resp, err = CBRead(b, s, "issuer/root-expired")
|
|
|
|
require.Error(t, err)
|
|
|
|
require.Nil(t, resp)
|
|
|
|
resp, err = CBRead(b, s, "issuer/root-valid")
|
|
|
|
requireSuccessNonNilResponse(t, resp, err, "valid should still be present")
|
|
|
|
|
|
|
|
// Finally, one more tidy should cause no changes.
|
|
|
|
_, err = CBWrite(b, s, "tidy", map[string]interface{}{
|
|
|
|
"tidy_expired_issuers": true,
|
|
|
|
"issuer_safety_buffer": "1s",
|
|
|
|
})
|
|
|
|
require.NoError(t, err)
|
|
|
|
|
|
|
|
// Wait for tidy to finish.
|
|
|
|
time.Sleep(2 * time.Second)
|
|
|
|
|
|
|
|
// Valid issuer should exist still; other should be removed.
|
|
|
|
resp, err = CBRead(b, s, "issuer/root-expired")
|
|
|
|
require.Error(t, err)
|
|
|
|
require.Nil(t, resp)
|
|
|
|
resp, err = CBRead(b, s, "issuer/root-valid")
|
|
|
|
requireSuccessNonNilResponse(t, resp, err, "valid should still be present")
|
|
|
|
|
|
|
|
// Ensure we have safety buffer and expired issuers set correctly.
|
|
|
|
statusResp, err := CBRead(b, s, "tidy-status")
|
|
|
|
require.NoError(t, err)
|
|
|
|
require.NotNil(t, statusResp)
|
|
|
|
require.NotNil(t, statusResp.Data)
|
|
|
|
require.Equal(t, statusResp.Data["issuer_safety_buffer"], 1)
|
|
|
|
require.Equal(t, statusResp.Data["tidy_expired_issuers"], true)
|
|
|
|
}
|
2022-12-14 20:35:21 +00:00
|
|
|
|
|
|
|
func TestTidyIssuerConfig(t *testing.T) {
|
|
|
|
t.Parallel()
|
|
|
|
|
|
|
|
b, s := CreateBackendWithStorage(t)
|
|
|
|
|
|
|
|
// Ensure the default auto-tidy config matches expectations
|
|
|
|
resp, err := CBRead(b, s, "config/auto-tidy")
|
|
|
|
requireSuccessNonNilResponse(t, resp, err)
|
|
|
|
|
|
|
|
jsonBlob, err := json.Marshal(&defaultTidyConfig)
|
|
|
|
require.NoError(t, err)
|
|
|
|
var defaultConfigMap map[string]interface{}
|
|
|
|
err = json.Unmarshal(jsonBlob, &defaultConfigMap)
|
|
|
|
require.NoError(t, err)
|
|
|
|
|
|
|
|
// Coerce defaults to API response types.
|
|
|
|
defaultConfigMap["interval_duration"] = int(time.Duration(defaultConfigMap["interval_duration"].(float64)) / time.Second)
|
|
|
|
defaultConfigMap["issuer_safety_buffer"] = int(time.Duration(defaultConfigMap["issuer_safety_buffer"].(float64)) / time.Second)
|
|
|
|
defaultConfigMap["safety_buffer"] = int(time.Duration(defaultConfigMap["safety_buffer"].(float64)) / time.Second)
|
|
|
|
defaultConfigMap["pause_duration"] = time.Duration(defaultConfigMap["pause_duration"].(float64)).String()
|
Add cross-cluster revocation queues for PKI (#18784)
* Add global, cross-cluster revocation queue to PKI
This adds a global, cross-cluster replicated revocation queue, allowing
operators to revoke certificates by serial number across any cluster. We
don't support revoking with private key (PoP) in the initial
implementation.
In particular, building on the PBPWF work, we add a special storage
location for handling non-local revocations which gets replicated up to
the active, primary cluster node and back down to all secondary PR
clusters. These then check the pending revocation entry and revoke the
serial locally if it exists, writing a cross-cluster confirmation entry.
Listing capabilities are present under pki/certs/revocation-queue,
allowing operators to see which certs are present. However, a future
improvement to the tidy subsystem will allow automatic cleanup of stale
entries.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Allow tidying revocation queue entries
No manual operator control of revocation queue entries are allowed.
However, entries are stored with their request time, allowing tidy to,
after a suitable safety buffer, remove these unconfirmed and presumably
invalid requests.
Notably, when a cluster goes offline, it will be unable to process
cross-cluster revocations for certificates it holds. If tidy runs,
potentially valid revocations may be removed. However, it is up to the
administrator to ensure the tidy window is sufficiently long that any
required maintenance is done (or, prior to maintenance when an issue is
first noticed, tidy is temporarily disabled).
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Only allow enabling global revocation queue on Vault Enterprise
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Use a locking queue to handle revocation requests
This queue attempts to guarantee that PKI's invalidateFunc won't have
to wait long to execute: by locking only around access to the queue
proper, and internally using a list, we minimize the time spent locked,
waiting for queue accesses.
Previously, we held a lock during tidy and processing that would've
prevented us from processing invalidateFunc calls.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* use_global_queue->cross_cluster_revocation
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Grab revocation storage lock when processing queue
We need to grab the storage lock as we'll actively be revoking new
certificates in the revocation queue. This ensures nobody else is
competing for storage access, across periodic funcs, new revocations,
and tidy operations.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Fix expected tidy status test
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Allow probing RollbackManager directly in tests
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Address review feedback on revocationQueue
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add more cancel checks, fix starting manual tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
2023-01-23 14:29:27 +00:00
|
|
|
defaultConfigMap["revocation_queue_safety_buffer"] = int(time.Duration(defaultConfigMap["revocation_queue_safety_buffer"].(float64)) / time.Second)
|
2022-12-14 20:35:21 +00:00
|
|
|
|
|
|
|
require.Equal(t, defaultConfigMap, resp.Data)
|
|
|
|
|
|
|
|
// Ensure setting issuer-tidy related fields stick.
|
|
|
|
resp, err = CBWrite(b, s, "config/auto-tidy", map[string]interface{}{
|
|
|
|
"tidy_expired_issuers": true,
|
|
|
|
"issuer_safety_buffer": "5s",
|
|
|
|
})
|
|
|
|
requireSuccessNonNilResponse(t, resp, err)
|
|
|
|
require.Equal(t, true, resp.Data["tidy_expired_issuers"])
|
|
|
|
require.Equal(t, 5, resp.Data["issuer_safety_buffer"])
|
|
|
|
}
|