open-vault/builtin
Alexander Scheel b3dc380c82
Add cross-cluster revocation queues for PKI (#18784)
* Add global, cross-cluster revocation queue to PKI

This adds a global, cross-cluster replicated revocation queue, allowing
operators to revoke certificates by serial number across any cluster. We
don't support revoking with private key (PoP) in the initial
implementation.

In particular, building on the PBPWF work, we add a special storage
location for handling non-local revocations which gets replicated up to
the active, primary cluster node and back down to all secondary PR
clusters. These then check the pending revocation entry and revoke the
serial locally if it exists, writing a cross-cluster confirmation entry.

Listing capabilities are present under pki/certs/revocation-queue,
allowing operators to see which certs are present. However, a future
improvement to the tidy subsystem will allow automatic cleanup of stale
entries.

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* Allow tidying revocation queue entries

No manual operator control of revocation queue entries are allowed.
However, entries are stored with their request time, allowing tidy to,
after a suitable safety buffer, remove these unconfirmed and presumably
invalid requests.

Notably, when a cluster goes offline, it will be unable to process
cross-cluster revocations for certificates it holds. If tidy runs,
potentially valid revocations may be removed. However, it is up to the
administrator to ensure the tidy window is sufficiently long that any
required maintenance is done (or, prior to maintenance when an issue is
first noticed, tidy is temporarily disabled).

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* Only allow enabling global revocation queue on Vault Enterprise

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* Use a locking queue to handle revocation requests

This queue attempts to guarantee that PKI's invalidateFunc won't have
to wait long to execute: by locking only around access to the queue
proper, and internally using a list, we minimize the time spent locked,
waiting for queue accesses.

Previously, we held a lock during tidy and processing that would've
prevented us from processing invalidateFunc calls.

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* use_global_queue->cross_cluster_revocation

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* Grab revocation storage lock when processing queue

We need to grab the storage lock as we'll actively be revoking new
certificates in the revocation queue. This ensures nobody else is
competing for storage access, across periodic funcs, new revocations,
and tidy operations.

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* Fix expected tidy status test

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* Allow probing RollbackManager directly in tests

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* Address review feedback on revocationQueue

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* Add more cancel checks, fix starting manual tidy

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
2023-01-23 09:29:27 -05:00
..
audit Add option 'elide_list_responses' to audit backends (#18128) 2023-01-11 16:15:52 -05:00
credential Add AppRole response schema validation tests (#18636) 2023-01-13 15:23:36 -05:00
logical Add cross-cluster revocation queues for PKI (#18784) 2023-01-23 09:29:27 -05:00
plugin Plugins: Update running version everywhere running sha256 is set (#17292) 2022-09-23 11:19:38 +01:00