Commit Graph

1 Commits

Author SHA1 Message Date
Alexander Scheel ffa4825693
PKI - Fix order of chain building writes (#17772)
* Ensure correct write ordering in rebuildIssuersChains

When troubleshooting a recent migration failure from 1.10->1.11, it was
noted that some PKI mounts had bad chain construction despite having
valid, chaining issuers. Due to the cluster's leadership trashing
between nodes, the migration logic was re-executed several times,
partially succeeding each time. While the legacy CA bundle migration
logic was written with this in mind, one shortcoming in the chain
building code lead us to truncate the ca_chain: by sorting the list of
issuers after including non-written issuers (with random IDs), these
issuers would occasionally be persisted prior to storage _prior_ to
existing CAs with modified chains.

The migration code carefully imported the active issuer prior to its
parents. However, due to this bug, there was a chance that, if write to
the pending parent succeeded but updating the active issuer didn't, the
active issuer's ca_chain field would only contain the self-reference and
not the parent's reference as well. Ultimately, a workaround of setting
and subsequently unsetting a manual chain would force a chain
regeneration.

In this patch, we simply fix the write ordering: because we need to
ensure a stable chain sorting, we leave the sort location in the same
place, but delay writing the provided referenceCert to the last
position. This is because the reference is meant to be the user-facing
action: without transactional write capabilities, other chains may
succeed, but if the last user-facing action fails, the user will
hopefully retry the action. This will also correct migration, by
ensuring the subsequent issuer import will be attempted again,
triggering another chain build and only persisting this issuer when
all other issuers have also been updated.

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* Remigrate ca_chains to fix any missing issuers

In the previous commit, we identified an issue that would occur on
legacy issuer migration to the new storage format. This is easy enough
to detect for any given mount (by an operator), but automating scanning
and remediating all PKI mounts in large deployments might be difficult.

Write a new storage migration version to regenerate all chains on
upgrade, once.

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* Add changelog entry

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* Add issue to PKI considerations documentation

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* Correct %v -> %w in chain building errs

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
2022-11-03 11:50:03 -04:00