Fix some duplication of partials, and add fix versions for update-primary data loss issue (#22182) (#23043)
parent 28c15e2a98
commit da9cd4c878
@@ -117,7 +117,7 @@ The fix for this UI issue is coming in the Vault 1.13.1 release.

@include 'perf-standby-token-create-forwarding-failure.mdx'

@include 'update-primary-known-issue.mdx'
@include 'known-issues/update-primary-data-loss.mdx'

## Feature deprecations and EOL
@@ -171,8 +171,8 @@ Affects Vault 1.13.0+

@include 'perf-standby-token-create-forwarding-failure.mdx'

@include 'update-primary-known-issue.mdx'
@include 'known-issues/update-primary-data-loss.mdx'

@include 'pki-double-migration-bug.mdx'

@include 'update-primary-addrs-panic.mdx'
@include 'known-issues/update-primary-addrs-panic.mdx'
@@ -2,14 +2,7 @@

#### Affected versions

- All current versions of Vault

<Tip title="We are actively working on the underlying issue">

Look for **Fix a race condition with update-primary that could result in data
loss after a DR failover.** in a future changelog for the resolution.

</Tip>

All versions of Vault before 1.14.1, 1.13.5, 1.12.9, and 1.11.12.

#### Issue
@@ -1,16 +0,0 @@

### Using 'update_primary_addrs' on a demoted cluster causes Vault to panic ((#update-primary-addrs-panic))

#### Affected versions

- 1.13.3, 1.13.4 & 1.14.0

#### Issue

If the [`update_primary_addrs`](/vault/api-docs/system/replication/replication-performance#update_primary_addrs)
parameter is used on a recently demoted cluster, Vault will panic due to no longer
having information about the primary cluster.

#### Workaround

Instead of using `update_primary_addrs` on the recently demoted cluster, provide an
[activation token](/vault/api-docs/system/replication/replication-performance#token-1).
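A minimal sketch of that flow, with a hypothetical secondary ID and placeholder token value (check the linked API docs for the full parameter list):

```shell-session
# On the current primary: issue a fresh secondary activation token for the
# demoted cluster ("demoted-cluster" is a placeholder ID).
$ vault write sys/replication/performance/primary/secondary-token id="demoted-cluster"

# On the recently demoted cluster: pass the activation token to update-primary
# instead of setting update_primary_addrs.
$ vault write sys/replication/performance/secondary/update-primary \
    token="<wrapped_activation_token>"
```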
@@ -1,64 +0,0 @@

### API calls to update-primary may lead to data loss ((#update-primary-data-loss))

#### Affected versions

- All current versions of Vault

<Tip title="We are actively working on the underlying issue">

Look for **Fix a race condition with update-primary that could result in data
loss after a DR failover.** in a future changelog for the resolution.

</Tip>

#### Issue
The [update-primary](/vault/api-docs/system/replication/replication-performance#update-performance-secondary-s-primary)
endpoint temporarily removes all mount entries except for those that are managed
automatically by Vault (e.g. identity mounts). In certain situations, a race
condition between mount table truncation and replication repairs may lead to data
loss when updating secondary replication clusters.

Situations where the race condition may occur:
- **When the cluster has local data (e.g., PKI certificates, app role secret IDs)
  in shared mounts**.
  Calling `update-primary` on a performance secondary with local data in shared
  mounts may corrupt the merkle tree on the secondary. The secondary still
  contains all the previously stored data, but the corruption means that
  downstream secondaries will not receive the shared data and will interpret the
  update as a request to delete the information. If the downstream secondary is
  promoted before the merkle tree is repaired, the newly promoted secondary will
  not contain the expected local data. The missing data may be unrecoverable if
  the original secondary is lost or destroyed.
- **When the cluster has `Allow` path filters defined** (see the sketch after
  this list).
  As of Vault 1.0.3.1, startup, unseal, and calling `update-primary` all trigger a
  background job that looks at the current mount data and removes invalid entries
  based on path filters. When a secondary has `Allow` path filters, the cleanup
  code may misfire in the window of time after update-primary truncates the mount
  tables but before the mount tables are rewritten by replication. The cleanup
  code deletes data associated with the missing mount entries but does not modify
  the merkle tree. Because the merkle tree remains unchanged, replication will not
  know that the data is missing and needs to be repaired.
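For context, the allow-style path filter described in the second situation is configured on the primary cluster through the `paths-filter` endpoint. A minimal sketch, assuming a hypothetical secondary ID, primary address, and mount paths:

```shell-session
# Hypothetical allow-list path filter for one performance secondary; the
# secondary ID, primary address, and paths are placeholders.
$ curl \
    --header "X-Vault-Token: $VAULT_TOKEN" \
    --request POST \
    --data '{"mode": "allow", "paths": ["secret/", "pki/"]}' \
    https://primary.example.com:8200/v1/sys/replication/performance/primary/paths-filter/secondary-us-east
```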
#### Workaround 1: PR secondary with local data in shared mounts

Watch for `cleaning key in merkle tree` in the TRACE log immediately after an
update-primary call on a PR secondary to indicate the merkle tree may be
corrupt. Repair the merkle tree by issuing a
[replication reindex request](/vault/api-docs/system/replication#reindex-replication)
to the PR secondary.

If TRACE logs are no longer available, we recommend pre-emptively reindexing the
PR secondary as a precaution.
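A minimal sketch of that check and repair, assuming a placeholder log path and a CLI pointed at the PR secondary:

```shell-session
# Look for the corruption indicator in the secondary's server log
# (TRACE logging must have been enabled; the log path is a placeholder).
$ grep "cleaning key in merkle tree" /var/log/vault/vault.log

# Ask the PR secondary to rebuild its merkle tree via the reindex endpoint
# linked above; -f sends an empty request body.
$ vault write -f sys/replication/reindex
```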
#### Workaround 2: PR secondary with "Allow" path filters

Watch for `deleted mistakenly stored mount entry from backend` in the INFO log.
Reindex the performance secondary to update the merkle tree with the missing
data and allow replication to disseminate the changes. **You will not be able to
recover local data on shared mounts (e.g., PKI certificates)**.

If INFO logs are no longer available, query the shared mount in question to
confirm whether your role and configuration data are present on the primary but
missing from the secondary.
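A minimal sketch of such a spot check, assuming a shared PKI mount named `pki` and placeholder cluster addresses:

```shell-session
# Compare role listings between the primary and the secondary; roles present on
# the primary but missing on the secondary point to the misfire described above.
$ VAULT_ADDR=https://primary.example.com:8200   vault list pki/roles
$ VAULT_ADDR=https://secondary.example.com:8200 vault list pki/roles
```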