From da9cd4c878523298bf432da5f54221d07f45b527 Mon Sep 17 00:00:00 2001
From: Nick Cabatoff
Date: Wed, 13 Sep 2023 08:14:42 -0400
Subject: [PATCH] Fix some duplication of partials, and add fix versions for
 update-primary data loss issue (#22182) (#23043)

---
 website/content/docs/release-notes/1.13.0.mdx |  2 +-
 .../docs/upgrading/upgrade-to-1.13.x.mdx      |  6 +-
 .../known-issues/update-primary-data-loss.mdx | 11 +---
 .../partials/update-primary-addrs-panic.mdx   | 16 -----
 .../partials/update-primary-known-issue.mdx   | 64 -------------------
 5 files changed, 6 insertions(+), 93 deletions(-)
 delete mode 100644 website/content/partials/update-primary-addrs-panic.mdx
 delete mode 100644 website/content/partials/update-primary-known-issue.mdx

diff --git a/website/content/docs/release-notes/1.13.0.mdx b/website/content/docs/release-notes/1.13.0.mdx
index 088ea271d..8a6e343a5 100644
--- a/website/content/docs/release-notes/1.13.0.mdx
+++ b/website/content/docs/release-notes/1.13.0.mdx
@@ -117,7 +117,7 @@ The fix for this UI issue is coming in the Vault 1.13.1 release.
 
 @include 'perf-standby-token-create-forwarding-failure.mdx'
 
-@include 'update-primary-known-issue.mdx'
+@include 'known-issues/update-primary-data-loss.mdx'
 
 ## Feature deprecations and EOL
 
diff --git a/website/content/docs/upgrading/upgrade-to-1.13.x.mdx b/website/content/docs/upgrading/upgrade-to-1.13.x.mdx
index db3990364..f8df6376a 100644
--- a/website/content/docs/upgrading/upgrade-to-1.13.x.mdx
+++ b/website/content/docs/upgrading/upgrade-to-1.13.x.mdx
@@ -19,7 +19,7 @@ for Vault 1.13.x compared to 1.12. Please read it carefully.
 
 As of version 1.13, Vault will stop trying to validate user credentials if the
 user submits multiple invalid credentials in quick succession. During lockout,
-Vault ignores requests from the barred user rather than responding with a 
+Vault ignores requests from the barred user rather than responding with a
 permission denied error.
 
 User lockout is enabled by default with a lockout threshold of 5 attempt, a
@@ -171,8 +171,8 @@ Affects Vault 1.13.0+
 
 @include 'perf-standby-token-create-forwarding-failure.mdx'
 
-@include 'update-primary-known-issue.mdx'
+@include 'known-issues/update-primary-data-loss.mdx'
 
 @include 'pki-double-migration-bug.mdx'
 
-@include 'update-primary-addrs-panic.mdx'
\ No newline at end of file
+@include 'known-issues/update-primary-addrs-panic.mdx'
diff --git a/website/content/partials/known-issues/update-primary-data-loss.mdx b/website/content/partials/known-issues/update-primary-data-loss.mdx
index 6518c8d1b..955b798cc 100644
--- a/website/content/partials/known-issues/update-primary-data-loss.mdx
+++ b/website/content/partials/known-issues/update-primary-data-loss.mdx
@@ -2,14 +2,7 @@
 
 #### Affected versions
 
-- All current versions of Vault
-
-<Note>
-
-Look for **Fix a race condition with update-primary that could result in data
-loss after a DR failover.** in a future changelog for the resolution.
-
-</Note>
+All versions of Vault before 1.14.1, 1.13.5, 1.12.9, and 1.11.12.
 
 #### Issue
 
@@ -17,7 +10,7 @@ The [update-primary](/vault/api-docs/system/replication/replication-performance#
 endpoint temporarily removes all mount entries except for those that are managed
 automatically by vault (e.g. identity mounts). In certain situations, a race
 condition between mount table truncation replication repairs may lead to data
-loss when updating secondary replication clusters.
+loss when updating secondary replication clusters. 
 
 Situations where the race condition may occur:
 
diff --git a/website/content/partials/update-primary-addrs-panic.mdx b/website/content/partials/update-primary-addrs-panic.mdx
deleted file mode 100644
index d7e63828e..000000000
--- a/website/content/partials/update-primary-addrs-panic.mdx
+++ /dev/null
@@ -1,16 +0,0 @@
-### Using 'update_primary_addrs' on a demoted cluster causes Vault to panic ((#update-primary-addrs-panic))
-
-#### Affected versions
-
-- 1.13.3, 1.13.4 & 1.14.0
-
-#### Issue
-
-If the [`update_primary_addrs`](/vault/api-docs/system/replication/replication-performance#update_primary_addrs)
-parameter is used on a recently demoted cluster, Vault will panic due to no longer
-having information about the primary cluster.
-
-#### Workaround
-
-Instead of using `update_primary_addrs` on the recently demoted cluster, instead provide an
-[activation token](/vault/api-docs/system/replication/replication-performance#token-1).
\ No newline at end of file
diff --git a/website/content/partials/update-primary-known-issue.mdx b/website/content/partials/update-primary-known-issue.mdx
deleted file mode 100644
index f1514d808..000000000
--- a/website/content/partials/update-primary-known-issue.mdx
+++ /dev/null
@@ -1,64 +0,0 @@
-### API calls to update-primary may lead to data loss ((#update-primary-data-loss))
-
-#### Affected versions
-
-- All current versions of Vault
-
-<Note>
-
-Look for **Fix a race condition with update-primary that could result in data
-loss after a DR failover.** in a future changelog for the resolution.
-
-</Note>
-
-#### Issue
-
-The [update-primary](/vault/api-docs/system/replication/replication-performance#update-performance-secondary-s-primary)
-endpoint temporarily removes all mount entries except for those that are managed
-automatically by vault (e.g. identity mounts). In certain situations, a race
-condition between mount table truncation replication repairs may lead to data
-loss when updating secondary replication clusters.
-
-Situations where the race condition may occur:
-
-- **When the cluster has local data (e.g., PKI certificates, app role secret IDs)
-  in shared mounts**.
-  Calling `update-primary` on a performance secondary with local data in shared
-  mounts may corrupt the merkle tree on the secondary. The secondary still
-  contains all the previously stored data, but the corruption means that
-  downstream secondaries will not receive the shared data and will interpret the
-  update as a request to delete the information. If the downstream secondary is
-  promoted before the merkle tree is repaired, the newly promoted secondary will
-  not contain the expected local data. The missing data may be unrecoverable if
-  the original secondary is is lost or destroyed.
-- **When the cluster has an `Allow` paths defined.**
-  As of Vault 1.0.3.1, startup, unseal, and calling `update-primary` all trigger a
-  background job that looks at the current mount data and removes invalid entries
-  based on path filters. When a secondary has `Allow` path filters, the cleanup
-  code may misfire in the windown of time after update-primary truncats the mount
-  tables but before the mount tables are rewritten by replication. The cleanup
-  code deletes data associated with the missing mount entries but does not modify
-  the merkle tree. Because the merkle tree remains unchanged, replication will not
-  know that the data is missing and needs to be repaired.
-
-#### Workaround 1: PR secondary with local data in shared mounts
-
-Watch for `cleaning key in merkle tree` in the TRACE log immediately after an
-update-primary call on a PR secondary to indicate the merkle tree may be
-corrupt. Repair the merkle tree by issuing a
-[replication reindex request](/vault/api-docs/system/replication#reindex-replication)
-to the PR secondary.
-
-If TRACE logs are no longer available, we recommend pre-emptively reindexing the
-PR secondary as a precaution.
-
-#### Workaround 2: PR secondary with "Allow" path filters
-
-Watch for `deleted mistakenly stored mount entry from backend` in the INFO log.
-Reindex the performance secondary to update the merkle tree with the missing
-data and allow replication to disseminate the changes. **You will not be able to
-recover local data on shared mounts (e.g., PKI certificates)**.
-
-If INFO logs are no longer available, query the shared mount in question to
-confirm whether your role and configuration data are present on the primary but
-missing from the secondary.
\ No newline at end of file
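
For operators who hit the data-loss issue before upgrading to a fixed release, the workarounds referenced in this patch come down to checking the server log and reindexing the performance secondary. A minimal sketch, assuming the PR secondary is reachable at `https://pr-secondary.example.com:8200` and logs to `/var/log/vault/vault.log` (both placeholders); the grep patterns are the log lines quoted in the workarounds, and the reindex call uses the documented `/sys/replication/reindex` endpoint:

```shell
# Placeholder address and log path; adjust for your environment.
export VAULT_ADDR="https://pr-secondary.example.com:8200"

# 1. Immediately after calling update-primary, check the PR secondary's server
#    log for the messages called out in Workaround 1 and Workaround 2.
grep -E 'cleaning key in merkle tree|deleted mistakenly stored mount entry from backend' \
  /var/log/vault/vault.log

# 2. If either message appears, or the logs have already rotated away, reindex
#    the PR secondary so the merkle tree is rebuilt and replication can repair
#    the missing data.
curl \
  --header "X-Vault-Token: $VAULT_TOKEN" \
  --request POST \
  "$VAULT_ADDR/v1/sys/replication/reindex"
```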
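
Similarly, the workaround for the `update_primary_addrs` panic is to hand the recently demoted cluster an activation token rather than new primary addresses. A rough sketch using the documented secondary-token and update-primary endpoints; the `id` value and the returned wrapping token are placeholders:

```shell
# On the new primary: generate an activation (secondary) token for the
# demoted cluster. The id value is a placeholder.
vault write sys/replication/performance/primary/secondary-token id="demoted-cluster"

# On the recently demoted cluster: pass the wrapping token returned above to
# update-primary instead of setting update_primary_addrs.
vault write sys/replication/performance/secondary/update-primary token="<wrapping_token>"
```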