remove redundant vault api retry logic
We upgraded Vault API module version to a version that has built-in
retry logic. So this code is no longer necessary.
Also add mention of re-configuring the provider in comments.
fix goroutine leak in renew testing
Test overwrote the stopWatcher() function variable for the test without
keeping and calling the original value. The original value is the
function that stops the goroutine... so it needs to be called.
All of the current integration tests where Vault is the Connect CA now use non-root tokens for the test. This helps us detect privilege changes in the vault model so we can keep our guides up to date.
One larger change was that the RenewIntermediate function got refactored slightly so it could be used from a test, rather than the large duplicated function we were testing in a test which seemed error prone.
The fix outlined and merged in #15253 fixed the issue as it occurs in the primary
DC. There is a similar issue that arises when vault is used as the Connect CA in a
secondary datacenter that is fixed by this PR.
Additionally: this PR adds support to run the existing suite of vault related integration
tests against the last 4 versions of vault (1.9, 1.10, 1.11, 1.12)
Consul used to rely on implicit issuer selection when calling Vault endpoints to issue new CSRs. Vault 1.11+ changed that behavior, which caused Consul to check the wrong (previous) issuer when renewing its Intermediate CA. This patch allows Consul to explicitly set a default issuer when it detects that the response from Vault is 1.11+.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
* update go version to 1.18 for api and sdk, go mod tidy
* removes ioutil usage everywhere which was deprecated in go1.16 in favour of io and os packages. Also introduces a lint rule which forbids use of ioutil going forward.
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* Fix leaked Vault LifetimeRenewers
When the Vault CA Provider is reconfigured we do not stop the
LifetimeRenewers which can cause them to leak until the Consul processes
recycles. On Configure execute stopWatcher if it exists and is not nil
before starting a new renewal
* Add jitter before restarting the LifetimeWatcher
If we fail to login to Vault or our token is no longer valid we can
overwhelm a Vault instance with many requests very quickly by restarting
the LifetimeWatcher. Before restarting the LifetimeWatcher provide a
backoff time of 1 second or less.
* Use a retry.Waiter instead of RandomStagger
* changelog
* gofmt'd
* Swap out bool for atomic.Unit32 in test
* Provide some extra clarification in comment and changelog
set -euo pipefail
unset CDPATH
cd "$(dirname "$0")"
for f in $(git grep '\brequire := require\.New(' | cut -d':' -f1 | sort -u); do
echo "=== require: $f ==="
sed -i '/require := require.New(t)/d' $f
# require.XXX(blah) but not require.XXX(tblah) or require.XXX(rblah)
sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\([^tr]\)/require.\1(t,\2/g' $f
# require.XXX(tblah) but not require.XXX(t, blah)
sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/require.\1(t,\2/g' $f
# require.XXX(rblah) but not require.XXX(r, blah)
sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/require.\1(t,\2/g' $f
gofmt -s -w $f
done
for f in $(git grep '\bassert := assert\.New(' | cut -d':' -f1 | sort -u); do
echo "=== assert: $f ==="
sed -i '/assert := assert.New(t)/d' $f
# assert.XXX(blah) but not assert.XXX(tblah) or assert.XXX(rblah)
sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\([^tr]\)/assert.\1(t,\2/g' $f
# assert.XXX(tblah) but not assert.XXX(t, blah)
sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/assert.\1(t,\2/g' $f
# assert.XXX(rblah) but not assert.XXX(r, blah)
sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/assert.\1(t,\2/g' $f
gofmt -s -w $f
done
* Support vault auth methods for the Vault connect CA provider
* Rotate the token (re-authenticate to vault using auth method) when the token can no longer be renewed
* Support Vault Namespaces explicitly in CA config
If there is a Namespace entry included in the Vault CA configuration,
set it as the Vault Namespace on the Vault client
Currently the only way to support Vault namespaces in the Consul CA
config is by doing one of the following:
1) Set the VAULT_NAMESPACE environment variable which will be picked up
by the Vault API client
2) Prefix all Vault paths with the namespace
Neither of these are super pleasant. The first requires direct access
and modification to the Consul runtime environment. It's possible and
expected, not super pleasant.
The second requires more indepth knowledge of Vault and how it uses
Namespaces and could be confusing for anyone without that context. It
also infers that it is not supported
* Add changelog
* Remove fmt.Fprint calls
* Make comment clearer
* Add next consul version to website docs
* Add new test for default configuration
* go mod tidy
* Add skip if vault not present
* Tweak changelog text
* add root_cert_ttl option for consul connect, vault ca providers
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
* add changelog, pr feedback
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
* Update .changelog/11428.txt, more docs
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* Update website/content/docs/agent/options.mdx
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
* trim carriage return from certificates when inserting rootCA in the inMemDB
* format rootCA properly when returning the CA on the connect CA endpoint
* Fix linter warnings
* Fix providers to trim certs before returning it
* trim newlines on write when possible
* add changelog
* make sure all provider return a trailing newline after the root and intermediate certs
* Fix endpoint to return trailing new line
* Fix failing test with vault provider
* make test more robust
* make sure all provider return a trailing newline after the leaf certs
* Check for suffix before removing newline and use function
* Add comment to consul provider
* Update change log
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* fix typo
* simplify code callflow
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* extract requireNewLine as shared func
* remove dependency to testify in testing file
* remove extra newline in vault provider
* Add cert newline fix to envoy xds
* remove new line from mock provider
* Remove adding a new line from provider and fix it when the cert is read
* Add a comment to explain the fix
* Add missing for leaf certs
* fix missing new line
* fix missing new line in leaf certs
* remove extra new line in test
* updage changelog
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* fix in vault provider and when reading cache (RPC call)
* fix AWS provider
* fix failing test in the provider
* remove comments and empty lines
* add check for empty cert in test
* fix linter warnings
* add new line for leaf and private key
* use string concat instead of Sprintf
* fix new lines for leaf signing
* preallocate slice and remove append
* Add new line to `SignIntermediate` and `CrossSignCA`
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
After fixing that bug I uncovered a couple more:
Fix an issue where we might try to cross sign a cert when we never had a valid root.
Fix a potential issue where reconfiguring the CA could cause either the Vault or AWS PCA CA providers to delete resources that are still required by the new incarnation of the CA.
To reduce the chance of some tests not being run because it does not
match the regex passed to '-run'.
Also document why some tests are allowed to be skipped on CI.
If the CI environment is not correct for running tests the tests
should fail, so that we don't accidentally stop running some tests
because of a change to our CI environment.
Also removed a duplicate delcaration from init. I believe one was
overriding the other as they are both in the same package.
* Change CA Configure struct to pass Datacenter through
* Remove connect/ca/plugin as we don't have immediate plans to use it.
We still intend to one day but there are likely to be several changes to the CA provider interface before we do so it's better to rebuild from history when we do that work properly.
* Rename PrimaryDC; fix endpoint in secondary DCs
* pass logger through to provider
* test for proper operation of NeedsLogger
* remove public testServer function
* Ooops actually set the logger in all the places we need it - CA config set wasn't and causing segfault
* Fix all the other places in tests where we set the logger
* Allow CA Providers to persist some state
* Update CA provider plugin interface
* Fix plugin stubs to match provider changes
* Update agent/connect/ca/provider.go
Co-Authored-By: R.B. Boyer <rb@hashicorp.com>
* Cleanup review comments
* add NeedsLogger to Provider interface
* implements NeedsLogger in default provider
* pass logger through to provider
* test for proper operation of NeedsLogger
* remove public testServer function
* Switch test to actually assert on logging output rather than reflection.
--amend
* Ooops actually set the logger in all the places we need it - CA config set wasn't and causing segfault
* Fix all the other places in tests where we set the logger
* Add TODO comment
* Allow RSA CA certs for consul and vault providers to correctly sign EC leaf certs.
* Ensure key type ad bits are populated from CA cert and clean up tests
* Add integration test and fix error when initializing secondary CA with RSA key.
* Add more tests, fix review feedback
* Update docs with key type config and output
* Apply suggestions from code review
Co-Authored-By: R.B. Boyer <rb@hashicorp.com>
This only affects vault versions >=1.1.1 because the prior code
accidentally relied upon a bug that was fixed in
https://github.com/hashicorp/vault/pull/6505
The existing tests should have caught this, but they were using a
vendored copy of vault version 0.10.3. This fixes the tests by running
an actual copy of vault instead of an in-process copy. This has the
added benefit of changing the dependency on vault to just vault/api.
Also update VaultProvider to use similar SetIntermediate validation code
as the ConsulProvider implementation.
All these changes should have no side-effects or change behavior:
- Use bytes.Buffer's String() instead of a conversion
- Use time.Since and time.Until where fitting
- Drop unnecessary returns and assignment
* Fix CA pruning when CA config uses string durations.
The tl;dr here is:
- Configuring LeafCertTTL with a string like "72h" is how we do it by default and should be supported
- Most of our tests managed to escape this by defining them as time.Duration directly
- Out actual default value is a string
- Since this is stored in a map[string]interface{} config, when it is written to Raft it goes through a msgpack encode/decode cycle (even though it's written from server not over RPC).
- msgpack decode leaves the string as a `[]uint8`
- Some of our parsers required string and failed
- So after 1 hour, a default configured server would throw an error about pruning old CAs
- If a new CA was configured that set LeafCertTTL as a time.Duration, things might be OK after that, but if a new CA was just configured from config file, intialization would cause same issue but always fail still so would never prune the old CA.
- Mostly this is just a janky error that got passed tests due to many levels of complicated encoding/decoding.
tl;dr of the tl;dr: Yay for type safety. Map[string]interface{} combined with msgpack always goes wrong but we somehow get bitten every time in a new way :D
We already fixed this once! The main CA config had the same problem so @kyhavlov already wrote the mapstructure DecodeHook that fixes it. It wasn't used in several places it needed to be and one of those is notw in `structs` which caused a dependency cycle so I've moved them.
This adds a whole new test thta explicitly tests the case that broke here. It also adds tests that would have failed in other places before (Consul and Vaul provider parsing functions). I'm not sure if they would ever be affected as it is now as we've not seen things broken with them but it seems better to explicitly test that and support it to not be bitten a third time!
* Typo fix
* Fix bad Uint8 usage