The test added in this commit shows the problem. Previously the
SigningKeyID was set to the RootCert not the local leaf signing cert.
This same bug was fixed in two other places back in 2019, but this last one was
missed.
While fixing this bug I noticed I had the same few lines of code in 3
places, so I extracted a new function for them.
There would be 4 places, but currently the InitializeCA flow sets this
SigningKeyID in a different way, so I've left that alone for now.
While working on the CA system it is important to be able to run all the
tests related to the system, without having to wait for unrelated tests.
There are many slow and unrelated tests in agent/consul, so we need some
way to filter to only the relevant tests.
This PR renames all the CA system related tests to start with either
`TestCAMananger` for tests of internal operations that don't have RPC
endpoint, or `TestConnectCA` for tests of RPC endpoints. This allows us
to run all the test with:
go test -run 'TestCAMananger|TestConnectCA' ./agent/consul
The test naming follows an undocumented convention of naming tests as
follows:
Test[<struct name>_]<function name>[_<test case description>]
I tried to always keep Primary/Secondary at the end of the description,
and _Vault_ has to be in the middle because of our regex to run those
tests as a separate CI job.
You may notice some of the test names changed quite a bit. I did my best
to identify the underlying method being tested, but I may have been
slightly off in some cases.
As a method on the struct type this would not be safe to call without first checking
c.isIntermediateUsedToSignLeaf.
So for now, move this logic to the CAMananger, so that it is always correct.
We were not adding the local signing cert to the CARoot. This commit
fixes that bug, and also adds support for fixing existing CARoot on
upgrade.
Also update the tests for both primary and secondary to be more strict.
Check the SigningKeyID is correct after initialization and rotation.
Previously we believe it was necessary for all code that required ports
to use freeport to prevent conflicts.
https://github.com/dnephin/freeport-test shows that it is actually save
to use port 0 (`127.0.0.1:0`) as long as it is passed directly to
`net.Listen`, and the listener holds the port for as long as it is
needed.
This works because freeport explicitly avoids the ephemeral port range,
and port 0 always uses that range. As you can see from the test output
of https://github.com/dnephin/freeport-test, the two systems never use
overlapping ports.
This commit converts all uses of freeport that were being passed
directly to a net.Listen to use port 0 instead. This allows us to remove
a bit of wrapping we had around httptest, in a couple places.
In d2ab767fef21244e9fe3b9887ea70fc177912381 raftApply was changed to handle this check in
a single place, instad of having every caller check it. It looks like these few places
were missed when I did that clean up.
This commit removes the remaining resp.(error) checks, since they are all no-ops now.
This function is only ever called from operations that have already acquired the state lock, so checking
the value of state can never fail.
This change is being made in preparation for splitting out a separate type for the secondary logic. The
state can't easily be shared, so really only the expored top-level functions should acquire the 'state lock'.
This commit removes the actingSecondaryCA field, and removes the stateLock around it. This field
was acting as a proxy for providerRoot != nil, so replace it with that check instead.
The two methods which called secondarySetCAConfigured already set the state, so checking the
state again at this point will not catch runtime errors (only programming errors, which we can catch with tests).
In general, handling state transitions should be done on the "entrypoint" methods where execution starts, not
in every internal method.
This is being done to remove some unnecessary references to c.state, in preparations for extracting
types for primary/secondary.
This makes it easier to fake, which will allow me to use the ConsulProvider as
an 'external PKI' to test a customer setup where the actual root CA is not
the root we use for the Consul CA.
Replaces a call to the state store to fetch the clusterID with the
clusterID field already available on the built-in provider.
* state: port KV and Tombstone tables to new pattern
* go fmt'ed
* handle wildcards for tombstones
* Fix graveyard ent vs oss
* fix oss compilation error
* add partition to tombstones and kv state store indexes
* refactor to use `indexWithEnterpriseIndexable`
* Apply suggestions from code review
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* partition `tableSessions` table
* fix sessions to use UUID and fix prefix index
* fix oss build
* clean up unused functions
* fix oss compilation
* add a partition indexer for sessions
* Fix oss to not have partition index
* fix oss tests
* remove unused operations_ent.go and operations_oss.go func
* convert `indexNodeCheck` of `session_checks` table
* partition `indexID` and `indexSession` of `tableSessionChecks`
* remove partition for Checks as it's always use the session partition
* partition sessions index id table
* fix rebase issues
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* state: port KV and Tombstone tables to new pattern
* go fmt'ed
* handle wildcards for tombstones
* Fix graveyard ent vs oss
* fix oss compilation error
* add partition to tombstones and kv state store indexes
* refactor to use `indexWithEnterpriseIndexable`
* Apply suggestions from code review
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* add `singleValueID` implementation assertions
* partition `tableSessions` table
* fix sessions to use UUID and fix prefix index
* fix oss build
* clean up unused functions
* fix oss compilation
* add a partition indexer for sessions
* Fix oss to not have partition index
* fix oss tests
* remove unused operations_ent.go and operations_oss.go func
* remove unused const
* convert `IndexID` of `session_checks` table
* convert `indexSession` of `session_checks` table
* convert `indexNodeCheck` of `session_checks` table
* partition `indexID` and `indexSession` of `tableSessionChecks`
* fix oss linter
* fix review comments
* remove partition for Checks as it's always use the session partition
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
The TrustDomain is populated from the Host() method which includes the
hard-coded "consul" domain. This means that despite having an empty
cluster ID, the TrustDomain won't be empty.
There are two restrictions:
- Writes from the primary DC which explicitly target a secondary DC.
- Writes to a secondary DC that do not explicitly target the primary DC.
The first restriction is because the config entry is not supported in
secondary datacenters.
The second restriction is to prevent the scenario where a user writes
the config entry to a secondary DC, the write gets forwarded to the
primary, but then the config entry does not apply in the secondary.
This makes the scope more explicit.
Currently getCARoots could return an empty object with an empty trust
domain before the CA is initialized. This commit returns an error while
there is no CA config or no trust domain.
There could be a CA config and no trust domain because the CA config can
be created in InitializeCA before initialization succeeds.
* state: port KV and Tombstone tables to new pattern
* go fmt'ed
* handle wildcards for tombstones
* Fix graveyard ent vs oss
* fix oss compilation error
* add partition to tombstones and kv state store indexes
* refactor to use `indexWithEnterpriseIndexable`
* Apply suggestions from code review
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* add `singleValueID` implementation assertions
* partition `tableSessions` table
* fix sessions to use UUID and fix prefix index
* fix oss build
* clean up unused functions
* fix oss compilation
* add a partition indexer for sessions
* Fix oss to not have partition index
* fix oss tests
* remove unused func `prefixIndexFromServiceNameAsString`
* fix test error check
* remove unused operations_ent.go and operations_oss.go func
* remove unused const
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* state: port KV and Tombstone tables to new pattern
* go fmt'ed
* handle wildcards for tombstones
* Fix graveyard ent vs oss
* fix oss compilation error
* add partition to tombstones and kv state store indexes
* refactor to use `indexWithEnterpriseIndexable`
* partition kvs indexID table
* add `partitionedIndexEntryName` in oss for test purpose
* Apply suggestions from code review
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* add `singleValueID` implementation assertions
* remove entmeta reference from oss
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Previously secondaryInitialize would return nil in this case, which prevented the
deferred initialize from happening, and left the CA in an uninitialized state until a config
update or root rotation.
To fix this I extracted the common parts into the delegate implementation. However looking at this
again, it seems like the handling in secondaryUpdateRoots is impossible, because that function
should never be called before the secondary is initialzied. I beleive we can remove some of that
logic in a follow up.
* add root_cert_ttl option for consul connect, vault ca providers
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
* add changelog, pr feedback
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
* Update .changelog/11428.txt, more docs
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* Update website/content/docs/agent/options.mdx
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
These labels should be set by whatever process scrapes Consul (for
prometheus), or by the agent that receives them (for datadog/statsd).
We need to remove them here because the labels are part of the "metric
key", so we'd have to pre-declare the metrics with the labels. We could
do that, but that is extra work for labels that should be added from
elsewhere.
Also renames the closure to be more descriptive.
Prometheus scrapes metrics from each process, so when leadership transfers to a different node
the previous leader would still be reporting the old cached value.
By setting NaN, I believe we should zero-out the value, so that prometheus should only consider the
value from the new leader.
Emit the metric immediately so that after restarting an agent, the new expiry time will be
emitted. This is particularly important when this metric is being monitored, because we want
the alert to resovle itself immediately.
Also fixed a bug that was exposed in one of these metrics. The CARoot can be nil, so we have
to handle that case.
TestSubscribeBackend_IntegrationWithServer_DeliversAllMessages has been
flaking a few times. This commit cleans up the test a bit, and improves
the failure output.
I don't believe this actually fixes the flake, but I'm not able to
reproduce it reliably.
The failure appears to be that the event with Port=0 is being sent in
both the snapshot and as the first event after the EndOfSnapshot event.
Hopefully the improved logging will show us if these are really
duplicate events, or actually different events with different indexes.
partitionAuthorizer.config can be nil if it wasn't provided on calls to
newPartitionAuthorizer outside of the ACLResolver. This usage happens
often in tests.
This commit: adds a nil check when the config is going to be used,
updates non-test usage of NewPolicyAuthorizerWithDefaults to pass a
non-nil config, and dettaches setEnterpriseConf from the ACLResolver.
When issuing cross-partition service discovery requests, ACL filtering
often checks for NodeRead privileges. This is because the common return
type is a CheckServiceNode, which contains node data.
useInDatacenter was used to determine whether the mesh gateway mode of
the upstream should be returned in the discovery chain target. This
commit makes it so that the mesh gateway mode is returned every time,
and it is up to the caller to decide whether mesh gateways should be
watched or used.
Existing config entries prefixed by service- are specific to individual
services. Since this config entry applies to partitions it is being
renamed.
Additionally, the Partition label was changed to Name because using
Partition at the top-level and in the enterprise meta was leading to the
enterprise meta partition being dropped by msgpack.
The code for this was already removed, which suggests this is not actually testing what it claims.
I'm guessing these are still resolving because the tokens are converted to non-legacy tokens?
It seems like this was missing. Previously this was only called by init of ACLs during an upgrade.
Now that legacy ACLs are removed, nothing was calling stop.
Also remove an unused method from client.
This function is only run when the CAManager is a primary. Extracting this function
makes it clear which parts of UpdateConfiguration are run only in the primary and
also makes the cleanup logic simpler. Instead of both a defer and a local var we
can call the cleanup function in two places.
This commit renames functions to use a consistent pattern for identifying the functions that
can only be called when the Manager is run as the primary or secondary.
This is a step toward eventually creating separate types and moving these methods off of CAManager.
Add changelog to document what changed.
Add entry to telemetry section of the website to document what changed
Add docs to the usagemetric endpoint to help document the metrics in code
This commit two test failures:
1. Remove check for "in legacy ACL mode", the actual upgrade will be removed in a following commit.
2. Remove the early WaitForLeader in dc2, because with it the test was
failing with ACL not found.
TestAgentLeaks_Server was reporting a goroutine leak without this. Not sure if it would actually
be a leak in production or if this is due to the test setup, but seems easy enough to call it
this way until we remove legacyACLTokenUpgrade.
We no long need to read the acl serf tag, because servers are always either ACL enabled or
ACL disabled.
We continue to write the tag so that during an upgarde older servers will see the tag.
This commit two test failures:
1. Remove check for "in legacy ACL mode", the actual upgrade will be removed in a following commit.
2. Use the root token in WaitForLeader, because without it the test was
failing with ACL not found.
As part of removing the legacy ACL system ACL upgrading and the flag for
legacy ACLs is removed from Clients.
This commit also removes the 'acls' serf tag from client nodes. The tag is only ever read
from server nodes.
This commit also introduces a constant for the acl serf tag, to make it easier to track where
it is used.
Replace it with an implementation that returns an error, and rename some symbols
to use a Deprecated suffix to make it clear.
Also remove the ACLRequest struct, which is no longer referenced.
When converting these tests from the legacy ACL system to the new RPC endpoints I
initially changed most things to use _prefix rules, because that was equivalent to
the old legacy rules.
This commit modifies a few of those rules to be a bit more specific by replacing the _prefix
rule with a non-prefix one where possible.
In preparation for removing ACL.Apply.
Tests for ACL.Apply, ACL.GetPolicy, and ACL upgrades were removed
because all 3 of those will be removed shortly.
The forth test appears to be for the ACLResolver cache, so the test was moved to the correct
test file, and the name was updated to make it obvious what is being tested.
structs.ACLForceSet was deprecated 4 years ago, it should be safe to remove now.
ACLBootstrapNow was removed in a recent commit. While it is technically possible that a cluster with mixed version
could still attempt a legacy boostrap, we documented that the legacy system was deprecated in 1.4, so no
clusters that are being upgraded should be attempting a legacy boostrap.
* Port consul-enterprise #1123 to OSS
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
* Fixup missing query field
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
* change to re-trigger ci system
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
* move intFromBool to be available for oss
* add expiry indexes
* remove dead code: `TokenExpirationIndex`
* fix remove indexer `TokenExpirationIndex`
* fix rebase issue
* convert `Roles` index to use `indexerSingle`
* split authmethod write indexer to oss and ent
* add index locality
* add locality unit tests
* move intFromBool to be available for oss
* use Bool func
* refactor `aclTokenList` to merge func
Some users are defining routing configurations that do not have associated services. This commit surfaces these configs in the topology visualization. Also fixes a minor internal bug with non-transparent proxy upstream/downstream references.
1) xds and grpc servers:
1.1) to use recovery middleware with callback that prints stack trace to log
1.2) callback turn the panic into a core.Internal error
2) added unit test for grpc server
Remove the error return, so that not handling is not reported as an
error by errcheck. It was returning the error passed as an arg
unmodified so there is no reason to return the same value that was
passed in.
Remove the term upstreams to remove any confusion with the term used in
service mesh.
Remove the AutoDisable field, and replace it with the TTL value, using 0
to indicate the setting is turned off.
Replace "not Before" with "After".
Add some test coverage to show the behaviour is still correct.
This field was never user-configurable. We always overwrote the value with 120s from
NonUserSource. However, we also never copied the value from RuntimeConfig to consul.Config,
So the value in NonUserSource was always ignored, and we used the default value of 30s
set by consul.DefaultConfig.
All of this code is an unnecessary distraction because a user can not actually configure
this value.
This commit removes the fields and uses a constant value instad. Someone attempting to set
acl.disabled_ttl in their config will now get an error about an unknown field, but previously
the value was completely ignored, so the new behaviour seems more correct.
We have to keep this field in the AutoConfig response for backwards compatibility, but the value
will be ignored by the client, so it doesn't really matter what value we set.
Tests only specified one of the fields, but in production we copy the
value from a single place, so we can do the same in tests.
The AutoConfig test broke because of the problem noticed in a previous
commit. The DisabledTTL is not wired up properly so it reports 0s here.
Changed the test to use an explicit value.
Follow up to https://github.com/hashicorp/consul/pull/10737#discussion_r682147950
Renames all variables for acl.Authorizer to use `authz`. Previously some
places used `rule` which I believe was an old name carried over from the
legacy ACL system.
A couple places also used authorizer.
This commit also removes another couple of authorizer nil checks that
are no longer necessary.
The constructor for Server is not at all the appropriate place to be setting default
values for a config struct that was passed in.
In production this value is always set from agent/config. In tests we should set the
default in a test helper.
This field has been unnecessary for a while now. It was always set to the same value
as PrimaryDatacenter. So we can remove the duplicate field and use PrimaryDatacenter
directly.
This change was made by GoLand refactor, which did most of the work for me.
This method suffered from similar naming to a couple other methods on Server, and had not great
re-use (2 callers). By copying a few of the lines into one of the callers we can move the
implementation into the second caller.
Once moved, we can see that ResolveTokenAndDefaultMeta is identical in both Client and Server, and
likely should be further refactored, possibly into ACLResolver.
This change is being made to make ACL resolution easier to trace.
This method was an alias for ACLResolver.ResolveTokenToIdentityAndAuthorizer. By removing the
method that does nothing the code becomes easier to trace.
ACL filtering only needs an authorizer and a logger. We can decouple filtering from
the ACLResolver by passing in the necessary logger.
This change is being made in preparation for moving the ACLResolver into an acl package
filterACLWithAuthorizer could never return an error. This change moves us a little bit
closer to being able to enable errcheck and catch problems caused by unhandled error
return values.
These functions are moved to the one place they are called to improve code locality.
They are being moved out of agent/consul/acl.go in preparation for moving
ACLResolver to an acl package.
These functions are used in only one place. Move the functions next to their one caller
to improve code locality.
This change is being made in preparation for moving the ACLResolver into an
acl package. The moved functions were previously in the same file as the ACLResolver.
By moving them out of that file we may be able to move the entire file
with fewer modifications.
* defer setting the state before returning to avoid being stuck in `INITIALIZING` state
* add changelog
* move comment with the right if statement
* ca: report state transition error from setSTate
* update comment to reflect state transition
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Follow up to: https://github.com/hashicorp/consul/pull/10738#discussion_r680190210
Previously we were passing an Authorizer that would always allow the
operation, then later checking the authorization using vetServiceTxnOp.
On the surface this seemed strange, but I think it was actually masking
a bug as well. Over time `servicePreApply` was changed to add additional
authorization for `service.Proxy.DestinationServiceName`, but because
we were passing a nil Authorizer, that authorization was not handled on
the txn_endpoint.
`TxnServiceOp.FillAuthzContext` has some special handling in enterprise,
so we need to make sure to continue to use that from the Txn endpoint.
This commit removes the `vetServiceTxnOp` function, and passes in the
`FillAuthzContext` function so that `servicePreApply` can be used by
both the catalog and txn endpoints. This should be much less error prone
and prevent bugs like this in the future.
1. do not emit the metric if Query fails
2. properly check for PrimaryUsersIntermediate, the logic was inverted
Also improve the logging by including the metric name in the log message
* fix state index for `CAOpSetRootsAndConfig` op
* add changelog
* Update changelog
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* remove the change log as it's not needed
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
These checks were a bit more involved. They were previously skipping some code paths
when the authorizer was nil. After looking through these it seems correct to remove the
authz == nil check, since it will never evaluate to true.
These case are already impossible conditions, because most of these functions already start
with a check for ACLs being disabled. So the code path being removed could never be reached.
The one other case (ConnectAuthorized) was already changed in a previous commit. This commit
removes an impossible branch because authz == nil can never be true.
* return an error when the index is not valid
* check response as bool when applying `CAOpSetConfig`
* remove check for bool response
* fix error message and add check to test
* fix comment
* add changelog
A previous commit used SetHash on two of the cases to fix a data race. This commit applies
that change to all cases. Using SetHash in this test helper should ensure that the
test helper behaves closer to production.
Some global variables are patched to shorter values in these tests. But the goroutines that read
them can outlive the test because nothing waited for them to exit.
This commit adds a Wait() method to the routine manager, so that tests can wait for the goroutines
to exit. This prevents the data race because the 'reset to original value' can happen
after all other goroutines have stopped.
* ca: move provider creation into CAManager
This further decouples the CAManager from Server. It reduces the interface between them and
removes the need for the SetLogger method on providers.
* ca: move SignCertificate to CAManager
To reduce the scope of Server, and keep all the CA logic together
* ca: move SignCertificate to the file where it is used
* auto-config: move autoConfigBackend impl off of Server
Most of these methods are used exclusively for the AutoConfig RPC
endpoint. This PR uses a pattern that we've used in other places as an
incremental step to reducing the scope of Server.
* fix linter issues
* check error when `raftApplyMsgpack`
* ca: move SignCertificate to CAManager
To reduce the scope of Server, and keep all the CA logic together
* check expiry date of the intermediate before using it to sign a leaf
* fix typo in comment
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
* Fix test name
* do not check cert start date
* wrap error to mention it is the intermediate expired
* Fix failing test
* update comment
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* use shim to avoid sleep in test
* add root cert validation
* remove duplicate code
* Revert "fix linter issues"
This reverts commit 6356302b54f06c8f2dee8e59740409d49e84ef24.
* fix import issue
* gofmt leader_connect_ca
* add changelog entry
* update error message
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
* fix error message in test
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
Most of these methods are used exclusively for the AutoConfig RPC
endpoint. This PR uses a pattern that we've used in other places as an
incremental step to reducing the scope of Server.
raftApply was removed so ApplyCARequest needs to handle all the possible operations
Also set the providerShim to use the mock provider.
other changes are small test improvements that were necessary to debug the failures.
and small refactor to getCAProvider so that GoLand is less confused about what it is doing.
Previously it was reporting that the for condition was always true, which was not the case.
After moving ca.ConsulProviderStateDelegate into the interface we now
have the ApplyCARequest method which does the same thing. Use this more
specific method instead of raftApply.
This field was documented as enabling TLS for outgoing RPC, but that was not the case.
All this field did was set the use_tls serf tag.
Instead of setting this field in a place far from where it is used, move the logic to where
the serf tag is set, so that the code is much more obvious.
tlsutil.Config already presents an excellent structure for this
configuration. Copying the runtime config fields to agent/consul.Config
makes code harder to trace, and provides no advantage.
Instead of copying the fields around, use the tlsutil.Config struct
directly instead.
This is one small step in removing the many layers of duplicate
configuration.
* add intermediate ca metric routine
* add Gauge config for intermediate cert
* Stop metrics routine when stopping leader
* add changelog entry
* updage changelog
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* use variables instead of a map
* go imports sort
* Add metrics for primary and secondary ca
* start metrics routine in the right DC
* add telemetry documentation
* update docs
* extract expiry fetching in a func
* merge metrics for primary and secondary into signing ca metric
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* trim carriage return from certificates when inserting rootCA in the inMemDB
* format rootCA properly when returning the CA on the connect CA endpoint
* Fix linter warnings
* Fix providers to trim certs before returning it
* trim newlines on write when possible
* add changelog
* make sure all provider return a trailing newline after the root and intermediate certs
* Fix endpoint to return trailing new line
* Fix failing test with vault provider
* make test more robust
* make sure all provider return a trailing newline after the leaf certs
* Check for suffix before removing newline and use function
* Add comment to consul provider
* Update change log
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* fix typo
* simplify code callflow
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* extract requireNewLine as shared func
* remove dependency to testify in testing file
* remove extra newline in vault provider
* Add cert newline fix to envoy xds
* remove new line from mock provider
* Remove adding a new line from provider and fix it when the cert is read
* Add a comment to explain the fix
* Add missing for leaf certs
* fix missing new line
* fix missing new line in leaf certs
* remove extra new line in test
* updage changelog
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* fix in vault provider and when reading cache (RPC call)
* fix AWS provider
* fix failing test in the provider
* remove comments and empty lines
* add check for empty cert in test
* fix linter warnings
* add new line for leaf and private key
* use string concat instead of Sprintf
* fix new lines for leaf signing
* preallocate slice and remove append
* Add new line to `SignIntermediate` and `CrossSignCA`
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
The query metrics are actually reported for all read queries, not only
ones that use a MinIndex to block for updates.
Also clarify the raft.apply metric is only on the leader.
As part of this change, we ensure that the SAN extensions are marked as
critical when the subject is empty so that AWS PCA tolerates the loss of
common names well and continues to function as a Connect CA provider.
Parts of this currently hack around a bug in crypto/x509 and can be
removed after https://go-review.googlesource.com/c/go/+/329129 lands in
a Go release.
Note: the AWS PCA tests do not run automatically, but the following
passed locally for me:
ENABLE_AWS_PCA_TESTS=1 go test ./agent/connect/ca -run TestAWS
With an optional interface that providers can use to indicate if they
use an intermediate cert in the primary DC.
This removes the need to look up the provider config when renewing the
intermediate.
CatalogDestinationsOnly is a passthrough that would enable dialing
addresses outside of Consul's catalog. However, when this flag is set to
true only _connect_ endpoints for services can be dialed.
This flag is being renamed to signal that non-Connect endpoints can't be
dialed by transparent proxies when the value is set to true.
Both NextLink and NextNoBlock had the same logic, with slightly
different return values. By adding a bool return value (similar to map
lookups) we can remove the duplicate method.
The head of the topic buffer was being ignored when creating a snapshot. This commit fixes
the bug by ensuring that the head of the topic buffer is included in the snapshot
before handing it off to the subscription.
This PR adds cluster members to the metrics API. The number of members per
segment are reported as well as the total number of members.
Tested by running a multi-node cluster locally and ensuring the numbers were
correct. Also added unit test coverage to add the new expected gauges to
existing test cases.
We have seen test flakes caused by 'concurrent map read and map write', and the race detector
reports the problem as well (prevent us from running some tests with -race).
The root of the problem is the grpc expects resolvers to be registered at init time
before any requests are made, but we were using a separate resolver for each test.
This commit introduces a resolver registry. The registry is registered as the single
resolver for the consul scheme. Each test uses the Authority section of the target
(instead of the scheme) to identify the resolver that should be used for the test.
The scheme is used for lookup, which is why it can no longer be used as the unique
key.
This allows us to use a lock around the map of resolvers, preventing the data race.
The bulk of this commit is moving the LeaderRoutineManager from the agent/consul package into its own package: lib/gort. It also got a renaming and its Start method now requires a context. Requiring that context required updating a whole bunch of other places in the code.
This refactor is to make it easier to see how serf feature flags are
encoded as serf tags, and where those feature flags are read.
- use constants for both the prefix and feature flag name. A constant
makes it much easier for an IDE to locate the read and write location.
- isolate the feature-flag encoding logic in the metadata package, so
that the feature flag prefix can be unexported. Only expose a function
for encoding the flags into tags. This logic is now next to the logic
which reads the tags.
- remove the duplicate `addEnterpriseSerfTags` functions. Both Client
and Server structs had the same implementation. And neither
implementation needed the method receiver.
The prior solution to call reply.Reset() aged poorly since newer fields
were added to the reply, but not added to Reset() leading serial
blocking query loops on the server to blend replies.
This could manifest as a service-defaults protocol change from
default=>http not reverting back to default after the config entry
reponsible was deleted.
TestACLEndpoint_Login_with_TokenLocality was reguardly being reported as failed even though
it was not failing. I took another look and I suspect it is because t.Parllel was being
called in a goroutine.
This would lead to strange behaviour which apparently confused the 'go test' runner.
Also accept an RPCInfo instead of interface{}. Accepting an interface
lead to a bug where the caller was expecting the arg to be the response
when in fact it was always passed the request. By accepting RPCInfo
it should indicate that this is actually the request value.
One caller of canRetry already passed an RPCInfo, the second handles
the type assertion before calling canRetry.