Commit Graph

1840 Commits

Author SHA1 Message Date
Daniel Nephin 0abaf29c10 ca: add test cases for rotating external trusted CA 2022-02-17 18:21:30 -05:00
Daniel Nephin aacc40012f ca: add a test for secondary with external CA 2022-02-17 18:21:30 -05:00
Daniel Nephin 471b2098bb ca: examine the full chain in newCARoot
make TestNewCARoot much more strict
compare the full result instead of only a few fields.
add a test case with 2 and 3 certificates in the pem
2022-02-17 18:21:30 -05:00
Daniel Nephin 2d5254a73b
Merge pull request #12110 from hashicorp/dnephin/blocking-queries-not-found
rpc: make blocking queries for non-existent items more efficient
2022-02-17 18:09:39 -05:00
Florian Apolloner 895da50986
Support for connect native services in topology view. (#12098) 2022-02-16 16:51:54 -05:00
Chris S. Kim 18096fd2fb
Move IndexEntryName helpers to common files (#12365) 2022-02-16 12:56:38 -05:00
Daniel Nephin 06657e5be0 rpc: add errNotFound to all Get queries
Any query that returns a list of items is not part of this commit.
2022-02-15 18:24:34 -05:00
Daniel Nephin bdafa24c50 Make blockingQuery efficient with 'not found' results.
By using the query results as state.

Blocking queries are efficient when the query matches some results,
because the ModifyIndex of those results, returned as queryMeta.Mindex,
will never change unless the items themselves change.

Blocking queries for non-existent items are not efficient because the
queryMeta.Index can (and often does) change when other entities are
written.

This commit reduces the churn of these queries by using a different
comparison for "has changed". Instead of using the modified index, we
use the existence of the results. If the previous result was "not found"
and the new result is still "not found", we know we can ignore the
modified index and continue to block.

This is done by setting the minQueryIndex to the returned
queryMeta.Index, which prevents the query from returning before a state
change is observed.
2022-02-15 18:24:33 -05:00
Daniel Nephin 6e73df7dc2 Add a test for blocking query on non-existent entry
This test shows how blocking queries are not efficient when the query
returns no results.  The test fails with 100+ calls instead of the
expected 2.

This test is still a bit flaky because it depends on the timing of the
writes. It can sometimes return 3 calls.

A future commit should fix this and make blocking queries even more
optimal for not-found results.
2022-02-15 18:23:17 -05:00
Daniel Nephin a4e1c59cd8 rpc: improve docs for blockingQuery
Follow the Go convention of accepting a small interface that documents
the methods used by the function.

Clarify the rules for implementing a query function passed to
blockingQuery.
2022-02-15 14:20:14 -05:00
R.B. Boyer b216d52b66
server: conditionally avoid writing a config entry to raft if it was already the same (#12321)
This will both save on unnecessary raft operations as well as
unnecessarily incrementing the raft modify index of config entries
subject to no-op updates.
2022-02-14 14:39:12 -06:00
FFMMM 1f8fb17be7
Vendor in rpc mono repo for net/rpc fork, go-msgpack, msgpackrpc. (#12311)
This commit syncs ENT changes to the OSS repo.

Original commit details in ENT:

```
commit 569d25f7f4578981c3801e6e067295668210f748
Author: FFMMM <FFMMM@users.noreply.github.com>
Date:   Thu Feb 10 10:23:33 2022 -0800

    Vendor fork net rpc (#1538)

    * replace net/rpc w consul-net-rpc/net/rpc

    Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>

    * replace msgpackrpc and go-msgpack with fork from mono repo

    Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>

    * gofmt all files touched

    Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
```

Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
2022-02-14 09:45:45 -08:00
Mark Anderson fa95afdcf6 Refactor to make ACL errors more structured. (#12308)
* First phase of refactoring PermissionDeniedError

Add extended type PermissionDeniedByACLError that captures information
about the accessor, particular permission type and the object and name
of the thing being checked.

It may be worth folding the test and error return into a single helper
function, that can happen at a later date.

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2022-02-11 12:53:23 -08:00
Freddy f45bec7779
Merge pull request #12223 from hashicorp/proxycfg/passthrough-cleanup 2022-02-10 17:35:51 -07:00
freddygv 7fba7456ec Fix race of upstreams with same passthrough ip
Due to timing, a transparent proxy could have two upstreams to dial
directly with the same address.

For example:
- The orders service can dial upstreams shipping and payment directly.
- An instance of shipping at address 10.0.0.1 is deregistered.
- Payments is scaled up and scheduled to have address 10.0.0.1.
- The orders service receives the event for the new payments instance
before seeing the deregistration for the shipping instance. At this
point two upstreams have the same passthrough address and Envoy will
reject the listener configuration.

To disambiguate this commit considers the Raft index when storing
passthrough addresses. In the example above, 10.0.0.1 would only be
associated with the newer payments service instance.
2022-02-10 17:01:57 -07:00
Daniel Nephin db4675bd1a
Merge pull request #12277 from hashicorp/dnephin/panic-in-service-register
catalog: initialize the refs map to prevent a nil panic
2022-02-09 19:48:22 -05:00
Daniel Nephin 6376141464 config-entry: fix a panic when registering a service or ingress gateway 2022-02-09 18:49:48 -05:00
Daniel Nephin c20412ab14
Merge pull request #12265 from hashicorp/dnephin/logging-in-tests
sdk: add TestLogLevel for setting log level in tests
2022-02-07 16:11:23 -05:00
Daniel Nephin 5a0e6700c1 A test to reproduce the issue 2022-02-04 14:04:12 -05:00
Daniel Nephin 7b466a024b Make test more readable
And fix typo
2022-02-03 18:44:09 -05:00
Daniel Nephin 6721c1246d ca: relax and move private key type/bit validation for vault
This commit makes two changes to the validation.

Previously we would call this validation in GenerateRoot, which happens
both on initialization (when a follower becomes leader), and when a
configuration is updated. We only want to do this validation during
config update so the logic was moved to the UpdateConfiguration
function.

Previously we would compare the config values against the actual cert.
This caused problems when the cert was created manually in Vault (not
created by Consul).  Now we compare the new config against the previous
config. Using a already created CA cert should never error now.

Adding the key bit and types to the config should only error when
the previous values were not the defaults.
2022-02-03 17:21:20 -05:00
Daniel Nephin 3b78f81f9a ca: small cleanup of TestConnectCAConfig_Vault_TriggerRotation_Fails
Before adding more test cases
2022-02-03 17:21:20 -05:00
Daniel Nephin f6d7a0f7b2 testing: fix test failures caused by new log level
These two tests require debug logging enabled, because they look for log lines.

Also switched to testify assertions because the previous errors were not clear.
2022-02-03 17:07:39 -05:00
Daniel Nephin 1a9a656a7f sdk: add TestLogLevel for setting log level in tests
And default log level to WARN.
2022-02-03 13:42:28 -05:00
Daniel Nephin 44f9229b96 ca: add a test that uses an intermediate CA as the primary CA
This test found a bug in the secondary. We were appending the root cert
to the PEM, but that cert was already appended. This was failing
validation in Vault here:
https://github.com/hashicorp/vault/blob/sdk/v0.3.0/sdk/helper/certutil/types.go#L329

Previously this worked because self signed certs have the same
SubjectKeyID and AuthorityKeyID. So having the same self-signed cert
repeated doesn't fail that check.

However with an intermediate that is not self-signed, those values are
different, and so we fail the check. A test I added in a previous commit
should show that this continues to work with self-signed root certs as
well.
2022-02-02 13:41:35 -05:00
Daniel Nephin d00a9abca2 acl: un-embed ACLIdentity
This is safer than embedding two interface because there are a number of
places where we check the concrete type. If we check the concrete type
on the top-level interface it will fail. So instead expose the
ACLIdentity from a method.
2022-02-02 12:07:31 -05:00
Daniel Nephin 18ff00f985
Merge pull request #12167 from hashicorp/dnephin/acl-resolve-token-3
acl: rename ResolveTokenToIdentityAndAuthorizer to ResolveToken
2022-01-31 19:21:06 -05:00
Daniel Nephin ff64c13c3e
Merge pull request #12166 from hashicorp/dnephin/acl-resolve-token-2
acl: remove ResolveTokenToIdentity
2022-01-31 19:19:21 -05:00
Daniel Nephin aa4dbe2a17 acl: rename ResolveTokenToIdentityAndAuthorizer to ResolveToken
This change allows us to remove one of the last remaining duplicate
resolve token methods (Server.ResolveToken).

With this change we are down to only 2, where the second one also
handles setting the default EnterpriseMeta from the token.
2022-01-31 18:04:19 -05:00
Daniel Nephin 57eac90cae acl: remove unused methods on fakes, and add changelog
Also document the metric that was removed in a previous commit.
2022-01-31 17:53:53 -05:00
Daniel Nephin 1fb2d49826
Merge pull request #12165 from hashicorp/dnephin/acl-resolve-token
acl: remove some of the duplicate resolve token methods
2022-01-31 13:27:49 -05:00
Dan Upton ebdda4848f
streaming: split event buffer by key (#12080) 2022-01-28 12:27:00 +00:00
Daniel Nephin fa8ff28a63 ca/provider: remove ActiveRoot from Provider 2022-01-27 13:07:37 -05:00
Daniel Nephin d56a1dfb2c
Merge pull request #11663 from hashicorp/dnephin/ca-remove-one-call-to-active-root-2
ca: remove second call to Provider.ActiveRoot
2022-01-27 12:41:05 -05:00
Daniel Nephin d3324d0d27
Merge pull request #12109 from hashicorp/dnephin/blocking-query-1
rpc: make blockingQuery easier to read
2022-01-26 18:13:55 -05:00
Daniel Nephin 74dc9925cc Apply suggestions from code review
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
2022-01-26 12:24:13 -05:00
Daniel Nephin 2c311161cc acl: extract a backend type for the ACLResolverBackend
This is a small step to isolate the functionality that is used for the
ACLResolver from the large Client and Server structs.
2022-01-26 12:24:10 -05:00
Daniel Nephin c1da07e2ea acl: remove calls to ResolveIdentityFromToken
We already have an ACLResolveResult, so we can get the accessor ID from
it.
2022-01-22 15:05:42 -05:00
Daniel Nephin ed1cc5f255 acl: remove ResolveTokenToIdentity
By exposing the AccessorID from the primary ResolveToken method we can
remove this duplication.
2022-01-22 14:47:59 -05:00
Daniel Nephin 26f0ebd96f acl: return a resposne from ResolveToken that includes the ACLIdentity
So that we can duplicate duplicate methods.
2022-01-22 14:33:09 -05:00
Daniel Nephin 314614f073 acl: remove duplicate methods
Now that ACLResolver is embedded we don't need ResolveTokenToIdentity on
Client and Server.

Moving ResolveTokenAndDefaultMeta to ACLResolver removes the duplicate
implementation.
2022-01-22 14:12:08 -05:00
Daniel Nephin 62c09b2d0a acl: embed ACLResolver in Client and Server
In preparation for removing duplicate resolve token methods.
2022-01-22 14:07:26 -05:00
R.B. Boyer 05c7373a28 bulk rewrite using this script
set -euo pipefail

    unset CDPATH

    cd "$(dirname "$0")"

    for f in $(git grep '\brequire := require\.New(' | cut -d':' -f1 | sort -u); do
        echo "=== require: $f ==="
        sed -i '/require := require.New(t)/d' $f
        # require.XXX(blah) but not require.XXX(tblah) or require.XXX(rblah)
        sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\([^tr]\)/require.\1(t,\2/g' $f
        # require.XXX(tblah) but not require.XXX(t, blah)
        sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/require.\1(t,\2/g' $f
        # require.XXX(rblah) but not require.XXX(r, blah)
        sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/require.\1(t,\2/g' $f
        gofmt -s -w $f
    done

    for f in $(git grep '\bassert := assert\.New(' | cut -d':' -f1 | sort -u); do
        echo "=== assert: $f ==="
        sed -i '/assert := assert.New(t)/d' $f
        # assert.XXX(blah) but not assert.XXX(tblah) or assert.XXX(rblah)
        sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\([^tr]\)/assert.\1(t,\2/g' $f
        # assert.XXX(tblah) but not assert.XXX(t, blah)
        sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/assert.\1(t,\2/g' $f
        # assert.XXX(rblah) but not assert.XXX(r, blah)
        sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/assert.\1(t,\2/g' $f
        gofmt -s -w $f
    done
2022-01-20 10:46:23 -06:00
R.B. Boyer c12b0ee3d2 test: normalize require.New and assert.New syntax 2022-01-20 10:45:56 -06:00
Dan Upton 088ba2edaf
[OSS] Remove remaining references to master (#11827) 2022-01-20 12:47:50 +00:00
Daniel Nephin 59206e38c7 rpc: cleanup exit and blocking condition logic in blockingQuery
Remove some unnecessary comments around query_blocking metric. The only
line that needs any comments in the atomic decrement.

Cleanup the block and return comments and logic. The old comment about
AbandonCh may have been relevant before, but it is expected behaviour
now.

The logic was simplified by inverting the err condition.
2022-01-17 16:59:25 -05:00
Daniel Nephin a28d1268cb rpc: extract rpcQueryTimeout method
This helps keep the logic in blockingQuery more focused. In the future we
may have a separate struct for RPC queries which may allow us to move this
off of Server.
2022-01-17 16:59:25 -05:00
Daniel Nephin 751bc2e7d3 rpc: move the index defaulting to setQueryMeta.
This safeguard should be safe to apply in general. We are already
applying it to non-blocking queries that call blockingQuery, so it
should be fine to apply it to others.
2022-01-17 16:59:25 -05:00
Daniel Nephin 95e471052b rpc: add subtests to blockingQuery test 2022-01-17 16:59:25 -05:00
Daniel Nephin 6bf8efe607 rpc: refactor blocking query
To remove the TODO, and make it more readable.

In general this reduces the scope of variables, making them easier to reason about.
It also introduces more early returns so that we can see the flow from the structure
of the function.
2022-01-17 16:58:47 -05:00
Daniel Nephin 1971a58b29
Merge pull request #11661 from hashicorp/dnephin/ca-remove-one-call-to-active-root
ca: remove one call to Provider.ActiveRoot
2022-01-13 16:48:12 -05:00
Kyle Havlovitz 2ba76486d0 Add virtual IP generation for term gateway backed services 2022-01-12 12:08:49 -08:00
Daniel Nephin 262898e561 ca: remove unnecessary var, and slightly reduce cyclo complexity
`newIntermediate` is always equal to `needsNewIntermediate`, so we can
remove the extra variable and use the original directly.

Also remove the `activeRoot.ID != newActiveRoot.ID` case from an if,
because that case is already checked above, and `needsNewIntermediate` will
already be true in that case.

This condition now reads a lot better:

> Persist a new root if we did not have one before, or if generated a new intermediate.
2022-01-06 16:56:49 -05:00
Daniel Nephin d406f78c5c ca: remove unused provider.ActiveRoot call
In the previous commit the single use of this storedRoot was removed.

In this commit the original objective is completed. The
Provider.ActiveRoot is being removed because

1. the secondary should get the active root from the Consul primary DC,
   not the provider, so that secondary DCs do not need to communicate
   with a provider instance in a different DC.
2. so that the Provider.ActiveRoot interface can be changed without
   impacting other code paths.
2022-01-06 16:56:48 -05:00
Daniel Nephin 4d15e8a9ec ca: extract the lookup of the active primary CA
This method had only one caller, which always looked for the active
root. This commit moves the lookup into the method to reduce the logic
in the one caller.

This is being done in preparation for a larger change. Keeping this
separate so it is easier to see.

The `storedRootID != primaryRoots.ActiveRootID` is being removed because
these can never be different.

The `storedRootID` comes from `provider.ActiveRoot`, the
`primaryRoots.ActiveRootID` comes from the store `CARoot` from the
primary. In both cases the source of the data is the primary DC.

Technically they could be different if someone modified the provider
outside of Consul, but that would break many things, so is not a
supported flow.

If these were out of sync because of ordering of events then the
secondary will soon receive an update to `primaryRoots` and everything
will be sorted out again.
2022-01-06 16:56:48 -05:00
Daniel Nephin 37b09df427 ca: update godoc
To clarify what to expect from the data stored in this field, and the
behaviour of this function.
2022-01-06 16:56:48 -05:00
Daniel Nephin 1f670c22f5 ca: remove one call to provider.ActiveRoot
ActiveRoot should not be called from the secondary DC, because there
should not be a requirement to run the same Vault instance in a
secondary DC. SignIntermediate is called in a secondary DC, so it should
not call ActiveRoot

We would also like to change the interface of ActiveRoot so that we can
support using an intermediate cert as the primary CA in Consul. In
preparation for making that change I am reducing the number of calls to
ActiveRoot, so that there are fewer code paths to modify when the
interface changes.

This change required a change to the mockCAServerDelegate we use in
tests. It was returning the RootCert for SignIntermediate, but that is
not an accurate fake of production. In production this would also be a
separate cert.
2022-01-06 16:55:50 -05:00
Daniel Nephin 1f66120c20 ca: remove redundant append of an intermediate cert
Immediately above this line we are already appending the full list of
intermediates. The `provider.ActiveIntermediate` MUST be in this list of
intermediates because it must be available to all the other non-leader
Servers.  If it was not in this list of intermediates then any proxy
that received data from a non-leader would have the wrong certs.

This is being removed now because we are planning on changing the
`Provider.ActiveIntermediate` interface, and removing these extra calls ahead of
time helps make that change easier.
2022-01-06 16:55:50 -05:00
Daniel Nephin b66d259c1a ca: only generate a single private key for the whole test case
Using tracing and cpu profiling I found that the majority of the time in
these test cases is spent generating a private key. We really don't need
separate private keys, so we can generate only one and use it for all
cases.

With this change the test runs much faster.
2022-01-06 16:55:50 -05:00
Daniel Nephin 92a054cfa6 ca: cleanup a test
Fix the name to match the function it is testing

Remove unused code

Fix the signature, instead of returning (error, string) which should be (string, error)
accept a testing.T to emit errors.

Handle the error from encode.
2022-01-06 16:55:49 -05:00
Daniel Nephin 9ec7e07db4 ca: use the new leaf signing lookup func in leader metrics 2022-01-06 16:55:49 -05:00
Daniel Nephin 4983c27703 snapshot: return the error from replyFn
The only function passed to SnapshotRPC today always returns a nil error, so there's no
way to exercise this bug in practice. This change is being made for correctness so that
it doesn't become a problem in the future, if we ever pass a different function to
SnapshotRPC.
2022-01-05 17:51:03 -05:00
Jared Kirschner a9371f18e5 Clarify service and check error messages (use ID)
Error messages related to service and check operations previously included
the following substrings:
- service %q
- check %q

From this error message, it isn't clear that the expected field is the ID for
the entity, not the name. For example, if the user has a service named test,
the error message would read 'Unknown service "test"'. This is misleading -
a service with that *name* does exist, but not with that *ID*.

The substrings above have been modified to make it clear that ID is needed,
not name:
- service with ID %q
- check with ID %q
2022-01-04 11:42:37 -08:00
Chris S. Kim d87fe70a82
testing: Revert assertion for virtual IP flag (#11932) 2022-01-04 11:24:56 -05:00
Daniel Nephin 48d123e241
Merge pull request #11796 from hashicorp/dnephin/cleanup-test-server
testing: stop using an old version in testServer
2021-12-22 16:04:04 -05:00
Freddy f7eeffb98d
Use anonymousToken when querying by secret ID (#11813)
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Dan Upton <daniel@floppy.co>

This query has been incorrectly querying by accessor ID since New ACLs
were added. However, the legacy token compat allowed this to continue to
work, since it made a fallback query for the anonymousToken ID.

PR #11184 removed this legacy token query, which means that the query by
accessor ID is now the only check for the anonymous token's existence.

This PR updates the GetBySecret call to use the secret ID of the token.
2021-12-13 10:56:09 -07:00
R.B. Boyer a0156785dd
various partition related todos (#11822) 2021-12-13 11:43:33 -06:00
Kyle Havlovitz 9187070a93
Merge pull request #11798 from hashicorp/vip-goroutine-check
leader: move the virtual IP version check into a goroutine
2021-12-10 15:59:35 -08:00
Kyle Havlovitz 45402dad63 state: fix freed VIP table id index 2021-12-10 14:41:45 -08:00
Kyle Havlovitz ccc119c549 Exit before starting the vip check routine if possible 2021-12-10 14:30:50 -08:00
Daniel Nephin 6444d1d4b3 testing: Deprecate functions for creating a server.
These helper functions actually end up hiding important setup details
that should be visible from the test case. We already have a convenient
way of setting this config when calling newTestServerWithConfig.
2021-12-09 20:09:29 -05:00
Daniel Nephin 74e92316de testing: remove old config.Build version
DefaultConfig already sets the version to version.Version, so by removing this
our tests will run with the version that matches the code.
2021-12-09 20:09:29 -05:00
Kyle Havlovitz 2a52630067 leader: move the virtual IP version check into a goroutine 2021-12-09 17:00:33 -08:00
Daniel Nephin 15c4de0c15 ca: prune some unnecessary lookups in the tests 2021-12-08 18:42:52 -05:00
Daniel Nephin bf798094d5 ca: remove duplicate WaitFor function 2021-12-08 18:42:52 -05:00
Daniel Nephin 984986f007 ca: fix flakes in RenewIntermediate tests
I suspect one problem was that we set structs.IntermediateCertRenewInterval to 1ms, which meant
that in some cases the intermediate could renew before we stored the original value.

Another problem was that the 'wait for intermediate' loop was calling the provider.ActiveIntermediate,
but the comparison needs to use the RPC endpoint to accurately represent a user request. So
changing the 'wait for' to use the state store ensures we don't race.

Also moves the patching into a separate function.

Removes the addition of ca.CertificateTimeDriftBuffer as part of calculating halfTime. This was added
in a previous commit to attempt to fix the flake, but it did not appear to fix the problem. Adding the
time here was making the tests fail when using the shared patch
function. It's not clear to me why, but there's no reason we should be
including this time in the halfTime calculation.
2021-12-08 18:42:52 -05:00
Daniel Nephin bc7ec4455f ca: improve RenewIntermediate tests
Use the new verifyLearfCert to show the cert verifies with intermediates
from both sources. This required using the RPC interface so that the
leaf pem was constructed correctly.

Add IndexedCARoots.Active since that is a common operation we see in a
few places.
2021-12-08 18:42:52 -05:00
Daniel Nephin 0784073d5e ca: add a test for Vault in secondary DC 2021-12-08 18:42:51 -05:00
Daniel Nephin 373f445db5 ca: Add CARoots.Active method
Which will be used in the next commit.
2021-12-08 18:41:51 -05:00
R.B. Boyer 2f345cca33
acl: ensure that the agent recovery token is properly partitioned (#11782) 2021-12-08 17:11:55 -06:00
Daniel Nephin 0f95a2c3b1
Merge pull request #11721 from hashicorp/dnephin/ca-export-fsm-operation
ca: use the real FSM operation in tests
2021-12-08 17:49:00 -05:00
Daniel Nephin be1ddc5942 ca: use the real FSM operation in tests
Previously we had a couple copies that reproduced the FSM operation.
These copies introduce risk that the test does not accurately match
production.

This PR removes the test versions of the FSM operation, and exports the
real production FSM operation so that it can be used in tests.

The consul provider tests did need to change because of this. Previously
we would return a hardcoded value of 2, but in production this value is
always incremented.
2021-12-08 17:29:44 -05:00
R.B. Boyer 957758cb61
test: test server should auto cleanup (#11779) 2021-12-08 13:26:06 -06:00
Evan Culver 32a04317bf
rpc: Unset partition before forwarding to remote datacenter (#11758) 2021-12-08 11:02:14 -08:00
Daniel Nephin 52c8b4994b Merge remote-tracking branch 'origin/main' into serve-panic-recovery 2021-12-07 16:30:41 -05:00
Chris S. Kim b74ddd7b70
Godocs updates for catalog endpoints (#11716) 2021-12-07 10:18:28 -05:00
Dan Upton 8bc11b08dc
Rename `ACLMasterToken` => `ACLInitialManagementToken` (#11746) 2021-12-07 12:39:28 +00:00
Dan Upton 0230ebb4ef
agent/token: rename `agent_master` to `agent_recovery` (internally) (#11744) 2021-12-07 12:12:47 +00:00
R.B. Boyer 89e90d1ffc return the max 2021-12-06 15:36:52 -06:00
freddygv 65875a7c69 Remove support for failover to partition
Failing over to a partition is more siimilar to failing over to another
datacenter than it is to failing over to a namespace. In a future
release we should update how localities for failover are specified. We
should be able to accept a list of localities which can include both
partition and datacenter.
2021-12-06 12:32:24 -07:00
freddygv a1c1e36be7 Allow cross-partition references in disco chain
* Add partition fields to targets like service route destinations
* Update validation to prevent cross-DC + cross-partition references
* Handle partitions when reading config entries for disco chain
* Encode partition in compiled targets
2021-12-06 12:32:19 -07:00
R.B. Boyer 5ea4b82940
light refactors to support making partitions and serf-based wan federation are mutually exclusive (#11755) 2021-12-06 13:18:02 -06:00
Freddy d86b98c503
Merge pull request #11739 from hashicorp/ap/exports-rename 2021-12-06 08:20:50 -07:00
freddygv a2fd30e514 Clean up additional refs to partition exports 2021-12-04 15:16:40 -07:00
freddygv 02fb323652 Rename partition-exports to exported-services
Using a name less tied to partitions gives us more flexibility to use
this config entry in OSS for exports between datacenters/meshes.
2021-12-03 17:47:31 -07:00
freddygv fcfed67246 Update intention topology to use new table 2021-12-03 17:28:31 -07:00
freddygv 4acbdc4618 Avoid updating default decision from wildcard ixn
Given that we do not allow wildcard partitions in intentions, no one ixn
can override the DefaultAllow setting. Only the default ACL policy
applies across all partitions.
2021-12-03 17:28:12 -07:00
freddygv 142d8193e5 Add a new table to query service names by kind
This table purposefully does not index by partition/namespace. It's a
global view into all service names.

This table is intended to replace the current serviceListTxn watch in
intentionTopologyTxn. For cross-partition transparent proxying we need
to be able to calculate upstreams from intentions in any partition. This
means that the existing serviceListTxn function is insufficient since
it's scoped to a partition.

Moving away from that function is also beneficial because it watches the
main "services" table, so watchers will wake up when any instance is
registered or deregistered.
2021-12-03 17:28:12 -07:00
Freddy 3eddf98e62
Merge pull request #11680 from hashicorp/ap/partition-exports-oss 2021-12-03 16:57:50 -07:00
Dan Upton 2f4b8d7a7d
internal: support `ResultsFilteredByACLs` flag/header (#11643) 2021-12-03 23:04:24 +00:00
Dan Upton 43e28a3af6
query: support `ResultsFilteredByACLs` in query list endpoint (#11620) 2021-12-03 23:04:09 +00:00
Dhia Ayachi e38ccf0a22
port oss changes (#11736) 2021-12-03 17:23:55 -05:00
Freddy 3791d6d7da
Merge pull request #11720 from hashicorp/bbolt 2021-12-03 14:44:36 -07:00
Dan Upton 1d694df02b
fedstate: support `ResultsFilteredByACLs` in `ListMeshGateways` endpoint (#11644) 2021-12-03 20:56:55 +00:00
Dan Upton 0489ea187d
catalog: support `ResultsFilteredByACLs` flag/header (#11594) 2021-12-03 20:56:14 +00:00
Dan Upton 8bb1b89554
coordinate: support `ResultsFilteredByACLs` flag/header (#11617) 2021-12-03 20:51:02 +00:00
Dan Upton a62aa3847d
sessions: support `ResultsFilteredByACLs` flag/header (#11606) 2021-12-03 20:43:43 +00:00
Dan Upton 0a7ba5162e
txn: support `ResultsFilteredByACLs` flag in `Read` endpoint (#11632) 2021-12-03 20:41:03 +00:00
Dhia Ayachi a8874c65f7
sessions partitioning tests (#11734)
* state: port KV and Tombstone tables to new pattern

* go fmt'ed

* handle wildcards for tombstones

* Fix graveyard ent vs oss

* fix oss compilation error

* add partition to tombstones and kv state store indexes

* refactor to use `indexWithEnterpriseIndexable`

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* add `singleValueID` implementation assertions

* partition `tableSessions` table

* fix sessions to use UUID and fix prefix index

* fix oss build

* clean up unused functions

* fix oss compilation

* add a partition indexer for sessions

* Fix oss to not have partition index

* fix oss tests

* remove unused operations_ent.go and operations_oss.go func

* remove unused const

* convert `IndexID` of `session_checks` table

* convert `indexSession` of `session_checks` table

* convert `indexNodeCheck` of `session_checks` table

* partition `indexID` and `indexSession` of `tableSessionChecks`

* fix oss linter

* fix review comments

* remove partition for Checks as it's always use the session partition

* fix tests

* fix tests

* do not namespace nodeChecks index

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2021-12-03 15:36:07 -05:00
Dan Upton b10e69ffda
intention: support `ResultsFilteredByACLs` flag/header (#11612) 2021-12-03 20:35:54 +00:00
Mark Anderson e8f542030e
Cross port of ent #1383 (#11726)
Cross port of ent #1383 "Reject non-default datacenter when making partitioned ACLs"

On the OSS side this is a minor refactor to add some more checks that are only applicable to enterprise code.

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2021-12-03 10:20:25 -08:00
Dan Upton 1d571bb503
config: support `ResultsFilteredByACLs` in list/list all endpoints (#11621) 2021-12-03 17:39:47 +00:00
Dan Upton 44bc833318
kv: support `ResultsFilteredByACLs` in list/list keys (#11593) 2021-12-03 17:31:48 +00:00
Dan Upton 3ad8540d23
health: support `ResultsFilteredByACLs` flag/header (#11602) 2021-12-03 17:31:32 +00:00
Dan Upton 0efe478044
Groundwork for exposing when queries are filtered by ACLs (#11569) 2021-12-03 17:11:26 +00:00
Kyle Havlovitz dbb58b726a
Merge pull request #11724 from hashicorp/service-virtual-ips
oss: add virtual IP generation for connect services
2021-12-02 16:16:57 -08:00
Kyle Havlovitz db88f95fbe consul: add virtual IP generation for connect services 2021-12-02 15:42:47 -08:00
R.B. Boyer 6ec84cfbe2
agent: add variation of force-leave that exclusively works on the WAN (#11722)
Fixes #6548
2021-12-02 17:15:10 -06:00
Matt Keeler 68e629a476 Emit raft-boltdb metrics 2021-12-02 16:56:15 -05:00
Daniel Nephin 8e2c71528f config: add NoFreelistSync option
# Conflicts:
#	agent/config/testdata/TestRuntimeConfig_Sanitize-enterprise.golden
#	agent/consul/server.go
2021-12-02 16:56:15 -05:00
Matt Keeler 1f49738167 Use raft-boltdb/v2 2021-12-02 16:56:15 -05:00
Daniel Nephin fa32c78429 ca: set the correct SigningKeyID after config update with Vault provider
The test added in this commit shows the problem. Previously the
SigningKeyID was set to the RootCert not the local leaf signing cert.

This same bug was fixed in two other places back in 2019, but this last one was
missed.

While fixing this bug I noticed I had the same few lines of code in 3
places, so I extracted a new function for them.

There would be 4 places, but currently the InitializeCA flow sets this
SigningKeyID in a different way, so I've left that alone for now.
2021-12-02 16:07:11 -05:00
Daniel Nephin a0014e13fd
Merge pull request #11713 from hashicorp/dnephin/ca-test-names
ca: make test naming consistent
2021-12-02 16:05:42 -05:00
Daniel Nephin 720d782225
Merge pull request #11671 from hashicorp/dnephin/ca-fix-storing-vault-intermediate
ca: fix storing the leaf signing cert with Vault provider
2021-12-02 16:02:24 -05:00
Daniel Nephin a0160f7426
Merge pull request #11677 from hashicorp/dnephin/freeport-interface
sdk: use t.Cleanup in freeport and remove unnecessary calls
2021-12-02 15:58:41 -05:00
Daniel Nephin c1cb77b829 ca: make test naming consistent
While working on the CA system it is important to be able to run all the
tests related to the system, without having to wait for unrelated tests.
There are many slow and unrelated tests in agent/consul, so we need some
way to filter to only the relevant tests.

This PR renames all the CA system related tests to start with either
`TestCAMananger` for tests of internal operations that don't have RPC
endpoint, or `TestConnectCA` for tests of RPC endpoints. This allows us
to run all the test with:

    go test -run 'TestCAMananger|TestConnectCA' ./agent/consul

The test naming follows an undocumented convention of naming tests as
follows:

    Test[<struct name>_]<function name>[_<test case description>]

I tried to always keep Primary/Secondary at the end of the description,
and _Vault_ has to be in the middle because of our regex to run those
tests as a separate CI job.

You may notice some of the test names changed quite a bit. I did my best
to identify the underlying method being tested, but I may have been
slightly off in some cases.
2021-12-02 14:57:09 -05:00
Daniel Nephin 460f8919c9 ca: make getLeafSigningCertFromRoot safer
As a method on the struct type this would not be safe to call without first checking
c.isIntermediateUsedToSignLeaf.

So for now, move this logic to the CAMananger, so that it is always correct.
2021-12-02 12:42:49 -05:00
Daniel Nephin 64532ef636 ca: fix stored CARoot representation with Vault provider
We were not adding the local signing cert to the CARoot. This commit
fixes that bug, and also adds support for fixing existing CARoot on
upgrade.

Also update the tests for both primary and secondary to be more strict.
Check the SigningKeyID is correct after initialization and rotation.
2021-12-02 12:42:49 -05:00
Chris S. Kim 67eacee31e
ENT to OSS sync (#11703) 2021-12-01 14:56:10 -05:00
R.B. Boyer 70b143ddc5
auto-config: ensure the feature works properly with partitions (#11699) 2021-12-01 13:32:34 -06:00
Daniel Nephin 963a9819d0 ca: add some godoc and func for finding leaf signing cert
This will be used in a follow up commit.
2021-11-30 18:36:41 -05:00
Daniel Nephin 056a52ba64 sdk/freeport: rename Port to GetOne
For better consistency with GetN
2021-11-30 17:32:41 -05:00
Chris S. Kim e9c661db7f
Refactor test helper (#11689)
Allow custom ACL root tokens to be passed
2021-11-30 13:22:07 -05:00
Chris S. Kim 0ec67cc2d1
acl: Fill authzContext from token in Coordinate endpoints (#11688) 2021-11-30 13:17:41 -05:00
freddygv 76146dfc5b Move ent config test to ent file 2021-11-29 12:15:17 -07:00
Daniel Nephin 4f0d092c95 testing: remove unnecessary calls to freeport
Previously we believe it was necessary for all code that required ports
to use freeport to prevent conflicts.

https://github.com/dnephin/freeport-test shows that it is actually save
to use port 0 (`127.0.0.1:0`) as long as it is passed directly to
`net.Listen`, and the listener holds the port for as long as it is
needed.

This works because freeport explicitly avoids the ephemeral port range,
and port 0 always uses that range. As you can see from the test output
of https://github.com/dnephin/freeport-test, the two systems never use
overlapping ports.

This commit converts all uses of freeport that were being passed
directly to a net.Listen to use port 0 instead. This allows us to remove
a bit of wrapping we had around httptest, in a couple places.
2021-11-29 12:19:43 -05:00
Daniel Nephin 20a8e11bf2 testing: use the new freeport interfaces 2021-11-27 15:39:46 -05:00
Daniel Nephin 2cf41e4dc8 go-sso: remove returnFunc now that freeport handles return 2021-11-27 15:29:38 -05:00
Daniel Nephin 772d8f7381 ca: clean up unnecessary raft.Apply response checking
In d2ab767fef21244e9fe3b9887ea70fc177912381 raftApply was changed to handle this check in
a single place, instad of having every caller check it. It looks like these few places
were missed when I did that clean up.

This commit removes the remaining resp.(error) checks, since they are all no-ops now.
2021-11-26 17:57:55 -05:00
Daniel Nephin 48954adfdc
Merge pull request #11339 from hashicorp/dnephin/ca-manager-isolate-secondary-2
ca: reduce use of state in the secondary
2021-11-26 14:41:45 -05:00
Daniel Nephin 8240286956 ca: remove state check in secondarySetPrimaryRoots
This function is only ever called from operations that have already acquired the state lock, so checking
the value of state can never fail.

This change is being made in preparation for splitting out a separate type for the secondary logic. The
state can't easily be shared, so really only the expored top-level functions should acquire the 'state lock'.
2021-11-26 14:14:47 -05:00
Daniel Nephin 877094e2fa ca: remove actingSecondaryCA
This commit removes the actingSecondaryCA field, and removes the stateLock around it. This field
was acting as a proxy for providerRoot != nil, so replace it with that check instead.

The two methods which called secondarySetCAConfigured already set the state, so checking the
state again at this point will not catch runtime errors (only programming errors, which we can catch with tests).
In general, handling state transitions should be done on the "entrypoint" methods where execution starts, not
in every internal method.

This is being done to remove some unnecessary references to c.state, in preparations for extracting
types for primary/secondary.
2021-11-26 14:14:47 -05:00
Daniel Nephin cd5f6b2dfb ca: reduce consul provider backend interface a bit
This makes it easier to fake, which will allow me to use the ConsulProvider as
an 'external PKI' to test a customer setup where the actual root CA is not
the root we use for the Consul CA.

Replaces a call to the state store to fetch the clusterID with the
clusterID field already available on the built-in provider.
2021-11-25 11:46:06 -05:00
Dhia Ayachi f605689154
Partition/kv indexid sessions (#11639)
* state: port KV and Tombstone tables to new pattern

* go fmt'ed

* handle wildcards for tombstones

* Fix graveyard ent vs oss

* fix oss compilation error

* add partition to tombstones and kv state store indexes

* refactor to use `indexWithEnterpriseIndexable`

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* partition `tableSessions` table

* fix sessions to use UUID and fix prefix index

* fix oss build

* clean up unused functions

* fix oss compilation

* add a partition indexer for sessions

* Fix oss to not have partition index

* fix oss tests

* remove unused operations_ent.go and operations_oss.go func

* convert `indexNodeCheck` of `session_checks` table

* partition `indexID` and `indexSession` of `tableSessionChecks`

* remove partition for Checks as it's always use the session partition

* partition sessions index id table

* fix rebase issues

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2021-11-24 11:34:36 -05:00
Dhia Ayachi b1c4be3da0
Partition session checks store (#11638)
* state: port KV and Tombstone tables to new pattern

* go fmt'ed

* handle wildcards for tombstones

* Fix graveyard ent vs oss

* fix oss compilation error

* add partition to tombstones and kv state store indexes

* refactor to use `indexWithEnterpriseIndexable`

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* add `singleValueID` implementation assertions

* partition `tableSessions` table

* fix sessions to use UUID and fix prefix index

* fix oss build

* clean up unused functions

* fix oss compilation

* add a partition indexer for sessions

* Fix oss to not have partition index

* fix oss tests

* remove unused operations_ent.go and operations_oss.go func

* remove unused const

* convert `IndexID` of `session_checks` table

* convert `indexSession` of `session_checks` table

* convert `indexNodeCheck` of `session_checks` table

* partition `indexID` and `indexSession` of `tableSessionChecks`

* fix oss linter

* fix review comments

* remove partition for Checks as it's always use the session partition

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2021-11-24 09:10:38 -05:00
Daniel Nephin 07a33a1526 ca: accept only the cluster ID to SpiffeIDSigningForCluster
To make it more obivous where ClusterID is used, and remove the need to create a struct
when only one field is used.
2021-11-16 16:57:21 -05:00
R.B. Boyer 83bf7ab3ff
re-run gofmt on 1.17 (#11579)
This should let freshly recompiled golangci-lint binaries using Go 1.17
pass 'make lint'
2021-11-16 12:04:01 -06:00
R.B. Boyer 086ff42b56
partitions: various refactors to support partitioning the serf LAN pool (#11568) 2021-11-15 09:51:14 -06:00
freddygv 5ac1ab359b Move assertion to after config fetch 2021-11-10 10:50:08 -07:00
freddygv 2261d51515 Use ClusterID to check for readiness
The TrustDomain is populated from the Host() method which includes the
hard-coded "consul" domain. This means that despite having an empty
cluster ID, the TrustDomain won't be empty.
2021-11-10 10:45:22 -07:00
freddygv 482d3bc610 Prevent replicating partition-exports 2021-11-09 16:42:42 -07:00
freddygv 739490df12 handle error scenario of empty local DC 2021-11-09 16:42:42 -07:00
freddygv b9b41625b9 Restrict DC for partition-exports writes
There are two restrictions:
- Writes from the primary DC which explicitly target a secondary DC.
- Writes to a secondary DC that do not explicitly target the primary DC.

The first restriction is because the config entry is not supported in
secondary datacenters.

The second restriction is to prevent the scenario where a user writes
the config entry to a secondary DC, the write gets forwarded to the
primary, but then the config entry does not apply in the secondary.
This makes the scope more explicit.
2021-11-09 16:42:42 -07:00
Freddy 0ad360fadf
Merge pull request #11514 from hashicorp/dnephin/ca-fix-secondary-init
ca: properly handle the case where the secondary initializes after the primary
2021-11-08 17:16:16 -07:00
freddygv e6622ab0ab Avoid returning empty roots with uninitialized CA
Currently getCARoots could return an empty object with an empty trust
domain before the CA is initialized. This commit returns an error while
there is no CA config or no trust domain.

There could be a CA config and no trust domain because the CA config can
be created in InitializeCA before initialization succeeds.
2021-11-08 16:51:49 -07:00
Dhia Ayachi f61892393f
refactor session state store tables to use the new index pattern (#11525)
* state: port KV and Tombstone tables to new pattern

* go fmt'ed

* handle wildcards for tombstones

* Fix graveyard ent vs oss

* fix oss compilation error

* add partition to tombstones and kv state store indexes

* refactor to use `indexWithEnterpriseIndexable`

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* add `singleValueID` implementation assertions

* partition `tableSessions` table

* fix sessions to use UUID and fix prefix index

* fix oss build

* clean up unused functions

* fix oss compilation

* add a partition indexer for sessions

* Fix oss to not have partition index

* fix oss tests

* remove unused func `prefixIndexFromServiceNameAsString`

* fix test error check

* remove unused operations_ent.go and operations_oss.go func

* remove unused const

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2021-11-08 16:20:50 -05:00
Dhia Ayachi dfafd4e38c
KV refactoring, part 2 (#11512)
* add partition to the kv get pretty print

* fix failing test

* add test for kvs RPC endpoint
2021-11-08 11:43:21 -05:00
Dhia Ayachi 17190c0076
KV state store refactoring and partitioning (#11510)
* state: port KV and Tombstone tables to new pattern

* go fmt'ed

* handle wildcards for tombstones

* Fix graveyard ent vs oss

* fix oss compilation error

* add partition to tombstones and kv state store indexes

* refactor to use `indexWithEnterpriseIndexable`

* partition kvs indexID table

* add `partitionedIndexEntryName` in oss for test purpose

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* add `singleValueID` implementation assertions

* remove entmeta reference from oss

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2021-11-08 09:35:56 -05:00
Giulio Micheloni 10cdc0a5c8
Merge branch 'main' into serve-panic-recovery 2021-11-06 16:12:06 +01:00
Daniel Nephin 69ad7c0544 ca: Only initialize clusterID in the primary
The secondary must get the clusterID from the primary
2021-11-05 18:08:44 -04:00
Daniel Nephin 3173582b75 ca: return an error when secondary fails to initialize
Previously secondaryInitialize would return nil in this case, which prevented the
deferred initialize from happening, and left the CA in an uninitialized state until a config
update or root rotation.

To fix this I extracted the common parts into the delegate implementation. However looking at this
again, it seems like the handling in secondaryUpdateRoots is impossible, because that function
should never be called before the secondary is initialzied. I beleive we can remove some of that
logic in a follow up.
2021-11-05 18:02:51 -04:00
Daniel Nephin db29ad346b acl: remove id and revision from Policy constructors
The fields were removed in a previous commit.

Also remove an unused constructor for PolicyMerger
2021-11-05 15:45:08 -04:00
R.B. Boyer 1d8e7bb565
rename helper method to reflect the non-deprecated terminology (#11509) 2021-11-05 13:51:50 -05:00
FFMMM 27227c0fd2
add root_cert_ttl option for consul connect, vault ca providers (#11428)
* add root_cert_ttl option for consul connect, vault ca providers

Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>

* add changelog, pr feedback

Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>

* Update .changelog/11428.txt, more docs

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* Update website/content/docs/agent/options.mdx

Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
2021-11-02 11:02:10 -07:00
Daniel Nephin 00ed2b243f
Merge pull request #10771 from hashicorp/dnephin/emit-telemetry-metrics-immediately
telemetry: improve cert expiry metrics
2021-11-01 18:31:03 -04:00
Daniel Nephin eaaceedf31
Merge pull request #11338 from hashicorp/dnephin/ca-manager-isolate-secondary
ca: clearly identify methods that are primary-only or secondary-only
2021-11-01 14:10:31 -04:00
Daniel Upton a620b6be2e
Support Check-And-Set deletion of config entries (#11419)
Implements #11372
2021-11-01 16:42:01 +00:00
Dhia Ayachi 4d763ef9e6
regenerate expired certs (#11462)
* regenerate expired certs

* add documentation to generate tests certificates
2021-11-01 11:40:16 -04:00
R.B. Boyer 017e9d5ae4
agent: add a clone function for duplicating the serf lan configuration (#11443) 2021-10-28 16:11:26 -05:00
Daniel Nephin 0a19d7fd76 agent: move agent tls metric monitor to a more appropriate place
And add a test for it
2021-10-27 16:26:09 -04:00
Daniel Nephin 1b2144c982 telemetry: set cert expiry metrics to NaN on start
So that followers do not report 0, which would make alerting difficult.
2021-10-27 15:19:25 -04:00
Daniel Nephin a7fcf14c5c telemetry: fix cert expiry metrics by removing labels
These labels should be set by whatever process scrapes Consul (for
prometheus), or by the agent that receives them (for datadog/statsd).

We need to remove them here because the labels are part of the "metric
key", so we'd have to pre-declare the metrics with the labels. We could
do that, but that is extra work for labels that should be added from
elsewhere.

Also renames the closure to be more descriptive.
2021-10-27 15:19:25 -04:00
Daniel Nephin 4300daa2e6 telemetry: only emit leader cert expiry metrics on the servers 2021-10-27 15:19:25 -04:00
Daniel Nephin 9de725c17d telemetry: prevent stale values from cert monitors
Prometheus scrapes metrics from each process, so when leadership transfers to a different node
the previous leader would still be reporting the old cached value.

By setting NaN, I believe we should zero-out the value, so that prometheus should only consider the
value from the new leader.
2021-10-27 15:19:25 -04:00
Daniel Nephin 616cc9b6f8 telemetry: improve cert expiry metrics
Emit the metric immediately so that after restarting an agent, the new expiry time will be
emitted. This is particularly important when this metric is being monitored, because we want
the alert to resovle itself immediately.

Also fixed a bug that was exposed in one of these metrics. The CARoot can be nil, so we have
to handle that case.
2021-10-27 15:19:25 -04:00
Daniel Nephin 24951f0c7e subscribe: attempt to fix a flaky test
TestSubscribeBackend_IntegrationWithServer_DeliversAllMessages has been
flaking a few times. This commit cleans up the test a bit, and improves
the failure output.

I don't believe this actually fixes the flake, but I'm not able to
reproduce it reliably.

The failure appears to be that the event with Port=0 is being sent in
both the snapshot and as the first event after the EndOfSnapshot event.

Hopefully the improved logging will show us if these are really
duplicate events, or actually different events with different indexes.
2021-10-27 15:09:09 -04:00
freddygv 592965d61e Rework acl exports interface 2021-10-27 12:50:39 -06:00
Freddy 9bbeea0432
Merge pull request #11433 from hashicorp/exported-service-acls
[OSS] acl: Expand ServiceRead and NodeRead to account for partition exports
2021-10-27 12:48:08 -06:00
Freddy d8ae915160
Merge pull request #11431 from hashicorp/ap/exports-proxycfg
[OSS] Update partitioned mesh gw handling for connect proxies
2021-10-27 11:27:43 -06:00
Freddy 8e23a6a0cc
Merge pull request #11416 from hashicorp/ap/exports-update
Rename service-exports to partition-exports
2021-10-27 11:27:31 -06:00
freddygv af662c8c1c Avoid mixing named and unnamed params 2021-10-26 23:42:25 -06:00
freddygv 1de62bb0a2 Avoid passing nil config pointer 2021-10-26 23:42:25 -06:00
freddygv 4a2e40aa3c Avoid panic on nil partitionAuthorizer config
partitionAuthorizer.config can be nil if it wasn't provided on calls to
newPartitionAuthorizer outside of the ACLResolver. This usage happens
often in tests.

This commit: adds a nil check when the config is going to be used,
updates non-test usage of NewPolicyAuthorizerWithDefaults to pass a
non-nil config, and dettaches setEnterpriseConf from the ACLResolver.
2021-10-26 23:42:25 -06:00
freddygv 015d85cd74 Update NodeRead for partition-exports
When issuing cross-partition service discovery requests, ACL filtering
often checks for NodeRead privileges. This is because the common return
type is a CheckServiceNode, which contains node data.
2021-10-26 23:42:11 -06:00
Kyle Havlovitz afb0976eac acl: pass PartitionInfo through ent ACLConfig 2021-10-26 23:41:52 -06:00
Kyle Havlovitz 56d1858c4a acl: Expand ServiceRead logic to look at service-exports for cross-partition 2021-10-26 23:41:32 -06:00
freddygv 3966677aaf Finish removing useInDatacenter 2021-10-26 23:36:01 -06:00
freddygv feaebde1f1 Remove useInDatacenter from disco chain requests
useInDatacenter was used to determine whether the mesh gateway mode of
the upstream should be returned in the discovery chain target. This
commit makes it so that the mesh gateway mode is returned every time,
and it is up to the caller to decide whether mesh gateways should be
watched or used.
2021-10-26 23:35:21 -06:00
R.B. Boyer e27e58c6cc
agent: refactor the agent delegate interface to be partition friendly (#11429) 2021-10-26 15:08:55 -05:00
Chris S. Kim 27f8a85664
agent: Ensure partition is considered in agent endpoints (#11427) 2021-10-26 15:20:57 -04:00
freddygv c3e381b4c1 Rename service-exports to partition-exports
Existing config entries prefixed by service- are specific to individual
services. Since this config entry applies to partitions it is being
renamed.

Additionally, the Partition label was changed to Name because using
Partition at the top-level and in the enterprise meta was leading to the
enterprise meta partition being dropped by msgpack.
2021-10-25 17:58:48 -06:00
Daniel Nephin f24bad2a52
Merge pull request #11232 from hashicorp/dnephin/acl-legacy-remove-docs
acl: add docs and changelog for the removal of the legacy ACL system
2021-10-25 18:38:00 -04:00
Daniel Nephin f7cdd210fe Update agent/consul/acl_client.go
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2021-10-25 17:25:14 -04:00
Daniel Nephin 732b841dd7 state: remove support for updating legacy ACL tokens 2021-10-25 17:25:14 -04:00
Daniel Nephin 76b007dacd acl: remove init check for legacy anon token
This token should always already be migrated from a previous version.
2021-10-25 17:25:14 -04:00
Daniel Nephin 8ae6ee4e36 acl: remove legacy parameter to ACLDatacenter
It is no longer used now that legacy ACLs have been removed.
2021-10-25 17:25:14 -04:00
Daniel Nephin d778113773 acl: remove ACLTokenTypeManagement 2021-10-25 17:25:14 -04:00
Daniel Nephin 88c6aeea34 acl: remove legacy arg to store.ACLTokenSet
And remove the tests for legacy=true
2021-10-25 17:25:14 -04:00
Daniel Nephin b31a7fc498 acl: remove EmbeddedPolicy
This method is no longer. It only existed for legacy tokens, which are no longer supported.
2021-10-25 17:25:14 -04:00
Daniel Nephin ceaa36f983 acl: remove tests for resolving legacy tokens
The code for this was already removed, which suggests this is not actually testing what it claims.

I'm guessing these are still resolving because the tokens are converted to non-legacy tokens?
2021-10-25 17:25:14 -04:00