Commit Graph

1708 Commits

Author SHA1 Message Date
Daniel Nephin 8f4b6af68a
Merge pull request #12298 from jorgemarey/b-persistnewrootandconfig
Avoid raft change when no config is provided on persistNewRootAndConfig
2022-03-03 11:03:50 -05:00
Daniel Nephin 2082bdc286 ca: make sure the test fails without the fix
Also change the path used for the secondary so that both primary and secondary do not overwrite each other.
2022-03-02 18:22:49 -05:00
R.B. Boyer 679cea7171
raft: upgrade to v1.3.6 (#12496)
Add additional protections on the Consul side to prevent NonVoters from bootstrapping raft.

This should un-flake TestServer_Expect_NonVoters
2022-03-02 17:00:02 -06:00
Daniel Nephin 849d86e7f5
Merge pull request #12467 from hashicorp/dnephin/ci-vault-test-safer
ca: require that tests that use Vault are named correctly
2022-03-01 12:54:02 -05:00
R.B. Boyer 033e0ed13f
test: parallelize more of TestLeader_ReapOrLeftMember_IgnoreSelf (#12468)
before:

    $ go test ./agent/consul -run TestLeader_ReapOrLeftMember_IgnoreSelf
    ok  	github.com/hashicorp/consul/agent/consul	21.147s

after:

    $ go test ./agent/consul -run TestLeader_ReapOrLeftMember_IgnoreSelf
    ok  	github.com/hashicorp/consul/agent/consul	5.402s
2022-03-01 10:30:06 -06:00
Jorge Marey aba9e724a8 Fix vault test with suggested changes 2022-03-01 10:20:00 +01:00
Jorge Marey 8b1b264b6f Add test case to verify #12298 2022-03-01 09:25:52 +01:00
Jorge Marey 2ca00df0d8 Avoid raft change when no config is provided on CAmanager
- This avoids a change to the raft store when no roots or config
are provided to persistNewRootAndConfig
2022-03-01 09:25:52 +01:00
Daniel Nephin dd565aa5e4 ca: fix a test
This test does not use Vault, so does not need ca.SkipIfVaultNotPresent
2022-02-28 16:26:18 -05:00
R.B. Boyer 3804677570
server: suppress spurious blocking query returns where multiple config entries are involved (#12362)
Starting from and extending the mechanism introduced in #12110 we can specially handle the 3 main special Consul RPC endpoints that react to many config entries in a single blocking query in Connect:

- `DiscoveryChain.Get`
- `ConfigEntry.ResolveServiceConfig`
- `Intentions.Match`

All of these will internally watch for many config entries, and at least one of those will likely be not found in any given query. Because these are blends of multiple reads the exact solution from #12110 isn't perfectly aligned, but we can tweak the approach slightly and regain the utility of that mechanism.

### No Config Entries Found

In this case, despite looking for many config entries none may be found at all. Unlike #12110 in this scenario we do not return an empty reply to the caller, but instead synthesize a struct from default values to return. This can be handled nearly identically to #12110 with the first 1-2 replies being non-empty payloads followed by the standard spurious wakeup suppression mechanism from #12110.

### No Change Since Last Wakeup

Once a blocking query loop on the server has completed and slept at least once, there is a further optimization we can make here to detect if any of the config entries that were present at specific versions for the prior execution of the loop are identical for the loop we just woke up for. In that scenario we can return a slightly different internal sentinel error and basically externally handle it similar to #12110.

This would mean that even if 20 discovery chain read RPC handling goroutines wakeup due to the creation of an unrelated config entry, the only ones that will terminate and reply with a blob of data are those that genuinely have new data to report.

### Extra Endpoints

Since this pattern is pretty reusable, other key config-entry-adjacent endpoints used by `agent/proxycfg` also were updated:

- `ConfigEntry.List`
- `Internal.IntentionUpstreams` (tproxy)
2022-02-25 15:46:34 -06:00
R.B. Boyer 4b0f657b31
fix flaky test panic (#12446) 2022-02-24 17:35:46 -06:00
R.B. Boyer a97d20cf63
catalog: compare node names case insensitively in more places (#12444)
Many places in consul already treated node names case insensitively.
The state store indexes already do it, but there are a few places that
did a direct byte comparison which have now been corrected.

One place of particular consideration is ensureCheckIfNodeMatches
which is executed during snapshot restore (among other places). If a
node check used a slightly different casing than the casing of the node
during register then the snapshot restore here would deterministically
fail. This has been fixed.

Primary approach:

    git grep -i "node.*[!=]=.*node" -- ':!*_test.go' ':!docs'
    git grep -i '\[[^]]*member[^]]*\]
    git grep -i '\[[^]]*\(member\|name\|node\)[^]]*\]' -- ':!*_test.go' ':!website' ':!ui' ':!agent/proxycfg/testing.go:' ':!*.md'
2022-02-24 16:54:47 -06:00
R.B. Boyer d860384731
server: partly fix config entry replication issue that prevents replication in some circumstances (#12307)
There are some cross-config-entry relationships that are enforced during
"graph validation" at persistence time that are required to be
maintained. This means that config entries may form a digraph at times.

Config entry replication procedes in a particular sorted order by kind
and name.

Occasionally there are some fixups to these digraphs that end up
replicating in the wrong order and replicating the leaves
(ingress-gateway) before the roots (service-defaults) leading to
replication halting due to a graph validation error related to things
like mismatched service protocol requirements.

This PR changes replication to give each computed change (upsert/delete)
a fair shot at being applied before deciding to terminate that round of
replication in error. In the case where we've simply tried to do the
operations in the wrong order at least ONE of the outstanding requests
will complete in the right order, leading the subsequent round to have
fewer operations to do, with a smaller likelihood of graph validation
errors.

This does not address all scenarios, but for scenarios where the edits
are being applied in the wrong order this should avoid replication
halting.

Fixes #9319

The scenario that is NOT ADDRESSED by this PR is as follows:

1. create: service-defaults: name=new-web, protocol=http
2. create: service-defaults: name=old-web, protocol=http
3. create: service-resolver: name=old-web, redirect-to=new-web
4. delete: service-resolver: name=old-web
5. update: service-defaults: name=old-web, protocol=grpc
6. update: service-defaults: name=new-web, protocol=grpc
7. create: service-resolver: name=old-web, redirect-to=new-web

If you shutdown dc2 just before (4) and turn it back on after (7)
replication is impossible as there is no single edit you can make to
make forward progress.
2022-02-23 17:27:48 -06:00
Daniel Nephin 3639f4b551
Merge pull request #11910 from hashicorp/dnephin/ca-provider-interface-for-ica-in-primary
ca: add support for an external trusted CA
2022-02-22 13:14:52 -05:00
R.B. Boyer 11fdc70b34
configentry: make a new package to hold shared config entry structs that aren't used for RPC or the FSM (#12384)
First two candidates are ConfigEntryKindName and DiscoveryChainConfigEntries.
2022-02-22 10:36:36 -06:00
Daniel Nephin cb1a80184f rpc: set response to nil when not found
Otherwise when the query times out we might incorrectly send a value for
the reply, when we should send an empty reply.

Also document errNotFound and how to handle the result in that case.
2022-02-18 12:26:06 -05:00
Daniel Nephin 79820738cc ca: test that original certs from secondary still verify
There's a chance this could flake if the secondary hasn't received the
update yet, but running this test many times doesn't show any flakes
yet.
2022-02-17 18:45:16 -05:00
Daniel Nephin ca4e60e09b Update TODOs to reference an issue with more details
And remove a no longer needed TODO
2022-02-17 18:21:30 -05:00
Daniel Nephin 0abaf29c10 ca: add test cases for rotating external trusted CA 2022-02-17 18:21:30 -05:00
Daniel Nephin aacc40012f ca: add a test for secondary with external CA 2022-02-17 18:21:30 -05:00
Daniel Nephin 471b2098bb ca: examine the full chain in newCARoot
make TestNewCARoot much more strict
compare the full result instead of only a few fields.
add a test case with 2 and 3 certificates in the pem
2022-02-17 18:21:30 -05:00
Daniel Nephin 2d5254a73b
Merge pull request #12110 from hashicorp/dnephin/blocking-queries-not-found
rpc: make blocking queries for non-existent items more efficient
2022-02-17 18:09:39 -05:00
Florian Apolloner 895da50986
Support for connect native services in topology view. (#12098) 2022-02-16 16:51:54 -05:00
Chris S. Kim 18096fd2fb
Move IndexEntryName helpers to common files (#12365) 2022-02-16 12:56:38 -05:00
Daniel Nephin 06657e5be0 rpc: add errNotFound to all Get queries
Any query that returns a list of items is not part of this commit.
2022-02-15 18:24:34 -05:00
Daniel Nephin bdafa24c50 Make blockingQuery efficient with 'not found' results.
By using the query results as state.

Blocking queries are efficient when the query matches some results,
because the ModifyIndex of those results, returned as queryMeta.Mindex,
will never change unless the items themselves change.

Blocking queries for non-existent items are not efficient because the
queryMeta.Index can (and often does) change when other entities are
written.

This commit reduces the churn of these queries by using a different
comparison for "has changed". Instead of using the modified index, we
use the existence of the results. If the previous result was "not found"
and the new result is still "not found", we know we can ignore the
modified index and continue to block.

This is done by setting the minQueryIndex to the returned
queryMeta.Index, which prevents the query from returning before a state
change is observed.
2022-02-15 18:24:33 -05:00
Daniel Nephin 6e73df7dc2 Add a test for blocking query on non-existent entry
This test shows how blocking queries are not efficient when the query
returns no results.  The test fails with 100+ calls instead of the
expected 2.

This test is still a bit flaky because it depends on the timing of the
writes. It can sometimes return 3 calls.

A future commit should fix this and make blocking queries even more
optimal for not-found results.
2022-02-15 18:23:17 -05:00
Daniel Nephin a4e1c59cd8 rpc: improve docs for blockingQuery
Follow the Go convention of accepting a small interface that documents
the methods used by the function.

Clarify the rules for implementing a query function passed to
blockingQuery.
2022-02-15 14:20:14 -05:00
R.B. Boyer b216d52b66
server: conditionally avoid writing a config entry to raft if it was already the same (#12321)
This will both save on unnecessary raft operations as well as
unnecessarily incrementing the raft modify index of config entries
subject to no-op updates.
2022-02-14 14:39:12 -06:00
FFMMM 1f8fb17be7
Vendor in rpc mono repo for net/rpc fork, go-msgpack, msgpackrpc. (#12311)
This commit syncs ENT changes to the OSS repo.

Original commit details in ENT:

```
commit 569d25f7f4578981c3801e6e067295668210f748
Author: FFMMM <FFMMM@users.noreply.github.com>
Date:   Thu Feb 10 10:23:33 2022 -0800

    Vendor fork net rpc (#1538)

    * replace net/rpc w consul-net-rpc/net/rpc

    Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>

    * replace msgpackrpc and go-msgpack with fork from mono repo

    Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>

    * gofmt all files touched

    Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
```

Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
2022-02-14 09:45:45 -08:00
Mark Anderson fa95afdcf6 Refactor to make ACL errors more structured. (#12308)
* First phase of refactoring PermissionDeniedError

Add extended type PermissionDeniedByACLError that captures information
about the accessor, particular permission type and the object and name
of the thing being checked.

It may be worth folding the test and error return into a single helper
function, that can happen at a later date.

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2022-02-11 12:53:23 -08:00
Freddy f45bec7779
Merge pull request #12223 from hashicorp/proxycfg/passthrough-cleanup 2022-02-10 17:35:51 -07:00
freddygv 7fba7456ec Fix race of upstreams with same passthrough ip
Due to timing, a transparent proxy could have two upstreams to dial
directly with the same address.

For example:
- The orders service can dial upstreams shipping and payment directly.
- An instance of shipping at address 10.0.0.1 is deregistered.
- Payments is scaled up and scheduled to have address 10.0.0.1.
- The orders service receives the event for the new payments instance
before seeing the deregistration for the shipping instance. At this
point two upstreams have the same passthrough address and Envoy will
reject the listener configuration.

To disambiguate this commit considers the Raft index when storing
passthrough addresses. In the example above, 10.0.0.1 would only be
associated with the newer payments service instance.
2022-02-10 17:01:57 -07:00
Daniel Nephin db4675bd1a
Merge pull request #12277 from hashicorp/dnephin/panic-in-service-register
catalog: initialize the refs map to prevent a nil panic
2022-02-09 19:48:22 -05:00
Daniel Nephin 6376141464 config-entry: fix a panic when registering a service or ingress gateway 2022-02-09 18:49:48 -05:00
Daniel Nephin c20412ab14
Merge pull request #12265 from hashicorp/dnephin/logging-in-tests
sdk: add TestLogLevel for setting log level in tests
2022-02-07 16:11:23 -05:00
Daniel Nephin 5a0e6700c1 A test to reproduce the issue 2022-02-04 14:04:12 -05:00
Daniel Nephin 7b466a024b Make test more readable
And fix typo
2022-02-03 18:44:09 -05:00
Daniel Nephin 6721c1246d ca: relax and move private key type/bit validation for vault
This commit makes two changes to the validation.

Previously we would call this validation in GenerateRoot, which happens
both on initialization (when a follower becomes leader), and when a
configuration is updated. We only want to do this validation during
config update so the logic was moved to the UpdateConfiguration
function.

Previously we would compare the config values against the actual cert.
This caused problems when the cert was created manually in Vault (not
created by Consul).  Now we compare the new config against the previous
config. Using a already created CA cert should never error now.

Adding the key bit and types to the config should only error when
the previous values were not the defaults.
2022-02-03 17:21:20 -05:00
Daniel Nephin 3b78f81f9a ca: small cleanup of TestConnectCAConfig_Vault_TriggerRotation_Fails
Before adding more test cases
2022-02-03 17:21:20 -05:00
Daniel Nephin f6d7a0f7b2 testing: fix test failures caused by new log level
These two tests require debug logging enabled, because they look for log lines.

Also switched to testify assertions because the previous errors were not clear.
2022-02-03 17:07:39 -05:00
Daniel Nephin 1a9a656a7f sdk: add TestLogLevel for setting log level in tests
And default log level to WARN.
2022-02-03 13:42:28 -05:00
Daniel Nephin 44f9229b96 ca: add a test that uses an intermediate CA as the primary CA
This test found a bug in the secondary. We were appending the root cert
to the PEM, but that cert was already appended. This was failing
validation in Vault here:
https://github.com/hashicorp/vault/blob/sdk/v0.3.0/sdk/helper/certutil/types.go#L329

Previously this worked because self signed certs have the same
SubjectKeyID and AuthorityKeyID. So having the same self-signed cert
repeated doesn't fail that check.

However with an intermediate that is not self-signed, those values are
different, and so we fail the check. A test I added in a previous commit
should show that this continues to work with self-signed root certs as
well.
2022-02-02 13:41:35 -05:00
Daniel Nephin d00a9abca2 acl: un-embed ACLIdentity
This is safer than embedding two interface because there are a number of
places where we check the concrete type. If we check the concrete type
on the top-level interface it will fail. So instead expose the
ACLIdentity from a method.
2022-02-02 12:07:31 -05:00
Daniel Nephin 18ff00f985
Merge pull request #12167 from hashicorp/dnephin/acl-resolve-token-3
acl: rename ResolveTokenToIdentityAndAuthorizer to ResolveToken
2022-01-31 19:21:06 -05:00
Daniel Nephin ff64c13c3e
Merge pull request #12166 from hashicorp/dnephin/acl-resolve-token-2
acl: remove ResolveTokenToIdentity
2022-01-31 19:19:21 -05:00
Daniel Nephin aa4dbe2a17 acl: rename ResolveTokenToIdentityAndAuthorizer to ResolveToken
This change allows us to remove one of the last remaining duplicate
resolve token methods (Server.ResolveToken).

With this change we are down to only 2, where the second one also
handles setting the default EnterpriseMeta from the token.
2022-01-31 18:04:19 -05:00
Daniel Nephin 57eac90cae acl: remove unused methods on fakes, and add changelog
Also document the metric that was removed in a previous commit.
2022-01-31 17:53:53 -05:00
Daniel Nephin 1fb2d49826
Merge pull request #12165 from hashicorp/dnephin/acl-resolve-token
acl: remove some of the duplicate resolve token methods
2022-01-31 13:27:49 -05:00
Dan Upton ebdda4848f
streaming: split event buffer by key (#12080) 2022-01-28 12:27:00 +00:00