Commit Graph

1629 Commits

Author SHA1 Message Date
R.B. Boyer 62ac98b564
agent/structs: add a bunch more EnterpriseMeta helper functions to help with partitioning (#10669) 2021-07-22 13:20:45 -05:00
Dhia Ayachi b725605fe4
config raft apply silent error (#10657)
* return an error when the index is not valid

* check response as bool when applying `CAOpSetConfig`

* remove check for bool response

* fix error message and add check to test

* fix comment

* add changelog
2021-07-22 10:32:27 -04:00
Daniel Nephin db29c51cd2 acl: use SetHash consistently in testPolicyForID
A previous commit used SetHash on two of the cases to fix a data race. This commit applies
that change to all cases. Using SetHash in this test helper should ensure that the
test helper behaves closer to production.
2021-07-16 17:59:56 -04:00
Giulio Micheloni 3a1afd8f57 acl: fix error type into a string type for serialization issue
acl_endpoint_test.go:507:
        	Error Trace:	acl_endpoint_test.go:507
        	            				retry.go:148
        	            				retry.go:149
        	            				retry.go:103
        	            				acl_endpoint_test.go:504
        	Error:      	Received unexpected error:
        	            	codec.decoder: decodeValue: Cannot decode non-nil codec value into nil error (1 methods)
        	Test:       	TestACLEndpoint_ReplicationStatus
2021-07-15 11:31:44 +02:00
Daniel Nephin 27871498f0 Fix a data race in TestACLResolver_Client
By setting the hash when we create the policy.

```
WARNING: DATA RACE
Read at 0x00c0028b4b10 by goroutine 1182:
  github.com/hashicorp/consul/agent/structs.(*ACLPolicy).SetHash()
      /home/daniel/pers/code/consul/agent/structs/acl.go:701 +0x40d
  github.com/hashicorp/consul/agent/structs.ACLPolicies.resolveWithCache()
      /home/daniel/pers/code/consul/agent/structs/acl.go:779 +0xfe
  github.com/hashicorp/consul/agent/structs.ACLPolicies.Compile()
      /home/daniel/pers/code/consul/agent/structs/acl.go:809 +0xf1
  github.com/hashicorp/consul/agent/consul.(*ACLResolver).ResolveTokenToIdentityAndAuthorizer()
      /home/daniel/pers/code/consul/agent/consul/acl.go:1226 +0x6ef
  github.com/hashicorp/consul/agent/consul.resolveTokenAsync()
      /home/daniel/pers/code/consul/agent/consul/acl_test.go:66 +0x5c

Previous write at 0x00c0028b4b10 by goroutine 1509:
  github.com/hashicorp/consul/agent/structs.(*ACLPolicy).SetHash()
      /home/daniel/pers/code/consul/agent/structs/acl.go:730 +0x3a8
  github.com/hashicorp/consul/agent/structs.ACLPolicies.resolveWithCache()
      /home/daniel/pers/code/consul/agent/structs/acl.go:779 +0xfe
  github.com/hashicorp/consul/agent/structs.ACLPolicies.Compile()
      /home/daniel/pers/code/consul/agent/structs/acl.go:809 +0xf1
  github.com/hashicorp/consul/agent/consul.(*ACLResolver).ResolveTokenToIdentityAndAuthorizer()
      /home/daniel/pers/code/consul/agent/consul/acl.go:1226 +0x6ef
  github.com/hashicorp/consul/agent/consul.resolveTokenAsync()
      /home/daniel/pers/code/consul/agent/consul/acl_test.go:66 +0x5c

Goroutine 1182 (running) created at:
  github.com/hashicorp/consul/agent/consul.TestACLResolver_Client.func4()
      /home/daniel/pers/code/consul/agent/consul/acl_test.go:1669 +0x459
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1193 +0x202

Goroutine 1509 (running) created at:
  github.com/hashicorp/consul/agent/consul.TestACLResolver_Client.func4()
      /home/daniel/pers/code/consul/agent/consul/acl_test.go:1668 +0x415
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1193 +0x202
```
2021-07-14 18:58:16 -04:00
Daniel Nephin ff26294d63 consul: fix data race in leader CA tests
Some global variables are patched to shorter values in these tests. But the goroutines that read
them can outlive the test because nothing waited for them to exit.

This commit adds a Wait() method to the routine manager, so that tests can wait for the goroutines
to exit. This prevents the data race because the 'reset to original value' can happen
after all other goroutines have stopped.
2021-07-14 18:58:15 -04:00
Giulio Micheloni 96fe1f4078 acl: acl replication routine to report the last error message 2021-07-14 11:50:23 +02:00
Dhia Ayachi 53b45a8441
check expiry date of the root/intermediate before using it to sign a leaf (#10500)
* ca: move provider creation into CAManager

This further decouples the CAManager from Server. It reduces the interface between them and
removes the need for the SetLogger method on providers.

* ca: move SignCertificate to CAManager

To reduce the scope of Server, and keep all the CA logic together

* ca: move SignCertificate to the file where it is used

* auto-config: move autoConfigBackend impl off of Server

Most of these methods are used exclusively for the AutoConfig RPC
endpoint. This PR uses a pattern that we've used in other places as an
incremental step to reducing the scope of Server.

* fix linter issues

* check error when `raftApplyMsgpack`

* ca: move SignCertificate to CAManager

To reduce the scope of Server, and keep all the CA logic together

* check expiry date of the intermediate before using it to sign a leaf

* fix typo in comment

Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>

* Fix test name

* do not check cert start date

* wrap error to mention it is the intermediate expired

* Fix failing test

* update comment

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* use shim to avoid sleep in test

* add root cert validation

* remove duplicate code

* Revert "fix linter issues"

This reverts commit 6356302b54f06c8f2dee8e59740409d49e84ef24.

* fix import issue

* gofmt leader_connect_ca

* add changelog entry

* update error message

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

* fix error message in test

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2021-07-13 12:15:06 -04:00
R.B. Boyer ae8b526be8
connect/ca: ensure edits to the key type/bits for the connect builtin CA will regenerate the roots (#10330)
progress on #9572
2021-07-13 11:12:07 -05:00
R.B. Boyer 0537922c6c
connect/ca: require new vault mount points when updating the key type/bits for the vault connect CA provider (#10331)
progress on #9572
2021-07-13 11:11:46 -05:00
Daniel Nephin 58cf5767a8
Merge pull request #10479 from hashicorp/dnephin/ca-provider-explore-2
ca: move Server.SignIntermediate to CAManager
2021-07-12 19:03:43 -04:00
Daniel Nephin a22bdb2ac9
Merge pull request #10445 from hashicorp/dnephin/ca-provider-explore
ca: isolate more of the CA logic in CAManager
2021-07-12 15:26:23 -04:00
Daniel Nephin fdb0ba8041 ca: use provider constructors to be more consistent
Adds a contructor for the one provider that did not have one.
2021-07-12 14:04:34 -04:00
Dhia Ayachi 3eac4ffda4 check error when `raftApplyMsgpack` 2021-07-12 13:42:51 -04:00
Daniel Nephin 34c8585b29 auto-config: move autoConfigBackend impl off of Server
Most of these methods are used exclusively for the AutoConfig RPC
endpoint. This PR uses a pattern that we've used in other places as an
incremental step to reducing the scope of Server.
2021-07-12 13:42:40 -04:00
Daniel Nephin 605275b4dc ca: move SignCertificate to the file where it is used 2021-07-12 13:42:39 -04:00
Daniel Nephin c2e85f25d4 ca: move SignCertificate to CAManager
To reduce the scope of Server, and keep all the CA logic together
2021-07-12 13:42:39 -04:00
Dhia Ayachi a0320169fe add missing state reset when stopping ca manager 2021-07-12 09:32:36 -04:00
Daniel Nephin 68d5f7769a ca: fix mockCAServerDelegate to work with the new interface
raftApply was removed so ApplyCARequest needs to handle all the possible operations

Also set the providerShim to use the mock provider.

other changes are small test improvements that were necessary to debug the failures.
2021-07-12 09:32:36 -04:00
Daniel Nephin 6d4b0ce194 ca: remove unused method
and small refactor to getCAProvider so that GoLand is less confused about what it is doing.
Previously it was reporting that the for condition was always true, which was not the case.
2021-07-12 09:32:35 -04:00
Daniel Nephin 4330122d9a ca: remove raftApply from delegate interface
After moving ca.ConsulProviderStateDelegate into the interface we now
have the ApplyCARequest method which does the same thing. Use this more
specific method instead of raftApply.
2021-07-12 09:32:35 -04:00
Daniel Nephin fae0a8f851 ca: move generateCASignRequest to the delegate
This method on Server was only used by the caDelegateWithState, so move it there
until we can move it entirely into CAManager.
2021-07-12 09:32:35 -04:00
Daniel Nephin d4bb9fd97a ca: move provider creation into CAManager
This further decouples the CAManager from Server. It reduces the interface between them and
removes the need for the SetLogger method on providers.
2021-07-12 09:32:33 -04:00
Daniel Nephin fc629d9eaa ca-manager: move provider shutdown into CAManager
Reducing the coupling between Server and CAManager
2021-07-12 09:27:28 -04:00
Daniel Nephin 1e23d181b5 config: remove misleading UseTLS field
This field was documented as enabling TLS for outgoing RPC, but that was not the case.
All this field did was set the use_tls serf tag.

Instead of setting this field in a place far from where it is used, move the logic to where
the serf tag is set, so that the code is much more obvious.
2021-07-09 19:01:45 -04:00
Daniel Nephin 3c60a46376 config: remove duplicate TLSConfig fields from agent/consul.Config
tlsutil.Config already presents an excellent structure for this
configuration. Copying the runtime config fields to agent/consul.Config
makes code harder to trace, and provides no advantage.

Instead of copying the fields around, use the tlsutil.Config struct
directly instead.

This is one small step in removing the many layers of duplicate
configuration.
2021-07-09 18:49:42 -04:00
Evan Culver 5ff191ad99
Add support for returning ACL secret IDs for accessors with acl:write (#10546) 2021-07-08 15:13:08 -07:00
Dhia Ayachi e5dbf5e55b
Add ca certificate metrics (#10504)
* add intermediate ca metric routine

* add Gauge config for intermediate cert

* Stop metrics routine when stopping leader

* add changelog entry

* updage changelog

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* use variables instead of a map

* go imports sort

* Add metrics for primary and secondary ca

* start metrics routine in the right DC

* add telemetry documentation

* update docs

* extract expiry fetching in a func

* merge metrics for primary and secondary into signing ca metric

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2021-07-07 09:41:01 -04:00
Daniel Nephin 14527dd005
Merge pull request #10552 from hashicorp/dnephin/ca-remove-rotation-period
ca: remove unused RotationPeriod field
2021-07-06 18:49:33 -04:00
jkirschner-hashicorp 31bbab8ae7
Merge pull request #10560 from jkirschner-hashicorp/change-sane-to-reasonable
Replace use of 'sane' where appropriate
2021-07-06 11:46:04 -04:00
Daniel Nephin b4a10443d1 ca: remove unused RotationPeriod field
This field was never used. Since it is persisted as part of a map[string]interface{} it
is pretty easy to remove it.
2021-07-05 19:15:44 -04:00
Jared Kirschner 4c3b1b8b7b Replace use of 'sane' where appropriate
HashiCorp voice, style, and language guidelines recommend avoiding ableist
language unless its reference to ability is accurate in a particular use.
2021-07-02 12:18:46 -04:00
Dhia Ayachi b57cf27e8f
Format certificates properly (rfc7468) with a trailing new line (#10411)
* trim carriage return from certificates when inserting rootCA in the inMemDB

* format rootCA properly when returning the CA on the connect CA endpoint

* Fix linter warnings

* Fix providers to trim certs before returning it

* trim newlines on write when possible

* add changelog

* make sure all provider return a trailing newline after the root and intermediate certs

* Fix endpoint to return trailing new line

* Fix failing test with vault provider

* make test more robust

* make sure all provider return a trailing newline after the leaf certs

* Check for suffix before removing newline and use function

* Add comment to consul provider

* Update change log

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* fix typo

* simplify code callflow

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* extract requireNewLine as shared func

* remove dependency to testify in testing file

* remove extra newline in vault provider

* Add cert newline fix to envoy xds

* remove new line from mock provider

* Remove adding a new line from provider and fix it when the cert is read

* Add a comment to explain the fix

* Add missing for leaf certs

* fix missing new line

* fix missing new line in leaf certs

* remove extra new line in test

* updage changelog

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* fix in vault provider and when reading cache (RPC call)

* fix AWS provider

* fix failing test in the provider

* remove comments and empty lines

* add check for empty cert in test

* fix linter warnings

* add new line for leaf and private key

* use string concat instead of Sprintf

* fix new lines for leaf signing

* preallocate slice and remove append

* Add new line to `SignIntermediate` and `CrossSignCA`

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2021-06-30 20:48:29 -04:00
Daniel Nephin 86244967c5 docs: correct some misleading telemetry docs
The query metrics are actually reported for all read queries, not only
ones that use a MinIndex to block for updates.

Also clarify the raft.apply metric is only on the leader.
2021-06-28 12:20:53 -04:00
R.B. Boyer c3d5a2a5ab
connect/ca: cease including the common name field in generated certs (#10424)
As part of this change, we ensure that the SAN extensions are marked as
critical when the subject is empty so that AWS PCA tolerates the loss of
common names well and continues to function as a Connect CA provider.

Parts of this currently hack around a bug in crypto/x509 and can be
removed after https://go-review.googlesource.com/c/go/+/329129 lands in
a Go release.

Note: the AWS PCA tests do not run automatically, but the following
passed locally for me:

    ENABLE_AWS_PCA_TESTS=1 go test ./agent/connect/ca -run TestAWS
2021-06-25 13:00:00 -05:00
Daniel Nephin 72b30174fa ca: replace ca.PrimaryIntermediateProviders
With an optional interface that providers can use to indicate if they
use an intermediate cert in the primary DC.

This removes the need to look up the provider config when renewing the
intermediate.
2021-06-23 15:47:30 -04:00
Daniel Nephin f4c1f982d1
Merge pull request #9924 from hashicorp/dnephin/cert-expiration-metric
connect: emit a metric for the seconds until root CA expiry
2021-06-18 14:18:55 -04:00
Daniel Nephin e36800cefa Update metric name
and handle the case where there is no active root CA.
2021-06-14 17:01:16 -04:00
Daniel Nephin 548796ae13 connect: emit a metric for the number of seconds until root CA expiration 2021-06-14 16:57:01 -04:00
Freddy f399fd2add
Rename CatalogDestinationsOnly (#10397)
CatalogDestinationsOnly is a passthrough that would enable dialing
addresses outside of Consul's catalog. However, when this flag is set to
true only _connect_ endpoints for services can be dialed.

This flag is being renamed to signal that non-Connect endpoints can't be
dialed by transparent proxies when the value is set to true.
2021-06-14 14:15:09 -06:00
Freddy 61ae2995b7
Add flag for transparent proxies to dial individual instances (#10329) 2021-06-09 14:34:17 -06:00
Daniel Nephin 20f7a72792 stream: remove bufferItem.NextLink
Both NextLink and NextNoBlock had the same logic, with slightly
different return values. By adding a bool return value (similar to map
lookups) we can remove the duplicate method.
2021-06-07 17:04:46 -04:00
Daniel Nephin 48f388f590 stream: fix a bug with creating a snapshot
The head of the topic buffer was being ignored when creating a snapshot. This commit fixes
the bug by ensuring that the head of the topic buffer is included in the snapshot
before handing it off to the subscription.
2021-06-04 18:33:04 -04:00
Paul Ewing e454a9aae0
usagemetrics: add cluster members to metrics API (#10340)
This PR adds cluster members to the metrics API. The number of members per
segment are reported as well as the total number of members.

Tested by running a multi-node cluster locally and ensuring the numbers were
correct. Also added unit test coverage to add the new expected gauges to
existing test cases.
2021-06-03 08:25:53 -07:00
Daniel Nephin 0dfb7da610 grpc: fix a data race by using a static resolver
We have seen test flakes caused by 'concurrent map read and map write', and the race detector
reports the problem as well (prevent us from running some tests with -race).

The root of the problem is the grpc expects resolvers to be registered at init time
before any requests are made, but we were using a separate resolver for each test.

This commit introduces a resolver registry. The registry is registered as the single
resolver for the consul scheme. Each test uses the Authority section of the target
(instead of the scheme) to identify the resolver that should be used for the test.
The scheme is used for lookup, which is why it can no longer be used as the unique
key.

This allows us to use a lock around the map of resolvers, preventing the data race.
2021-06-02 11:35:38 -04:00
Dhia Ayachi 0c13f80d5a
RPC Timeout/Retries account for blocking requests (#8978) 2021-05-27 17:29:43 -04:00
Matt Keeler 7e4ea16149 Move some things around to allow for license updating via config reload
The bulk of this commit is moving the LeaderRoutineManager from the agent/consul package into its own package: lib/gort. It also got a renaming and its Start method now requires a context. Requiring that context required updating a whole bunch of other places in the code.
2021-05-25 09:57:50 -04:00
Matt Keeler 58b934133d hcs-1936: Prepare for adding license auto-retrieval to auto-config in enterprise 2021-05-24 13:20:30 -04:00
Matt Keeler 82f5cb3f08 Preparation for changing where license management is done. 2021-05-24 10:19:31 -04:00
Daniel Nephin f2cf586414 Refactor of serf feature flag tags.
This refactor is to make it easier to see how serf feature flags are
encoded as serf tags, and where those feature flags are read.

- use constants for both the prefix and feature flag name. A constant
  makes it much easier for an IDE to locate the read and write location.
- isolate the feature-flag encoding logic in the metadata package, so
  that the feature flag prefix can be unexported. Only expose a function
  for encoding the flags into tags. This logic is now next to the logic
  which reads the tags.
- remove the duplicate `addEnterpriseSerfTags` functions. Both Client
  and Server structs had the same implementation. And neither
  implementation needed the method receiver.
2021-05-20 12:57:06 -04:00
R.B. Boyer b90877b440
server: ensure that central service config flattening properly resets the state each time (#10239)
The prior solution to call reply.Reset() aged poorly since newer fields
were added to the reply, but not added to Reset() leading serial
blocking query loops on the server to blend replies.

This could manifest as a service-defaults protocol change from
default=>http not reverting back to default after the config entry
reponsible was deleted.
2021-05-14 10:21:44 -05:00
Daniel Nephin 3dd951ab1e testing: don't run t.Parallel in a goroutine
TestACLEndpoint_Login_with_TokenLocality was reguardly being reported as failed even though
it was not failing. I took another look and I suspect it is because t.Parllel was being
called in a goroutine.

This would lead to strange behaviour which apparently confused the 'go test' runner.
2021-05-10 13:30:10 -04:00
Daniel Nephin c9ae72e72f
Merge pull request #10075 from hashicorp/dnephin/handle-raft-apply-errors
rpc: some cleanup of canRetry and ForwardRPC
2021-05-06 16:59:53 -04:00
Daniel Nephin 39d7d07922 state: reduce arguments to validateProposedConfigEntryInServiceGraph 2021-05-06 13:47:40 -04:00
Daniel Nephin 4905ac6f44 rpc: add tests for canRetry
Also accept an RPCInfo instead of interface{}. Accepting an interface
lead to a bug where the caller was expecting the arg to be the response
when in fact it was always passed the request. By accepting RPCInfo
it should indicate that this is actually the request value.

One caller of canRetry already passed an RPCInfo, the second handles
the type assertion before calling canRetry.
2021-05-06 13:30:07 -04:00
Daniel Nephin c38f4869ad rpc: remove unnecessary arg to ForwardRPC 2021-05-06 13:30:07 -04:00
Daniel Nephin 55f620d636
Merge pull request #10155 from hashicorp/dnephin/config-entry-remove-fields
config-entry: remove Kind and Name field from Mesh config entry
2021-05-04 17:27:56 -04:00
Daniel Nephin 0e5e1270b6 config-entries: add a test for the API client
Also fixes a bug with listing kind=mesh config entries. ValidateConfigEntryKind was only being used by
the List endpoint, and was yet another place where we have to enumerate all the kinds.

This commit removes ValidateConfigEntryKind and uses MakeConfigEntry instead. This change removes
the need to maintain two separate functions at the cost of creating an instance of the config entry which will be thrown away immediately.
2021-05-04 17:14:21 -04:00
Daniel Nephin df98027ad1 lint: fix warning by removing reference to deprecated interface 2021-05-04 14:09:14 -04:00
Paul Banks d47eea3a3f
Make Raft trailing logs and snapshot timing reloadable (#10129)
* WIP reloadable raft config

* Pre-define new raft gauges

* Update go-metrics to change gauge reset behaviour

* Update raft to pull in new metric and reloadable config

* Add snapshot persistance timing and installSnapshot to our 'protected' list as they can be infrequent but are important

* Update telemetry docs

* Update config and telemetry docs

* Add note to oldestLogAge on when it is visible

* Add changelog entry

* Update website/content/docs/agent/options.mdx

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
2021-05-04 15:36:53 +01:00
Luke Kysow eb84a856c4
Give descriptive error if auth method not found (#10163)
* Give descriptive error if auth method not found

Previously during a `consul login -method=blah`, if the auth method was not found, the
error returned would be "ACL not found". This is potentially confusing
because there may be many different ACLs involved in a login: the ACL of
the Consul client, perhaps the binding rule or the auth method.

Now the error will be "auth method blah not found", which is much easier
to debug.
2021-05-03 13:39:13 -07:00
Daniel Nephin bf4c289804 config-entry: remove Kind and Name field from Mesh config entry
No config entry needs a Kind field. It is only used to determine the Go type to
target. As we introduce new config entries (like this one) we can remove the kind field
and have the GetKind method return the single supported value.

In this case (similar to proxy-defaults) the Name field is also unnecessary. We always
use the same value. So we can omit the name field entirely.
2021-04-29 17:11:21 -04:00
Freddy 401f3010e0
Rename "cluster" config entry to "mesh" (#10127)
This config entry is being renamed primarily because in k8s the name
cluster could be confusing given that the config entry applies across
federated datacenters.

Additionally, this config entry will only apply to Consul as a service
mesh, so the more generic "cluster" name is not needed.
2021-04-28 16:13:29 -06:00
Daniel Nephin 3cda0a7cc4 health: create health.Client in Agent.New 2021-04-27 19:03:16 -04:00
Matt Keeler 6c639be8ec
Add prometheus guage definitions for replication metrics. (#10109) 2021-04-23 17:05:33 -04:00
Matt Keeler 09bf05ec5d
Add replication metrics (#10073) 2021-04-22 11:20:53 -04:00
Daniel Nephin 281d7616fa
Merge pull request #10045 from hashicorp/dnephin/state-proxy-defaults
state: remove config-entries kind index
2021-04-20 17:12:50 -04:00
Daniel Nephin 6d1a5b3629 Handle ErrChunkingResubmit.Error properly
Previously canRetry was attempting to retrieve this error from args, however there was never
any callers that would pass an error to args.

With the change to raftApply to move this error to the error return value, it is now possible
to receive this error from the err argument.

This commit updates canRetry to check for ErrChunkingResubmit in err.
2021-04-20 13:29:31 -04:00
Daniel Nephin 8654adfc53 Handle FSM.Apply errors in raftApply
Previously we were inconsistently checking the response for errors. This
PR moves the response-is-error check into raftApply, so that all callers
can look at only the error response, instead of having to know that
errors could come from two places.

This should expose a few more errors that were previously hidden because
in some calls to raftApply we were ignoring the response return value.

Also handle errors more consistently. In some cases we would log the
error before returning it. This can be very confusing because it can
result in the same error being logged multiple times. Instead return
a wrapped error.
2021-04-20 13:29:29 -04:00
Daniel Nephin 95b361ecc8 state: remove unnecessary kind index
The query can be performed using a prefix query on the ID index.

Also backport some enterprise changes to prevent conflicts.
2021-04-15 17:37:28 -04:00
Daniel Nephin eb7f4b7ea4 state: use index constants for ConfigEntry indexes 2021-04-15 17:30:07 -04:00
Freddy 5a9b75a443
Merge pull request #10016 from hashicorp/topology-update 2021-04-15 14:11:23 -06:00
R.B. Boyer c88512fe14
connect: update centralized upstreams representation in service-defaults (#10015) 2021-04-15 14:21:44 -05:00
Daniel Nephin 2a10f01bf5 snapshot: fix saving of auth methods
Previously only a single auth method would be saved to the snapshot. This commit fixes the typo
and adds to the test, to show that all auth methods are now saved.
2021-04-14 16:51:21 -04:00
freddygv 2ff8b9f2f5 Avoid returning a nil slice 2021-04-14 10:52:05 -06:00
Matt Keeler aa0eb60f57
Move static token resolution into the ACLResolver (#10013) 2021-04-14 12:39:35 -04:00
freddygv 7fd4c569ce Update viz endpoint to include topology from intentions 2021-04-14 10:20:15 -06:00
freddygv 83501d5415 Augment intention decision summary with DefaultAllow mode 2021-04-12 19:32:09 -06:00
freddygv eeccba945d Replace TransparentProxy bool with ProxyMode
This PR replaces the original boolean used to configure transparent
proxy mode. It was replaced with a string mode that can be set to:

- "": Empty string is the default for when the setting should be
defaulted from other configuration like config entries.
- "direct": Direct mode is how applications originally opted into the
mesh. Proxy listeners need to be dialed directly.
- "transparent": Transparent mode enables configuring Envoy as a
transparent proxy. Traffic must be captured and redirected to the
inbound and outbound listeners.

This PR also adds a struct for transparent proxy specific configuration.
Initially this is not stored as a pointer. Will revisit that decision
before GA.
2021-04-12 09:35:14 -06:00
Freddy 920ba3db39
Merge pull request #9976 from hashicorp/centralized-upstream-fixups 2021-04-08 12:26:56 -06:00
Daniel Nephin 93818ebc5a
Merge pull request #9950 from hashicorp/dnephin/state-use-txn-everywhere
state: use Txn interface everywhere
2021-04-08 12:02:03 -04:00
Daniel Nephin 9db8ffb1c5
Merge pull request #9880 from hashicorp/dnephin/catalog-events-test-pattern
state: use runCase pattern for large test
2021-04-08 11:54:41 -04:00
freddygv a1fd3b0271 Pass down upstream defaults to client proxies
This is needed in case the client proxy is in TransparentProxy mode.
Typically they won't have explicit configuration for every upstream, so
this ensures the settings can be applied to all of them when generating
xDS config.
2021-04-07 09:32:47 -06:00
freddygv c2e74e21bc Prevent requests without UpstreamIDs from being flagged as legacy.
New clients in transparent proxy mode can send requests for service
config resolution without any upstream args because they do not have
explicitly defined upstreams.

Old clients on the other hand will never send requests without the
Upstreams args unless they don't have upstreams, in which case we do not
send back upstream config.
2021-04-07 09:32:47 -06:00
R.B. Boyer 82245585c6
connect: add toggle to globally disable wildcard outbound network access when transparent proxy is enabled (#9973)
This adds a new config entry kind "cluster" with a single special name "cluster" where this can be controlled.
2021-04-06 13:19:59 -05:00
Daniel Nephin f0590e7c18 state: support additional test cases in indexer tests
And add a few additional cases.
2021-03-31 14:39:33 -04:00
Kyle Havlovitz 3cdd495600 Backport enterprise changes to prevent merge conflicts
Co-Authored-By: Kyle Havlovitz <kylehav@gmail.com>
2021-03-31 14:05:26 -04:00
Daniel Nephin e4a60a2a8d state: use tableIndex constant 2021-03-29 18:52:20 -04:00
Daniel Nephin 7cb2255838 state: use ReadTxn and WriteTxn interface
Instead of *txn, so that we can replace the txn implementation with others, and so
that the function is easily documented as a read or write function.
2021-03-29 18:52:16 -04:00
Daniel Nephin d785c86db1 state: convert checks.service index to new pattern 2021-03-29 16:38:53 -04:00
Daniel Nephin f859ba6d4b state: convert checks.status indexer
As part of this change the indexer will now be case insensitive by using
the lower case value. This should be safe because previously we always
had lower case strings.

This change was made out of convenience. All the other indexers use
lowercase, so we can re-use the indexFromQuery function by using
lowercase here as well.
2021-03-29 16:38:50 -04:00
Daniel Nephin 9251ac881a state: add tests for checks indexers 2021-03-29 16:38:47 -04:00
Daniel Nephin 98d6dcbdf8 state: use constants for table checks 2021-03-29 16:38:43 -04:00
Daniel Nephin bcbdc9cab3 state: pass Query in from caller
To reduce the number of arguments
2021-03-29 15:42:30 -04:00
Daniel Nephin 372d274b34 state: convert services.kind to functional indexer pattern 2021-03-29 15:42:30 -04:00
Daniel Nephin bcde8d2fad state: add tests for services.kind indexer 2021-03-29 15:42:27 -04:00
Daniel Nephin 9f9eadd569 state: convert services table service and connect indexer
To the new functional indexer pattern
2021-03-29 15:42:24 -04:00
Daniel Nephin 11311c1fcc state: add tests for services table service and connect indexers 2021-03-29 15:42:22 -04:00
Daniel Nephin 9a3daf3100 state: use constant for tableServices 2021-03-29 15:42:18 -04:00
Daniel Nephin ec04df66bd state: remove duplication of Query indexer 2021-03-29 14:35:11 -04:00
Daniel Nephin 28866e48ad state: remove duplication in acl tables schema 2021-03-29 14:21:27 -04:00
Daniel Nephin c6a1ca701d state: reduce duplication in catalog table schema 2021-03-29 14:21:23 -04:00
Daniel Nephin d9dacb8388 state: share more indexer functions for config_entries 2021-03-29 14:21:20 -04:00
Daniel Nephin f303120f2d state: remove old schema test
This test has been replaced by TestNewDBSchema_Indexers
2021-03-29 14:21:13 -04:00
Daniel Nephin 150decff2a state: use addNamespaceIndex again 2021-03-29 14:21:02 -04:00
Daniel Nephin 4a3b462c28
Merge pull request #9911 from hashicorp/dnephin/state-index-acl-roles
state: convert ACLRoles policies index to new functional indexer pattern
2021-03-24 18:28:19 -04:00
Daniel Nephin 25b791ba47 state: add tests for checks.ID indexer 2021-03-22 18:06:43 -04:00
Daniel Nephin abbe5c3701 state: use tx.First instead of tx.FirstWatch
Where appropriate. After removing the helper function a bunch of  these calls can
be changed to tx.First.
2021-03-22 18:06:33 -04:00
Daniel Nephin 49938bc472 state: convert checks.ID index to new pattern 2021-03-22 18:06:08 -04:00
Hans Hasselberg 052662bcf9
introduce certopts (#9606)
* introduce cert opts

* it should be using the same signer

* lint and omit serial
2021-03-22 10:16:41 +01:00
Daniel Nephin 1d3fe64bba state: use uuid for acl-roles.policies index
Previously we were encoding the UUID as a string, but the index it references uses a UUID
so this index can also use an encoded UUID to save a bit of memory.
2021-03-19 19:45:37 -04:00
Daniel Nephin 3c01bb1156 state: convert acl-roles.policies index to new pattern 2021-03-19 19:45:37 -04:00
Daniel Nephin 474e95b9f5 state: convert acl-roles.name index to the functional indexer pattern 2021-03-19 19:45:37 -04:00
Daniel Nephin f836ed256b state: add indexer tests for acl-roles table 2021-03-19 19:45:37 -04:00
Daniel Nephin 6bc2c0e1ce state: use constants for acl-roles table and indexes 2021-03-19 19:45:37 -04:00
Daniel Nephin d4e02024fe state: convert acl-policies table to new pattern 2021-03-19 15:24:00 -04:00
Daniel Nephin 845a10354e state: use constants and add tests for acl-policies table 2021-03-19 15:19:57 -04:00
Daniel Nephin f6533a08f8 state: add indexer test for services.ID index 2021-03-19 14:13:14 -04:00
Daniel Nephin 1d1c03d0cd state: handle wildcard for services.ID index
When listing services, use the id_prefix directly if wildcards are allowed.

Error if a wildcard is used for a query that does not index the wildcard
2021-03-19 14:12:19 -04:00
Daniel Nephin bae69b2352 state: fix prefix index with the new pattern
Prefix queries are generally being used to match part of a partial
index. We can support these indexes by using a function that accept
different types for each subset of the index.

What I found interesting is that in the generic StringFieldIndexer the
implementation for PrefixFromArgs would remove the trailing null, but
at least in these 2 cases we actually want a null terminated string.
We simply want fewer components in the string.
2021-03-19 14:12:17 -04:00
Daniel Nephin ec50454fb3 state: move services.ID to new pattern 2021-03-19 14:11:59 -04:00
Daniel Nephin f5a52a4501 state: add tests for gateway-service table indexers 2021-03-18 12:09:42 -04:00
Daniel Nephin 66632538d8 state: use constants and remove wrapping
for GatewayServices table
2021-03-18 12:08:59 -04:00
Daniel Nephin d77bdd26c5 state: Move UpstreamDownstream to state package 2021-03-18 12:08:59 -04:00
Daniel Nephin ca3686f4aa state: add tests for mesh-topology table indexers 2021-03-18 12:08:57 -04:00
Daniel Nephin 8a1a11814d state: use constants for mesh-topology table operations 2021-03-18 12:08:03 -04:00
Freddy 8ac9f2521b
Merge pull request #9900 from hashicorp/ent-fixes
Fixup enterprise tests from tproxy changes
2021-03-18 08:33:30 -06:00
Freddy 28c29e6ab4
Merge pull request #9899 from hashicorp/wildcard-ixn-oss
Add methods to check intention has wildcard src or dst
2021-03-18 08:33:07 -06:00
freddygv b56bd690aa Fixup enterprise tests from tproxy changes 2021-03-17 23:05:00 -06:00
freddygv 1c46470a29 Add methods to check intention has wildcard src or dst 2021-03-17 22:15:48 -06:00
freddygv 6c43195e2a Merge master and fix upstream config protocol defaulting 2021-03-17 21:13:40 -06:00
freddygv 0c8b618ca0 Temporarily silence spurious wakeup. Addressing false positive in beta. 2021-03-17 17:25:29 -06:00
freddygv 60690cf5c9 Merge remote-tracking branch 'origin/master' into intention-topology-endpoint 2021-03-17 17:14:38 -06:00
Freddy 63dcb7fa76
Add TransparentProxy option to proxy definitions 2021-03-17 17:01:45 -06:00
Freddy fb252e87a4
Add per-upstream configuration to service-defaults 2021-03-17 16:59:51 -06:00
freddygv 15a145b9f6 Add changelog and cleanup todo for beta 2021-03-17 16:45:13 -06:00
freddygv d19a5830dd Do not include consul as upstream or downstream 2021-03-17 13:40:04 -06:00
Daniel Nephin d2591312f8 state: add tests for config-entry indexers 2021-03-17 14:41:46 -04:00
Daniel Nephin 1b8f8b135e state: convert config-entries kind index to new pattern 2021-03-17 14:40:57 -04:00
Daniel Nephin bfcf463c3a state: remove config-entries namespace index
Use a prefix of the ID index instead.
2021-03-17 14:40:57 -04:00
Daniel Nephin dcbb1ba5dd state: remove unnecessary method receiver 2021-03-17 14:40:57 -04:00
Daniel Nephin b43977423f state: convert config-entries table to new indexer pattern
Using functional indexes to isolate enterprise differentiation and
remove reflection.
2021-03-17 14:40:57 -04:00
Daniel Nephin 98c32599e4
Merge pull request #9881 from hashicorp/dnephin/state-index-service-check-nodes
state: convert services.node and checks.node indexes
2021-03-17 14:12:02 -04:00
Daniel Nephin b771baa1f5
Merge pull request #9863 from hashicorp/dnephin/config-entry-kind-name
state: move ConfigEntryKindName
2021-03-17 14:09:39 -04:00
Daniel Nephin 0b3930272d state: convert services.node and checks.node indexes
Using NodeIdentity to share the indexes with both.
2021-03-16 13:00:31 -04:00
freddygv b79039c21c Prefix match type vars to match use 2021-03-16 09:49:24 -06:00
freddygv fed983fe9a Pass txn into service list queries 2021-03-16 09:33:08 -06:00
freddygv 26ba0c0fc8 Pass txn into intention match queries 2021-03-16 08:03:52 -06:00
freddygv d7f3bcc8bb Replace CertURI.Authorize() calls.
AuthorizeIntentionTarget is a generalized version of the old function,
and can be evaluated against sources or destinations.
2021-03-15 18:06:04 -06:00
freddygv eb6c0cbea0 Fixup typo, comments, and regression 2021-03-15 17:50:47 -06:00
freddygv 940b7a98d1 Finish cleanup from ServiceConfigRequest changes 2021-03-15 16:38:01 -06:00
Daniel Nephin 0b5dfee00a state: use runCase pattern for large test
The TestServiceHealthEventsFromChanges function was over 1400 lines.
Attempting to debug test failures in test functions this large is
difficult. It requires scrolling to the line which defines the testcase
because the failure message only includes the line number of the
assertion, not the line number of the test case.

This is an excellent example of where test tables stop working well, and
start being a problem. To mitigate this problem, the runCase pattern can
be used. When one of these tests fails, a failure message will print the
line number of both the test case and the assertion. This allows a
developer to quickly jump to both of the relevant lines, signficanting
reducing the time it takes to debug test failures.

For example, one such failure could look like this:

    catalog_events_test.go:1610: case: service reg, new node
    catalog_events_test.go:1605: assertion failed: values are not equal
2021-03-15 17:53:16 -04:00
freddygv 04fbc104cd Pass MeshGateway config in service config request
ResolveServiceConfig is called by service manager before the proxy
registration is in the catalog. Therefore we should pass proxy
registration flags in the request rather than trying to fetch
them from the state store (where they may not exist yet).
2021-03-15 14:32:13 -06:00
freddygv d90240d367 Restore old Envoy prefix on escape hatches
This is done because after removing ID and NodeName from
ServiceConfigRequest we will no longer know whether a request coming in
is for a Consul client earlier than v1.10.
2021-03-15 14:12:57 -06:00
freddygv 3b2169b36d Add RPC endpoint for intention upstreams 2021-03-15 08:50:35 -06:00
freddygv e4e14639b2 Add state store function for intention upstreams 2021-03-15 08:50:35 -06:00
freddygv 4976c000b7 Refactor IntentionDecision
This enables it to be called for many upstreams or downstreams of a
service while only querying intentions once.

Additionally, decisions are now optionally denied due to L7 permissions
being present. This enables the function to be used to filter for
potential upstreams/downstreams of a service.
2021-03-15 08:50:35 -06:00
Daniel Nephin 2a53b8293a proxycfg: use rpcclient/health.Client instead of passing around cache name
This should allow us to swap out the implementation with something other
than `agent/cache` without making further code changes.
2021-03-12 11:46:04 -05:00
Daniel Nephin c33570be34 catalog_events: set the right key for connect snapshots 2021-03-12 11:35:43 -05:00
Daniel Nephin e2215d9f0f rpcclient: use streaming for connect health 2021-03-12 11:35:42 -05:00
Kyle Havlovitz 237b41ac8f
Merge pull request #9672 from hashicorp/ca-force-skip-xc
connect/ca: Allow ForceWithoutCrossSigning for all providers
2021-03-11 11:49:15 -08:00
freddygv 7a3625f58b Add TransparentProxy opt to proxy definition 2021-03-11 11:37:21 -07:00
freddygv c30157d2f2 Turn Limits and PassiveHealthChecks into pointers 2021-03-11 11:04:40 -07:00
freddygv b98abb6f09 Update server-side config resolution and client-side merging 2021-03-10 21:05:11 -07:00
Daniel Nephin 4877183bc6
Merge pull request #9797 from hashicorp/dnephin/state-index-node-id
state: convert nodes.ID to the new pattern of functional indexers
2021-03-10 17:34:23 -05:00
Daniel Nephin 51ad94360b state: move ConfigEntryKindName
Previously this type was defined in structs, but unlike the other types in structs this type
is not used by RPC requests. By moving it to state we can better indicate that this is not
an API type, but part of the state implementation.
2021-03-10 12:27:22 -05:00
Daniel Nephin 5c5ba9564d
Merge pull request #9796 from hashicorp/dnephin/state-cleanup-catalog-index-oss
state: remove duplicate tableCheck indexes
2021-03-10 12:20:09 -05:00
Daniel Nephin 94820e67a8 structs: remove EnterpriseMeta.GetNamespace
I added this recently without realizing that the method already existed and was named
NamespaceOrEmpty. Replace all calls to GetNamespace with NamespaceOrEmpty or NamespaceOrDefault
as appropriate.
2021-03-09 15:17:26 -05:00
Daniel Nephin 97bc073bd9 state: adjust compare for catalog events
Document that this comparison should roughly match MatchesKey

Only sort by overrideKey or service name, but not both
Add namespace to the sort.

The client side also builds a map of these based on the namespace/node/service key, so the only order
that really matters is the ordering of register/dereigster events.
2021-03-09 14:00:36 -05:00
Daniel Nephin 0d3bb68255 state: handle terminating gateway events properly in snapshot
Refactored out a function that can be used for both the snapshot and stream of events to translate
an event into an appropriate connect event.

Previously terminating gateway events would have used the wrong key in the snapshot, which would have
caused them to be filtered out later on.

Also removed an unused function, and some commented out code.
2021-03-09 14:00:35 -05:00
Kyle Havlovitz de3fba8ef3 Add remaining terminating gateway tests for namespaces
Co-Authored-By: Daniel Nephin <dnephin@hashicorp.com>
2021-03-09 14:00:35 -05:00
Daniel Nephin 38aeb88908 Start to setup enterprise tests for terminating gateway streaming events.
Co-Authored-By: Kyle Havlovitz <kylehav@gmail.com>
2021-03-09 14:00:35 -05:00
Daniel Nephin d0b37f18f0 state: Add support for override of namespace
in MatchesKey
also tests for MatchesKey

Co-Authored-By: Kyle Havlovitz <kylehav@gmail.com>
2021-03-09 14:00:35 -05:00
Daniel Nephin ba59727337 state: update calls to ensureConfigEntryTxn
The EnterpriseMeta paramter was removed after this code was written, but before it merged.

Also the table name constant has changed.
2021-03-09 14:00:35 -05:00
Daniel Nephin 730cc575e6 state: add 2 more test cases for terminate gateway streaming events
Co-Authored-By: Kyle Havlovitz <kylehav@gmail.com>
2021-03-09 14:00:34 -05:00
Kyle Havlovitz eadc8546a9 Added 6 new test cases for terminating gateway events
Co-Authored-By: Daniel Nephin <dnephin@hashicorp.com>
2021-03-09 14:00:34 -05:00
Daniel Nephin 15b0d5f62b state: Add two more tests for connect events with terminating gateways
And expand one test case to cover more.

Co-Authored-By: Kyle Havlovitz <kylehav@gmail.com>
2021-03-09 14:00:34 -05:00
Daniel Nephin abab373b89 state: Include the override key in the sorting of events
Co-Authored-By: Kyle Havlovitz <kylehav@gmail.com>
2021-03-09 14:00:34 -05:00
Kyle Havlovitz f31582624d state: Add terminating gateway events on updating a config entry
Co-Authored-By: Daniel Nephin <dnephin@hashicorp.com>
2021-03-09 14:00:34 -05:00
Daniel Nephin f42a2ca8a3 state: add first terminating catalog catalog event
Health of a terminating gateway instance changes
- Generate an event for creating/destroying this instance of the terminating gateway,
  duplicate it for each affected service

Co-Authored-By: Kyle Havlovitz <kylehav@gmail.com>
2021-03-09 14:00:33 -05:00
Daniel Nephin 1184ceff9e state: convert nodes.ID to new functional pattern
In preparation for adding other identifiers to the index.
2021-03-05 12:30:40 -05:00
Daniel Nephin 4a44cfd676
Merge pull request #9188 from hashicorp/dnephin/more-streaming-tests
Add more streaming tests
2021-02-26 12:36:55 -05:00
Daniel Nephin 4ef9578a07
Merge pull request #9703 from pierresouchay/streaming_tags_and_case_insensitive
Streaming filter tags + case insensitive lookups for Service Names
2021-02-26 12:06:26 -05:00
Daniel Nephin 2cc3282d5d catalog_events: set the right key for connect snapshots
Add a test for catalog_event snapshot on connect topic
2021-02-25 14:30:39 -05:00
Daniel Nephin 85da1af04c consul: Add integration tests of streaming.
Restored from streaming-rpc-final branch.

Co-authored-by: Paul Banks <banks@banksco.de>
2021-02-25 14:30:39 -05:00
Daniel Nephin e8beda4685 state: Add a test for ServiceHealthSnapshot 2021-02-25 14:08:10 -05:00
Daniel Nephin dd45c4cfe4 state: add a test case for memdb indexers 2021-02-19 17:14:46 -05:00
Daniel Nephin 7e4d693aaa state: support for functional indexers
These new functional indexers provide a few advantages:

1. enterprise differences can be isolated to a single function (the
   indexer function), making code easier to change
2. as a consequence of (1) we no longer need to wrap all the calls to
   Txn operations, making code easier to read.
3. by removing reflection we should increase the performance of all
   operations.

One important change is in making all the function signatures the same.

https://blog.golang.org/errors-are-values

An extra boolean return value for SingleIndexer.FromObject is superfluous.
The error value can indicate when the index value could not be created.
By removing this extra return value we can use the same signature for both
indexer functions.

This has the nice properly of a function being usable for both indexing operations.
2021-02-19 17:14:46 -05:00
Daniel Nephin 88a9bd6d3c state: remove duplicate index on the checks table
By using a new pattern for more specific indexes. This allows us to use
the same index for both service checks and node checks. It removes the
abstraction around memdb.Txn operations, and isolates all of the
enterprise differences in a single place (the indexer).
2021-02-19 17:14:46 -05:00
Daniel Nephin b781fec664 state: remove duplicate function
catalogChecksForNodeService was a duplicate of catalogListServiceChecks
2021-02-19 17:14:46 -05:00
Daniel Nephin d33bc493af
Merge pull request #9720 from hashicorp/dnephin/ent-meta-ergo-1
structs: rename EnterpriseMeta constructor
2021-02-16 15:31:58 -05:00
Daniel Nephin 53c82cee86
Merge pull request #9772 from hashicorp/streamin-fix-bad-cached-snapshot
streaming: fix snapshot cache bug
2021-02-16 15:28:00 -05:00
Daniel Nephin b17967827d
Merge pull request #9728 from hashicorp/dnephin/state-index-table
state: document how index table is used
2021-02-16 15:27:27 -05:00
Daniel Nephin c40d063a0e structs: rename EnterpriseMeta constructor
To match the Go convention.
2021-02-16 14:45:43 -05:00
Daniel Nephin a29b848e3b stream: fix a snapshot cache bug
Previously a snapshot created as part of a resumse-stream request could have incorrectly
cached the newSnapshotToFollow event. This would cause clients to error because they
received an unexpected framing event.
2021-02-16 12:52:23 -05:00
Daniel Nephin 2726c65fbe stream: test the snapshot cache is saved correctly
when the cache entry is created from resuming a stream.
2021-02-16 12:08:43 -05:00
R.B. Boyer 91d9544803
connect: connect CA Roots in the primary datacenter should use a SigningKeyID derived from their local intermediate (#9428)
This fixes an issue where leaf certificates issued in primary
datacenters using Vault as a Connect CA would be reissued very
frequently (every ~20 seconds) because the logic meant to detect root
rotation was errantly triggering.

The hash of the rootCA was being compared against a hash of the
intermediateCA and always failing. This doesn't apply to the Consul
built-in CA provider because there is no intermediate in use in the
primary DC.

This is reminiscent of #6513
2021-02-08 13:18:51 -06:00
Daniel Nephin cdda3b9321 state: Use the tableIndex constant 2021-02-05 18:37:45 -05:00
Daniel Nephin de841bd459 state: Document index table
And move the IndexEntry (which is stored in the table) next to the table
schema definition.
2021-02-05 18:37:45 -05:00
Daniel Nephin 23cfbc8f8d
Merge pull request #9719 from hashicorp/oss/state-store-4
state: remove registerSchema
2021-02-05 14:02:38 -05:00
Daniel Nephin dc70f583d4
Merge pull request #9718 from hashicorp/oss/dnephin/ent-meta-in-state-store-3
state: convert all table name constants to the new prefix pattern
2021-02-05 14:02:07 -05:00
Daniel Nephin eb5d71fd19
Merge pull request #9665 from hashicorp/dnephin/state-store-indexes-2
state: move config-entries table definition to config_entries_schema.go
2021-02-05 14:01:08 -05:00
Daniel Nephin 9beadc578b
Merge pull request #9664 from hashicorp/dnephin/state-store-indexes
state: move ACL schema and index definitions to acl_schema.go
2021-02-05 13:38:31 -05:00
Daniel Nephin b747b27afd state: remove the need for registerSchema
registerSchema creates some indirection which is not necessary in this
case. newDBSchema can call each of the tables.

Enterprise tables can be added from the existing withEnterpriseSchema
shim.
2021-02-05 12:19:56 -05:00
Daniel Nephin 33621706ac state: rename table name constants to use pattern
the 'table' prefix is shorter, and also reads better in queries.
2021-02-05 12:12:19 -05:00
Daniel Nephin 8569295116 state: rename connect constants 2021-02-05 12:12:19 -05:00
Daniel Nephin afdbf2a8ef state: rename table name constants to new pattern
Using Apps Hungarian Notation for these constants makes the memdb queries more readable.
2021-02-05 12:12:18 -05:00
Pierre Souchay c466b08481 Streaming filter tags + case insensitive lookups for Service Names
Will fix:
 * https://github.com/hashicorp/consul/issues/9695
 * https://github.com/hashicorp/consul/issues/9702
2021-02-04 11:00:51 +01:00
Daniel Nephin f929a7117e state: Remove unnecessary entMeta arg to EnsureConfigEntry 2021-02-03 18:10:38 -05:00
Kyle Havlovitz 1dee4173c1 connect/ca: Allow ForceWithoutCrossSigning for all providers
This allows setting ForceWithoutCrossSigning when reconfiguring the CA
for any provider, in order to forcibly move to a new root in cases where
the old provider isn't reachable or able to cross-sign for whatever
reason.
2021-01-29 13:38:11 -08:00
Daniel Nephin 09425b22a1 state: rename config-entries table const to match new pattern 2021-01-28 20:34:34 -05:00
Daniel Nephin 7d17e20270 state: move config-entries table to new pattern 2021-01-28 20:34:15 -05:00
Daniel Nephin 825b8ade39 state: use indexID
this change was already made to enterprise, so backporting it.
2021-01-28 20:30:08 -05:00
Daniel Nephin 2a262f07fc state: Move ACL schema indexes to match Ent
and use constants for table and index names.
2021-01-28 20:05:09 -05:00
Matt Keeler 1379b5f7d6
Upgrade raft-autopilot and wait for autopilot it to stop when revoking leadership (#9644)
Fixes: 9626
2021-01-27 11:14:52 -05:00
Hans Hasselberg 623aab5880
Add flags to support CA generation for Connect (#9585) 2021-01-27 08:52:15 +01:00
R.B. Boyer 5777fa1f59
server: initialize mgw-wanfed to use local gateways more on startup (#9528)
Fixes #9342
2021-01-25 17:30:38 -06:00
Daniel Nephin d7d081f402
Merge pull request #9420 from hashicorp/dnephin/reduce-duplicate-in-catalog-schema
state: reduce interface for Enterprise schema
2021-01-25 17:04:25 -05:00
R.B. Boyer 6622185d64
server: use the presense of stored federation state data as a sign that we already activated the federation state feature flag (#9519)
This way we only have to wait for the serf barrier to pass once before
we can make use of federation state APIs Without this patch every
restart needs to re-compute the change.
2021-01-25 13:24:32 -06:00
R.B. Boyer 0247f409a0
server: when wan federating via mesh gateways only do heuristic primary DC bypass on the leader (#9366)
Fixes #9341
2021-01-22 10:03:24 -06:00
Freddy 5519051c84
Update topology mapping Refs on all proxy instance deletions (#9589)
* Insert new upstream/downstream mapping to persist new Refs

* Avoid upserting mapping copy if it's a no-op

* Add test with panic repro

* Avoid deleting up/downstreams from inside memdb iterator

* Avoid deleting gateway mappings from inside memdb iterator

* Add CHANGELOG entry

* Tweak changelog entry

Co-authored-by: Paul Banks <banks@banksco.de>
2021-01-20 15:17:26 +00:00
Daniel Nephin 979749d86e state: do not delete from inside an iteration
Deleting from memdb inside an interation can cause a panic from Iterator.Next. This
case is technically safe (for now) because the iterator is using the root radix tree
not a modified one.

However this could break at any time if someone adds an insert or delete to the coordinates table
before this place in the function.

It also sets a bad example, because generally deletes in an interator are not safe. So this
commit uses the pattern we have in other places to move the deletes out of the iteration.
2021-01-19 17:00:07 -05:00
Matt Keeler 2d2ce1fb0c
Ensure that CA initialization does not block leader election.
After fixing that bug I uncovered a couple more:

Fix an issue where we might try to cross sign a cert when we never had a valid root.
Fix a potential issue where reconfiguring the CA could cause either the Vault or AWS PCA CA providers to delete resources that are still required by the new incarnation of the CA.
2021-01-19 15:27:48 -05:00
Daniel Nephin 52a1d78e39 state: add a regression test for state store schema
To allow the index to be refactored without accidental changes.

To update the expected value run: 'go test ./agent/consul/state -update'
2021-01-15 18:49:55 -05:00
Daniel Nephin aa21c1ea04 state: reduce interface for Enterprise schema
Using withEnterpriseSchema() we can apply any enterprise schema changes
with a single shim, removing the need to duplicate all of the table
definitions.

Also move all the catalog schemas to a new file to shrink catalog.go a bit.
2021-01-15 18:49:55 -05:00
Daniel Nephin e8427a48ab agent/consuk: Rename RPCRate -> RPCRateLimit
so that the field name is consistent across config structs.
2021-01-14 17:26:00 -05:00
Daniel Nephin e5320c2db6 agent/consul: make Client/Server config reloading more obvious
I believe this commit also fixes a bug. Previously RPCMaxConnsPerClient was not being re-read from the RuntimeConfig, so passing it to Server.ReloadConfig was never changing the value.

Also improve the test runtime by not doing a lot of unnecessary work.
2021-01-14 17:21:10 -05:00
Daniel Nephin f2b504873a
Merge pull request #9460 from hashicorp/dnephin/fix-data-races
Fix a couple data races in tests
2021-01-14 17:07:01 -05:00
Chris Piraino baad708929
Fix bug in usage metrics when multiple service instances are changed in a single transaction (#9440)
* Fix bug in usage metrics that caused a negative count to occur

There were a couple of instances were usage metrics would do the wrong
thing and result in incorrect counts, causing the count to attempt to
decrement below zero and return an error. The usage metrics did not
account for various places where a single transaction could
delete/update/add multiple service instances at once.

We also remove the error when attempting to decrement below zero, and
instead just make sure we do not accidentally underflow the unsigned
integer. This is a more graceful failure than returning an error and not
allowing a transaction to commit.

* Add changelog
2021-01-12 15:31:47 -06:00
Chris Piraino 2eac571276
Log replication warnings when no error suppression is defined (#9320)
* Log replication warnings when no error suppression is defined

* Add changelog file
2021-01-08 14:03:06 -06:00
Daniel Nephin 45f0afcbf4 structs: Fix printing of IDs
These types are used as values (not pointers) in other structs. Using a pointer receiver causes
problems when the value is printed. fmt will not call the String method if it is passed a value
and the String method has a pointer receiver. By using a value receiver the correct string is printed.

Also remove some unused methods.
2021-01-07 18:47:38 -05:00
Daniel Nephin 27c38bfebb
Merge pull request #9213 from hashicorp/dnephin/resolve-tokens-take-2
acl: Remove some unused things and document delegate method
2021-01-06 18:51:51 -05:00
R.B. Boyer db62541676
acl: use the presence of a management policy in the state store as a sign that we already migrated to v2 acls (#9505)
This way we only have to wait for the serf barrier to pass once before
we can upgrade to v2 acls. Without this patch every restart needs to
re-compute the change, and potentially if a stray older node joins after
a migration it might regress back to v1 mode which would be problematic.
2021-01-05 17:04:27 -06:00
Matt Keeler 3a79b559f9
Special case the error returned when we have a Raft leader but are not tracking it in the ServerLookup (#9487)
This can happen when one other node in the cluster such as a client is unable to communicate with the leader server and sees it as failed. When that happens its failing status eventually gets propagated to the other servers in the cluster and eventually this can result in RPCs returning “No cluster leader” error.

That error is misleading and unhelpful for determing the root cause of the issue as its not raft stability but rather and client -> server networking issue. Therefore this commit will add a new error that will be returned in that case to differentiate between the two cases.
2021-01-04 14:05:23 -05:00
R.B. Boyer 42dea6f01e
server: deletions of intentions by name using the intention API is now idempotent (#9278)
Restoring a behavior inadvertently changed while fixing #9254
2021-01-04 11:27:00 -06:00
Daniel Nephin 088831c91e Maybe fix another data race in a test 2020-12-22 18:53:54 -05:00
Daniel Nephin d0f2eca8de Fix one race caused by t.Parallel 2020-12-22 18:27:18 -05:00
Daniel Nephin c66a63275f
Merge pull request #9340 from hashicorp/dnephin/skip-slow-tests-with-short
testing: skip slow tests with -short
2020-12-11 13:33:44 -05:00
R.B. Boyer f9dcaf7f6b
acl: global tokens created by auth methods now correctly replicate to secondary datacenters (#9351)
Previously the tokens would fail to insert into the secondary's state
store because the AuthMethod field of the ACLToken did not point to a
known auth method from the primary.
2020-12-09 15:22:29 -06:00
Daniel Nephin ef0999547a testing: skip slow tests with -short
Add a skip condition to all tests slower than 100ms.

This change was made using `gotestsum tool slowest` with data from the
last 3 CI runs of master.
See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests

With this change:

```
$ time go test -count=1 -short ./agent
ok      github.com/hashicorp/consul/agent       0.743s

real    0m4.791s

$ time go test -count=1 -short ./agent/consul
ok      github.com/hashicorp/consul/agent/consul        4.229s

real    0m8.769s
```
2020-12-07 13:42:55 -05:00
Kyle Havlovitz 57210a59c3 connect: Fix a case where the active root would get unset even when there wasn't a new one 2020-12-02 11:42:23 -08:00
Kyle Havlovitz 91d5d6c586
Merge pull request #9009 from hashicorp/update-secondary-ca
connect: Fix an issue with updating CA config in a secondary datacenter
2020-11-30 14:49:28 -08:00
Kyle Havlovitz c5167cf9c4 Use a buffered channel for CA intermediate renew func 2020-11-30 14:37:24 -08:00
R.B. Boyer 6d6b6c15c6
server: fix panic when deleting a non existent intention (#9254)
* server: fix panic when deleting a non existent intention

* add changelog

* Always return an error when deleting non-existent ixn

Co-authored-by: freddygv <gh@freddygv.xyz>
2020-11-24 13:44:20 -05:00
Hans Hasselberg 25f9e232af add missing descriptions for metrics 2020-11-23 22:06:30 +01:00
Kit Patella 7a8844ccce add entries for missing fsm operations and mark duplicated metrics prefixes as deprecated 2020-11-23 12:42:51 -08:00
Kyle Havlovitz a01f853aa5 Clean up the logic in persistNewRootAndConfig 2020-11-20 15:54:44 -08:00
Kyle Havlovitz 26a9c985c5 Add CA server delegate interface for testing 2020-11-19 20:08:06 -08:00
Kit Patella 4ad076207e add telemetry and definition help entries for missing catalog and acl metrics 2020-11-19 13:29:44 -08:00
Kit Patella 46205bbf27 remove stale entries and rename/define acl.resolveToken 2020-11-19 13:06:28 -08:00
Freddy e4e306210a
Require operator:write to get Connect CA config (#9240)
A vulnerability was identified in Consul and Consul Enterprise (“Consul”) such that operators with `operator:read` ACL permissions are able to read the Consul Connect CA configuration when explicitly configured with the `/v1/connect/ca/configuration` endpoint, including the private key. This allows the user to effectively privilege escalate by enabling the ability to mint certificates for any Consul Connect services. This would potentially allow them to masquerade (receive/send traffic) as any service in the mesh.

--

This PR increases the permissions required to read the Connect CA's private key when it was configured via the `/connect/ca/configuration` endpoint. They are now `operator:write`.
2020-11-19 10:14:48 -07:00
Kyle Havlovitz c8d4a40a87 connect: update some function comments in CA manager 2020-11-17 16:00:19 -08:00
Daniel Nephin b9306d8827 acl: remove a test-only method 2020-11-17 18:16:34 -05:00
Daniel Nephin 9e7c8dd19d Remove two unused delegate methods 2020-11-17 18:16:26 -05:00
Matt Keeler 4bca029be9
Refactor to call non-voting servers read replicas (#9191)
Co-authored-by: Kit Patella <kit@jepsen.io>
2020-11-17 10:53:57 -05:00
Kit Patella 4dfcdbab26
Merge pull request #9198 from hashicorp/mkcp/telemetry/add-all-metric-definitions
Add metric definitions for all metrics known at Consul start
2020-11-16 15:54:50 -08:00
Matt Keeler 197a37a860
Prevent panic if autopilot health is requested prior to leader establishment finishing. (#9204) 2020-11-16 17:08:17 -05:00
Daniel Nephin de88ceed1c
Merge pull request #9114 from hashicorp/dnephin/filtering-in-stream
stream: improve naming of Payload methods
2020-11-16 14:20:07 -05:00
Kit Patella 0b18f5612e trim help strings to save a few bytes 2020-11-16 11:02:11 -08:00
Kit Patella 374748dafc merge master 2020-11-16 10:46:53 -08:00
Kit Patella af719981f3 finish adding static server metrics 2020-11-13 16:26:08 -08:00
Kyle Havlovitz 0a86533e20 Reorganize some CA manager code for correctness/readability 2020-11-13 14:46:01 -08:00
Kyle Havlovitz 5de81c1375 connect: Add CAManager for synchronizing CA operations 2020-11-13 14:33:44 -08:00
Kyle Havlovitz 0b4876f906 connect: Add logic for updating secondary DC intermediate on config set 2020-11-13 14:33:44 -08:00
R.B. Boyer db1184c094
server: intentions CRUD requires connect to be enabled (#9194)
Fixes #9123
2020-11-13 16:19:12 -06:00
Kit Patella b486c1bce8 add the service name in the agent rather than in the definitions themselves 2020-11-13 13:18:04 -08:00
R.B. Boyer e323014faf
server: remove config entry CAS in legacy intention API bridge code (#9151)
Change so line-item intention edits via the API are handled via the state store instead of via CAS operations.

Fixes #9143
2020-11-13 14:42:21 -06:00
R.B. Boyer 6300abed18
server: skip deleted and deleting namespaces when migrating intentions to config entries (#9186) 2020-11-13 13:56:41 -06:00
Mike Morris a343365da7
ci: update to Go 1.15.4 and alpine:3.12 (#9036)
* ci: stop building darwin/386 binaries

Go 1.15 drops support for 32-bit binaries on Darwin https://golang.org/doc/go1.15#darwin

* tls: ConnectionState::NegotiatedProtocolIsMutual is deprecated in Go 1.15, this value is always true

* correct error messages that changed slightly

* Completely regenerate some TLS test data

Co-authored-by: R.B. Boyer <rb@hashicorp.com>
2020-11-13 13:02:59 -05:00
R.B. Boyer 758384893d
server: break up Intention.Apply monolithic method (#9007)
The Intention.Apply RPC is quite large, so this PR attempts to break it down into smaller functions and dissolves the pre-config-entry approach to the breakdown as it only confused things.
2020-11-13 09:15:39 -06:00
Kit Patella 9533372ded first pass on agent-configured prometheusDefs and adding defs for every consul metric 2020-11-12 18:12:12 -08:00
R.B. Boyer a5bd1ba323
agent: return the default ACL policy to callers as a header (#9101)
Header is: X-Consul-Default-ACL-Policy=<allow|deny>

This is of particular utility when fetching matching intentions, as the
fallthrough for a request that doesn't match any intentions is to
enforce using the default acl policy.
2020-11-12 10:38:32 -06:00
Matt Keeler 2badb01d30
Add a paramter in state store methods to indicate whether a resource insertion is from a snapshot restoration (#9156)
The Catalog, Config Entry, KV and Session resources potentially re-validate the input as its coming in. We need to prevent snapshot restoration failures due to missing namespaces or namespaces that are being deleted in enterprise.
2020-11-11 11:21:42 -05:00
Matt Keeler 1f40f51a58
Fix a bunch of linter warnings 2020-11-09 09:22:12 -05:00
Matt Keeler 755fb72994
Switch to using the external autopilot module 2020-11-09 09:22:11 -05:00
Daniel Nephin e4a78c977d stream: document that Payload must be immutable
If they are sent to EventPublisher.Publish.

Also document that PayloadEvents is expected to come from a subscription and that it is
not immutable.
2020-11-06 13:00:33 -05:00
Daniel Nephin 4fc073b1f4 stream: rename FilterByKey 2020-11-05 19:21:16 -05:00
Daniel Nephin d4cd2fa6a8 stream: Add HasReadPermission to Payload
Required now that filter is a method on PayloadEvents instead of Event
2020-11-05 19:17:18 -05:00
Daniel Nephin 8a26bca020 stream: move event filtering to PayloadEvents
Removes the weirdness around PayloadEvents.FilterByKey
2020-11-05 17:50:17 -05:00
Daniel Nephin dcacfd3548 stream: Remove unused method 2020-11-05 16:49:59 -05:00
Daniel Nephin 621f1db766
Merge pull request #9073 from hashicorp/dnephin/backport-streaming-namespaces
streaming: backport namespace changes
2020-11-05 14:19:10 -05:00
Daniel Nephin cd220e5d6c
Merge pull request #9061 from hashicorp/dnephin/event-fields
stream: support filtering by namespace
2020-11-05 14:18:35 -05:00
Daniel Nephin f6b629852f state: test EventPayloadCheckServiceNode.FilterByKey
Also fix a bug in that function when only one of key or namespace were the empty string.
2020-10-30 14:35:57 -04:00
Daniel Nephin 60df44df4f stream: Add tests for filterByKey with namespace
And fix a bug where a request with a Namespace but no Key would not be properly filtered
2020-10-30 14:35:42 -04:00
Daniel Nephin 318dfbe6e4 stream: Move FilterByKey events to a table
In preparation for adding new tests.
2020-10-30 14:35:28 -04:00
Daniel Nephin 2d0030da39 state: use enterprise meta for creating events 2020-10-30 14:34:04 -04:00
Daniel Nephin b57c7afcbb stream: include the namespace in the snap cache key
Otherwise the wrong snapshot could be returned when the same key is used in different namespaces
2020-10-30 14:34:04 -04:00
Daniel Nephin 8da30fcb9a subscribe: set the request namespace 2020-10-30 14:34:04 -04:00
R.B. Boyer 67a0d0c426
state: ensure we unblock intentions queries upon the upgrade to config entries (#9062)
1. do a state store query to list intentions as the agent would do over in `agent/proxycfg` backing `agent/xds`
2. upgrade the database and do a fresh `service-intentions` config entry write
3. the blocking query inside of the agent cache in (1) doesn't notice (2)
2020-10-29 15:28:31 -05:00
R.B. Boyer 78014653b3 restore prior signature of test helper so enterprise compiles 2020-10-29 13:52:15 -05:00
Daniel Nephin 61ce0964a4 stream: remove Event.Key
Makes Payload a type with FilterByKey so that Payloads can implement
filtering by key. With this approach we don't need to expose a Namespace
field on Event, and we don't need to invest micro formats or require a
bunch of code to be aware of exactly how the key field is encoded.
2020-10-28 16:48:04 -04:00
Daniel Nephin 8ef4c0fcc5 state: use go-cmp for comparison
The output of the previous assertions made it impossible to debug the tests without code changes.

With go-cmp comparing the entire slice we can see the full diffs making it easier to debug failures.
2020-10-28 16:33:00 -04:00
Daniel Nephin 44da869ed4 stream: Use a no-op event publisher if streaming is disabled 2020-10-28 13:54:19 -04:00
Daniel Nephin eea87e1acf store: use a ReadDB for snapshots
to remove the cyclic dependency between the snapshot handlers and the state.Store
2020-10-28 13:07:42 -04:00
Daniel Nephin cfe0ffde15
Merge pull request #9026 from hashicorp/dnephin/streaming-without-cache-query-param
streaming: rename config and remove requirement for cache=1
2020-10-28 12:33:25 -04:00
Daniel Nephin 03d2be03e7
Merge pull request #8618 from hashicorp/dnephin/remove-txn-readtxn
state: Use ReadTxn everywhere
2020-10-28 12:32:47 -04:00
Daniel Nephin abd8cfcfe9 state: disable streaming connect topic 2020-10-26 11:49:47 -04:00
R.B. Boyer 0a80e82f21
server: config entry replication now correctly uses namespaces in comparisons (#9024)
Previously config entries sharing a kind & name but in different
namespaces could occasionally cause "stuck states" in replication
because the namespace fields were ignored during the differential
comparison phase.

Example:

Two config entries written to the primary:

    kind=A,name=web,namespace=bar
    kind=A,name=web,namespace=foo

Under the covers these both get saved to memdb, so they are sorted by
all 3 components (kind,name,namespace) during natural iteration. This
means that before the replication code does it's own incomplete sort,
the underlying data IS sorted by namespace ascending (bar comes before
foo).

After one pass of replication the primary and secondary datacenters have
the same set of config entries present. If
"kind=A,name=web,namespace=bar" were to be deleted, then things get
weird. Before replication the two sides look like:

primary: [
    kind=A,name=web,namespace=foo
]
secondary: [
    kind=A,name=web,namespace=bar
    kind=A,name=web,namespace=foo
]

The differential comparison phase walks these two lists in sorted order
and first compares "kind=A,name=web,namespace=foo" vs
"kind=A,name=web,namespace=bar" and falsely determines they are the SAME
and are thus cause an update of "kind=A,name=web,namespace=foo". Then it
compares "<nothing>" with "kind=A,name=web,namespace=foo" and falsely
determines that the latter should be DELETED.

During reconciliation the deletes are processed before updates, and so
for a brief moment in the secondary "kind=A,name=web,namespace=foo" is
erroneously deleted and then immediately restored.

Unfortunately after this replication phase the final state is identical
to the initial state, so when it loops around again (rate limited) it
repeats the same set of operations indefinitely.
2020-10-23 13:41:54 -05:00
Daniel Nephin f9b2834171 state: convert the remaining functions to ReadTxn
Required also converting some of the transaction functions to WriteTxn
because TxnRO() called the same helper as TxnRW.

This change allows us to return a memdb.Txn for read-only txn instead of
wrapping them with state.txn.
2020-10-23 14:29:22 -04:00
Daniel Nephin 26387cdc0e
Merge pull request #8975 from hashicorp/dnephin/stream-close-on-unsub
stream: close the subscription on Unsubscribe
2020-10-23 12:58:12 -04:00
Freddy d23038f94f
Add HasExact to topology endpoint (#9010) 2020-10-23 10:45:41 -06:00
Daniel Nephin fb8b68a6ec stream: close the subscription on Unsubscribe 2020-10-22 13:39:27 -04:00
Pierre Souchay 54f9f247f8
Consul Service meta wrongly computes and exposes non_voter meta (#8731)
* Consul Service meta wrongly computes and exposes non_voter meta

In Serf Tags, entreprise members being non-voters use the tag
`nonvoter=1`, not `non_voter = false`, so non-voters in members
were wrongly displayed as voter.

Demonstration:

```
consul members -detailed|grep voter
consul20-hk5 10.200.100.110:8301   alive   acls=1,build=1.8.4+ent,dc=hk5,expect=3,ft_fs=1,ft_ns=1,id=xxxxxxxx-5629-08f2-3a79-10a1ab3849d5,nonvoter=1,port=8300,raft_vsn=3,role=consul,segment=<all>,use_tls=1,vsn=2,vsn_max=3,vsn_min=2,wan_join_port=8302
```

* Added changelog

* Added changelog entry
2020-10-09 17:18:24 -04:00
s-christoff a62705101f
Enhance the output of consul snapshot inspect (#8787) 2020-10-09 14:57:29 -05:00
Kyle Havlovitz 707f4a8d26 Stop intermediate renew routine on leader stop 2020-10-09 12:30:57 -07:00
Kyle Havlovitz 926a393a5c
Merge pull request #8784 from hashicorp/renew-intermediate-primary
connect: Enable renewing the intermediate cert in the primary DC
2020-10-09 12:18:59 -07:00
Daniel Nephin dd0e8d42c4
Merge pull request #8825 from hashicorp/streaming/add-config
streaming: add config and docs
2020-10-09 14:33:58 -04:00
Chris Piraino 4f77f87065
Emit service usage metrics with correct labeling strategy (#8856)
Previously, we would emit service usage metrics both with and without a
namespace label attached. This is problematic in the case when you want
to aggregate metrics together, i.e. "sum(consul.state.services)". This
would cause services to be counted twice in that aggregate, once via the
metric emitted with a namespace label, and once in the metric emited
without any namespace label.
2020-10-09 11:01:45 -05:00
Kyle Havlovitz 50543d678e Fix intermediate refresh test comments 2020-10-09 08:53:33 -07:00
R.B. Boyer d2f09ca306
upstream some differences from enterprise (#8902) 2020-10-09 09:42:53 -05:00
Kyle Havlovitz 968fd8660d Update CI for leader renew CA test using Vault 2020-10-09 05:48:15 -07:00
Kyle Havlovitz 62270c3f9a
Merge branch 'master' into renew-intermediate-primary 2020-10-09 04:40:34 -07:00
Kyle Havlovitz b78f618beb connect: Check for expired root cert when cross-signing 2020-10-09 04:35:56 -07:00
Freddy 89d52f41c4
Add protocol to the topology endpoint response (#8868) 2020-10-08 17:31:54 -06:00
Matt Keeler 141eb60f06
Add per-agent reconnect timeouts (#8781)
This allows for client agent to be run in a more stateless manner where they may be abruptly terminated and not expected to come back. If advertising a per-agent reconnect timeout using the advertise_reconnect_timeout configuration when that agent leaves, other agents will wait only that amount of time for the agent to come back before reaping it.

This has the advantageous side effect of causing servers to deregister the node/services/checks for that agent sooner than if the global reconnect_timeout was used.
2020-10-08 15:02:19 -04:00
Daniel Nephin 05df7b18a9 config: add field for enabling streaming RPC endpoint 2020-10-08 12:11:20 -04:00
Freddy de4af766f3
Support ingress gateways in mesh viz endpoint (#8864)
Co-authored-by: R.B. Boyer <rb@hashicorp.com>
2020-10-08 09:47:09 -06:00
Daniel Nephin a94fe054f0
Merge pull request #8809 from hashicorp/streaming/materialize-view
Add StreamingHealthServices cache-type
2020-10-07 21:26:38 -04:00
Daniel Nephin e0236b5a9f
Merge pull request #8818 from hashicorp/streaming/add-subscribe-service-batch-events
stream: handle batch events as a special case of Event
2020-10-07 21:25:32 -04:00
Daniel Nephin 783627aeef
Merge pull request #8768 from hashicorp/streaming/add-subscribe-service
subscribe: add subscribe service for streaming change events
2020-10-07 21:24:03 -04:00
Freddy 7d1f50d2e6
Return intention info in svc topology endpoint (#8853) 2020-10-07 18:35:34 -06:00
R.B. Boyer 35c4efd220
connect: support defining intentions using layer 7 criteria (#8839)
Extend Consul’s intentions model to allow for request-based access control enforcement for HTTP-like protocols in addition to the existing connection-based enforcement for unspecified protocols (e.g. tcp).
2020-10-06 17:09:13 -05:00
R.B. Boyer d6dce2332a
connect: intentions are now managed as a new config entry kind "service-intentions" (#8834)
- Upgrade the ConfigEntry.ListAll RPC to be kind-aware so that older
copies of consul will not see new config entries it doesn't understand
replicate down.

- Add shim conversion code so that the old API/CLI method of interacting
with intentions will continue to work so long as none of these are
edited via config entry endpoints. Almost all of the read-only APIs will
continue to function indefinitely.

- Add new APIs that operate on individual intentions without IDs so that
the UI doesn't need to implement CAS operations.

- Add a new serf feature flag indicating support for
intentions-as-config-entries.

- The old line-item intentions way of interacting with the state store
will transparently flip between the legacy memdb table and the config
entry representations so that readers will never see a hiccup during
migration where the results are incomplete. It uses a piece of system
metadata to control the flip.

- The primary datacenter will begin migrating intentions into config
entries on startup once all servers in the datacenter are on a version
of Consul with the intentions-as-config-entries feature flag. When it is
complete the old state store representations will be cleared. We also
record a piece of system metadata indicating this has occurred. We use
this metadata to skip ALL of this code the next time the leader starts
up.

- The secondary datacenters continue to run the old intentions
replicator until all servers in the secondary DC and primary DC support
intentions-as-config-entries (via serf flag). Once this condition it met
the old intentions replicator ceases.

- The secondary datacenters replicate the new config entries as they are
migrated in the primary. When they detect that the primary has zeroed
it's old state store table it waits until all config entries up to that
point are replicated and then zeroes its own copy of the old state store
table. We also record a piece of system metadata indicating this has
occurred. We use this metadata to skip ALL of this code the next time
the leader starts up.
2020-10-06 13:24:05 -05:00
Daniel Nephin 83401194ab streaming: improve godoc for cache-type
And fix a bug where any error that implemented the temporary interface was considered
a temporary error, even when the method would return false.
2020-10-06 13:52:02 -04:00
Daniel Nephin f857aef4a8 submatview: add a test for handling of NewSnapshotToFollow
Also add some godoc
Rename some vars and functions
Fix a data race in the new cache test for entry closing.
2020-10-06 13:22:02 -04:00
Daniel Nephin ad29cf4f94 stream: Return a single event from a subscription.Next
Handle batch events as a single event
2020-10-06 13:18:20 -04:00
Daniel Nephin fa115c6249 Move agent/subscribe -> agent/rpc/subscribe 2020-10-06 12:49:35 -04:00
Daniel Nephin 011109a6f6 subscirbe: extract streamID and logging from Subscribe
By extracting all of the tracing logic the core logic of the Subscribe
endpoint is much easier to read.
2020-10-06 12:49:35 -04:00
Daniel Nephin 4c4441997a subscribe: add integration test for acl token updates 2020-10-06 12:49:35 -04:00
Daniel Nephin 371ec2d70a subscribe: add a stateless subscribe service for the gRPC server
With a Backend that provides access to the necessary dependencies.
2020-10-06 12:49:35 -04:00
Daniel Nephin ae433947a4
Merge pull request #8799 from hashicorp/streaming/rename-framing-events
stream: remove EndOfEmptySnapshot, add NewSnapshotToFollow
2020-10-06 12:42:58 -04:00
R.B. Boyer a77b518542
server: create new memdb table for storing system metadata (#8703)
This adds a new very tiny memdb table and corresponding raft operation
for updating a very small effective map[string]string collection of
"system metadata". This can persistently record a fact about the Consul
state machine itself.

The first use of this feature will come in a later PR.
2020-10-06 10:08:37 -05:00
Daniel Nephin 2706cf9b2a
Merge pull request #8802 from hashicorp/dnephin/extract-lib-retry
lib/retry - extract a new package from lib/retry.go
2020-10-05 14:22:37 -04:00
freddygv 82a17ccee6 Do not evaluate discovery chain for topology upstreams 2020-10-05 10:24:50 -06:00
freddygv 63c50e15bc Single DB txn for ServiceTopology and other PR comments 2020-10-05 10:24:50 -06:00
freddygv 263bd9dd92 Add topology HTTP endpoint 2020-10-05 10:24:50 -06:00
freddygv 7c11580e93 Add topology RPC endpoint 2020-10-05 10:24:50 -06:00
freddygv 21c4708fe9 Add topology ACL filter 2020-10-05 10:24:50 -06:00
freddygv ac54bf99b3 Add func to combine up+downstream queries 2020-10-05 10:24:50 -06:00
freddygv 160a6539d1 factor in discovery chain when querying up/downstreams 2020-10-05 10:24:50 -06:00
freddygv 214b25919f support querying upstreams/downstreams from registrations 2020-10-05 10:24:50 -06:00
freddygv 3653045cb0 Add method for downstreams from disco chain 2020-10-05 10:24:50 -06:00
Daniel Nephin 40aac46cf4 lib/retry: Refactor to reduce the interface surface
Reduce Jitter to one function

Rename NewRetryWaiter

Fix a bug in calculateWait where maxWait was applied before jitter, which would make it
possible to wait longer than maxWait.
2020-10-04 18:12:42 -04:00
Daniel Nephin 0c7f9c72d7 lib/retry: extract a new package from lib 2020-10-04 17:43:01 -04:00
Daniel Nephin 9c5181c897 stream: full test coverage for EventPublisher.Subscribe 2020-10-02 13:46:24 -04:00
Daniel Nephin 0769f54fe1 stream: refactor to support change in framing events
Removing EndOfEmptySnapshot, add NewSnapshotToFollow
2020-10-02 13:41:31 -04:00
Daniel Nephin 5ef630f664
Merge pull request #8769 from hashicorp/streaming/prep-for-subscribe-service
state: use protobuf Topic and and export payload type
2020-10-02 13:30:06 -04:00
R.B. Boyer e84d52ba3a
ensure these tests work fine with namespaces in enterprise (#8794) 2020-10-01 09:54:46 -05:00
R.B. Boyer ccd0200bd9
server: ensure that we also shutdown network segment serf instances on server shutdown (#8786)
This really only matters for unit tests, since typically if an agent shuts down its server, it follows that up by exiting the process, which would also clean up all of the networking anyway.
2020-09-30 16:23:43 -05:00
Kyle Havlovitz 2956313f2d connect: Enable renewing the intermediate cert in the primary DC 2020-09-30 12:31:21 -07:00