Commit Graph

1823 Commits

Author SHA1 Message Date
Daniel Nephin 25b791ba47 state: add tests for checks.ID indexer 2021-03-22 18:06:43 -04:00
Daniel Nephin abbe5c3701 state: use tx.First instead of tx.FirstWatch
Where appropriate. After removing the helper function a bunch of  these calls can
be changed to tx.First.
2021-03-22 18:06:33 -04:00
Daniel Nephin 49938bc472 state: convert checks.ID index to new pattern 2021-03-22 18:06:08 -04:00
Hans Hasselberg 052662bcf9
introduce certopts (#9606)
* introduce cert opts

* it should be using the same signer

* lint and omit serial
2021-03-22 10:16:41 +01:00
Daniel Nephin 1d3fe64bba state: use uuid for acl-roles.policies index
Previously we were encoding the UUID as a string, but the index it references uses a UUID
so this index can also use an encoded UUID to save a bit of memory.
2021-03-19 19:45:37 -04:00
Daniel Nephin 3c01bb1156 state: convert acl-roles.policies index to new pattern 2021-03-19 19:45:37 -04:00
Daniel Nephin 474e95b9f5 state: convert acl-roles.name index to the functional indexer pattern 2021-03-19 19:45:37 -04:00
Daniel Nephin f836ed256b state: add indexer tests for acl-roles table 2021-03-19 19:45:37 -04:00
Daniel Nephin 6bc2c0e1ce state: use constants for acl-roles table and indexes 2021-03-19 19:45:37 -04:00
Daniel Nephin d4e02024fe state: convert acl-policies table to new pattern 2021-03-19 15:24:00 -04:00
Daniel Nephin 845a10354e state: use constants and add tests for acl-policies table 2021-03-19 15:19:57 -04:00
Daniel Nephin f6533a08f8 state: add indexer test for services.ID index 2021-03-19 14:13:14 -04:00
Daniel Nephin 1d1c03d0cd state: handle wildcard for services.ID index
When listing services, use the id_prefix directly if wildcards are allowed.

Error if a wildcard is used for a query that does not index the wildcard
2021-03-19 14:12:19 -04:00
Daniel Nephin bae69b2352 state: fix prefix index with the new pattern
Prefix queries are generally being used to match part of a partial
index. We can support these indexes by using a function that accept
different types for each subset of the index.

What I found interesting is that in the generic StringFieldIndexer the
implementation for PrefixFromArgs would remove the trailing null, but
at least in these 2 cases we actually want a null terminated string.
We simply want fewer components in the string.
2021-03-19 14:12:17 -04:00
Daniel Nephin ec50454fb3 state: move services.ID to new pattern 2021-03-19 14:11:59 -04:00
Daniel Nephin f5a52a4501 state: add tests for gateway-service table indexers 2021-03-18 12:09:42 -04:00
Daniel Nephin 66632538d8 state: use constants and remove wrapping
for GatewayServices table
2021-03-18 12:08:59 -04:00
Daniel Nephin d77bdd26c5 state: Move UpstreamDownstream to state package 2021-03-18 12:08:59 -04:00
Daniel Nephin ca3686f4aa state: add tests for mesh-topology table indexers 2021-03-18 12:08:57 -04:00
Daniel Nephin 8a1a11814d state: use constants for mesh-topology table operations 2021-03-18 12:08:03 -04:00
Freddy 8ac9f2521b
Merge pull request #9900 from hashicorp/ent-fixes
Fixup enterprise tests from tproxy changes
2021-03-18 08:33:30 -06:00
Freddy 28c29e6ab4
Merge pull request #9899 from hashicorp/wildcard-ixn-oss
Add methods to check intention has wildcard src or dst
2021-03-18 08:33:07 -06:00
freddygv b56bd690aa Fixup enterprise tests from tproxy changes 2021-03-17 23:05:00 -06:00
freddygv 1c46470a29 Add methods to check intention has wildcard src or dst 2021-03-17 22:15:48 -06:00
freddygv 6c43195e2a Merge master and fix upstream config protocol defaulting 2021-03-17 21:13:40 -06:00
freddygv 0c8b618ca0 Temporarily silence spurious wakeup. Addressing false positive in beta. 2021-03-17 17:25:29 -06:00
freddygv 60690cf5c9 Merge remote-tracking branch 'origin/master' into intention-topology-endpoint 2021-03-17 17:14:38 -06:00
Freddy 63dcb7fa76
Add TransparentProxy option to proxy definitions 2021-03-17 17:01:45 -06:00
Freddy fb252e87a4
Add per-upstream configuration to service-defaults 2021-03-17 16:59:51 -06:00
freddygv 15a145b9f6 Add changelog and cleanup todo for beta 2021-03-17 16:45:13 -06:00
freddygv d19a5830dd Do not include consul as upstream or downstream 2021-03-17 13:40:04 -06:00
Daniel Nephin d2591312f8 state: add tests for config-entry indexers 2021-03-17 14:41:46 -04:00
Daniel Nephin 1b8f8b135e state: convert config-entries kind index to new pattern 2021-03-17 14:40:57 -04:00
Daniel Nephin bfcf463c3a state: remove config-entries namespace index
Use a prefix of the ID index instead.
2021-03-17 14:40:57 -04:00
Daniel Nephin dcbb1ba5dd state: remove unnecessary method receiver 2021-03-17 14:40:57 -04:00
Daniel Nephin b43977423f state: convert config-entries table to new indexer pattern
Using functional indexes to isolate enterprise differentiation and
remove reflection.
2021-03-17 14:40:57 -04:00
Daniel Nephin 98c32599e4
Merge pull request #9881 from hashicorp/dnephin/state-index-service-check-nodes
state: convert services.node and checks.node indexes
2021-03-17 14:12:02 -04:00
Daniel Nephin b771baa1f5
Merge pull request #9863 from hashicorp/dnephin/config-entry-kind-name
state: move ConfigEntryKindName
2021-03-17 14:09:39 -04:00
Daniel Nephin 0b3930272d state: convert services.node and checks.node indexes
Using NodeIdentity to share the indexes with both.
2021-03-16 13:00:31 -04:00
freddygv b79039c21c Prefix match type vars to match use 2021-03-16 09:49:24 -06:00
freddygv fed983fe9a Pass txn into service list queries 2021-03-16 09:33:08 -06:00
freddygv 26ba0c0fc8 Pass txn into intention match queries 2021-03-16 08:03:52 -06:00
freddygv d7f3bcc8bb Replace CertURI.Authorize() calls.
AuthorizeIntentionTarget is a generalized version of the old function,
and can be evaluated against sources or destinations.
2021-03-15 18:06:04 -06:00
freddygv eb6c0cbea0 Fixup typo, comments, and regression 2021-03-15 17:50:47 -06:00
freddygv 940b7a98d1 Finish cleanup from ServiceConfigRequest changes 2021-03-15 16:38:01 -06:00
Daniel Nephin 0b5dfee00a state: use runCase pattern for large test
The TestServiceHealthEventsFromChanges function was over 1400 lines.
Attempting to debug test failures in test functions this large is
difficult. It requires scrolling to the line which defines the testcase
because the failure message only includes the line number of the
assertion, not the line number of the test case.

This is an excellent example of where test tables stop working well, and
start being a problem. To mitigate this problem, the runCase pattern can
be used. When one of these tests fails, a failure message will print the
line number of both the test case and the assertion. This allows a
developer to quickly jump to both of the relevant lines, signficanting
reducing the time it takes to debug test failures.

For example, one such failure could look like this:

    catalog_events_test.go:1610: case: service reg, new node
    catalog_events_test.go:1605: assertion failed: values are not equal
2021-03-15 17:53:16 -04:00
freddygv 04fbc104cd Pass MeshGateway config in service config request
ResolveServiceConfig is called by service manager before the proxy
registration is in the catalog. Therefore we should pass proxy
registration flags in the request rather than trying to fetch
them from the state store (where they may not exist yet).
2021-03-15 14:32:13 -06:00
freddygv d90240d367 Restore old Envoy prefix on escape hatches
This is done because after removing ID and NodeName from
ServiceConfigRequest we will no longer know whether a request coming in
is for a Consul client earlier than v1.10.
2021-03-15 14:12:57 -06:00
freddygv 3b2169b36d Add RPC endpoint for intention upstreams 2021-03-15 08:50:35 -06:00
freddygv e4e14639b2 Add state store function for intention upstreams 2021-03-15 08:50:35 -06:00
freddygv 4976c000b7 Refactor IntentionDecision
This enables it to be called for many upstreams or downstreams of a
service while only querying intentions once.

Additionally, decisions are now optionally denied due to L7 permissions
being present. This enables the function to be used to filter for
potential upstreams/downstreams of a service.
2021-03-15 08:50:35 -06:00
Daniel Nephin 2a53b8293a proxycfg: use rpcclient/health.Client instead of passing around cache name
This should allow us to swap out the implementation with something other
than `agent/cache` without making further code changes.
2021-03-12 11:46:04 -05:00
Daniel Nephin c33570be34 catalog_events: set the right key for connect snapshots 2021-03-12 11:35:43 -05:00
Daniel Nephin e2215d9f0f rpcclient: use streaming for connect health 2021-03-12 11:35:42 -05:00
Kyle Havlovitz 237b41ac8f
Merge pull request #9672 from hashicorp/ca-force-skip-xc
connect/ca: Allow ForceWithoutCrossSigning for all providers
2021-03-11 11:49:15 -08:00
freddygv 7a3625f58b Add TransparentProxy opt to proxy definition 2021-03-11 11:37:21 -07:00
freddygv c30157d2f2 Turn Limits and PassiveHealthChecks into pointers 2021-03-11 11:04:40 -07:00
freddygv b98abb6f09 Update server-side config resolution and client-side merging 2021-03-10 21:05:11 -07:00
Daniel Nephin 4877183bc6
Merge pull request #9797 from hashicorp/dnephin/state-index-node-id
state: convert nodes.ID to the new pattern of functional indexers
2021-03-10 17:34:23 -05:00
Daniel Nephin 51ad94360b state: move ConfigEntryKindName
Previously this type was defined in structs, but unlike the other types in structs this type
is not used by RPC requests. By moving it to state we can better indicate that this is not
an API type, but part of the state implementation.
2021-03-10 12:27:22 -05:00
Daniel Nephin 5c5ba9564d
Merge pull request #9796 from hashicorp/dnephin/state-cleanup-catalog-index-oss
state: remove duplicate tableCheck indexes
2021-03-10 12:20:09 -05:00
Daniel Nephin 94820e67a8 structs: remove EnterpriseMeta.GetNamespace
I added this recently without realizing that the method already existed and was named
NamespaceOrEmpty. Replace all calls to GetNamespace with NamespaceOrEmpty or NamespaceOrDefault
as appropriate.
2021-03-09 15:17:26 -05:00
Daniel Nephin 97bc073bd9 state: adjust compare for catalog events
Document that this comparison should roughly match MatchesKey

Only sort by overrideKey or service name, but not both
Add namespace to the sort.

The client side also builds a map of these based on the namespace/node/service key, so the only order
that really matters is the ordering of register/dereigster events.
2021-03-09 14:00:36 -05:00
Daniel Nephin 0d3bb68255 state: handle terminating gateway events properly in snapshot
Refactored out a function that can be used for both the snapshot and stream of events to translate
an event into an appropriate connect event.

Previously terminating gateway events would have used the wrong key in the snapshot, which would have
caused them to be filtered out later on.

Also removed an unused function, and some commented out code.
2021-03-09 14:00:35 -05:00
Kyle Havlovitz de3fba8ef3 Add remaining terminating gateway tests for namespaces
Co-Authored-By: Daniel Nephin <dnephin@hashicorp.com>
2021-03-09 14:00:35 -05:00
Daniel Nephin 38aeb88908 Start to setup enterprise tests for terminating gateway streaming events.
Co-Authored-By: Kyle Havlovitz <kylehav@gmail.com>
2021-03-09 14:00:35 -05:00
Daniel Nephin d0b37f18f0 state: Add support for override of namespace
in MatchesKey
also tests for MatchesKey

Co-Authored-By: Kyle Havlovitz <kylehav@gmail.com>
2021-03-09 14:00:35 -05:00
Daniel Nephin ba59727337 state: update calls to ensureConfigEntryTxn
The EnterpriseMeta paramter was removed after this code was written, but before it merged.

Also the table name constant has changed.
2021-03-09 14:00:35 -05:00
Daniel Nephin 730cc575e6 state: add 2 more test cases for terminate gateway streaming events
Co-Authored-By: Kyle Havlovitz <kylehav@gmail.com>
2021-03-09 14:00:34 -05:00
Kyle Havlovitz eadc8546a9 Added 6 new test cases for terminating gateway events
Co-Authored-By: Daniel Nephin <dnephin@hashicorp.com>
2021-03-09 14:00:34 -05:00
Daniel Nephin 15b0d5f62b state: Add two more tests for connect events with terminating gateways
And expand one test case to cover more.

Co-Authored-By: Kyle Havlovitz <kylehav@gmail.com>
2021-03-09 14:00:34 -05:00
Daniel Nephin abab373b89 state: Include the override key in the sorting of events
Co-Authored-By: Kyle Havlovitz <kylehav@gmail.com>
2021-03-09 14:00:34 -05:00
Kyle Havlovitz f31582624d state: Add terminating gateway events on updating a config entry
Co-Authored-By: Daniel Nephin <dnephin@hashicorp.com>
2021-03-09 14:00:34 -05:00
Daniel Nephin f42a2ca8a3 state: add first terminating catalog catalog event
Health of a terminating gateway instance changes
- Generate an event for creating/destroying this instance of the terminating gateway,
  duplicate it for each affected service

Co-Authored-By: Kyle Havlovitz <kylehav@gmail.com>
2021-03-09 14:00:33 -05:00
Daniel Nephin 1184ceff9e state: convert nodes.ID to new functional pattern
In preparation for adding other identifiers to the index.
2021-03-05 12:30:40 -05:00
Daniel Nephin 4a44cfd676
Merge pull request #9188 from hashicorp/dnephin/more-streaming-tests
Add more streaming tests
2021-02-26 12:36:55 -05:00
Daniel Nephin 4ef9578a07
Merge pull request #9703 from pierresouchay/streaming_tags_and_case_insensitive
Streaming filter tags + case insensitive lookups for Service Names
2021-02-26 12:06:26 -05:00
Daniel Nephin 2cc3282d5d catalog_events: set the right key for connect snapshots
Add a test for catalog_event snapshot on connect topic
2021-02-25 14:30:39 -05:00
Daniel Nephin 85da1af04c consul: Add integration tests of streaming.
Restored from streaming-rpc-final branch.

Co-authored-by: Paul Banks <banks@banksco.de>
2021-02-25 14:30:39 -05:00
Daniel Nephin e8beda4685 state: Add a test for ServiceHealthSnapshot 2021-02-25 14:08:10 -05:00
Daniel Nephin dd45c4cfe4 state: add a test case for memdb indexers 2021-02-19 17:14:46 -05:00
Daniel Nephin 7e4d693aaa state: support for functional indexers
These new functional indexers provide a few advantages:

1. enterprise differences can be isolated to a single function (the
   indexer function), making code easier to change
2. as a consequence of (1) we no longer need to wrap all the calls to
   Txn operations, making code easier to read.
3. by removing reflection we should increase the performance of all
   operations.

One important change is in making all the function signatures the same.

https://blog.golang.org/errors-are-values

An extra boolean return value for SingleIndexer.FromObject is superfluous.
The error value can indicate when the index value could not be created.
By removing this extra return value we can use the same signature for both
indexer functions.

This has the nice properly of a function being usable for both indexing operations.
2021-02-19 17:14:46 -05:00
Daniel Nephin 88a9bd6d3c state: remove duplicate index on the checks table
By using a new pattern for more specific indexes. This allows us to use
the same index for both service checks and node checks. It removes the
abstraction around memdb.Txn operations, and isolates all of the
enterprise differences in a single place (the indexer).
2021-02-19 17:14:46 -05:00
Daniel Nephin b781fec664 state: remove duplicate function
catalogChecksForNodeService was a duplicate of catalogListServiceChecks
2021-02-19 17:14:46 -05:00
Daniel Nephin d33bc493af
Merge pull request #9720 from hashicorp/dnephin/ent-meta-ergo-1
structs: rename EnterpriseMeta constructor
2021-02-16 15:31:58 -05:00
Daniel Nephin 53c82cee86
Merge pull request #9772 from hashicorp/streamin-fix-bad-cached-snapshot
streaming: fix snapshot cache bug
2021-02-16 15:28:00 -05:00
Daniel Nephin b17967827d
Merge pull request #9728 from hashicorp/dnephin/state-index-table
state: document how index table is used
2021-02-16 15:27:27 -05:00
Daniel Nephin c40d063a0e structs: rename EnterpriseMeta constructor
To match the Go convention.
2021-02-16 14:45:43 -05:00
Daniel Nephin a29b848e3b stream: fix a snapshot cache bug
Previously a snapshot created as part of a resumse-stream request could have incorrectly
cached the newSnapshotToFollow event. This would cause clients to error because they
received an unexpected framing event.
2021-02-16 12:52:23 -05:00
Daniel Nephin 2726c65fbe stream: test the snapshot cache is saved correctly
when the cache entry is created from resuming a stream.
2021-02-16 12:08:43 -05:00
R.B. Boyer 91d9544803
connect: connect CA Roots in the primary datacenter should use a SigningKeyID derived from their local intermediate (#9428)
This fixes an issue where leaf certificates issued in primary
datacenters using Vault as a Connect CA would be reissued very
frequently (every ~20 seconds) because the logic meant to detect root
rotation was errantly triggering.

The hash of the rootCA was being compared against a hash of the
intermediateCA and always failing. This doesn't apply to the Consul
built-in CA provider because there is no intermediate in use in the
primary DC.

This is reminiscent of #6513
2021-02-08 13:18:51 -06:00
Daniel Nephin cdda3b9321 state: Use the tableIndex constant 2021-02-05 18:37:45 -05:00
Daniel Nephin de841bd459 state: Document index table
And move the IndexEntry (which is stored in the table) next to the table
schema definition.
2021-02-05 18:37:45 -05:00
Daniel Nephin 23cfbc8f8d
Merge pull request #9719 from hashicorp/oss/state-store-4
state: remove registerSchema
2021-02-05 14:02:38 -05:00
Daniel Nephin dc70f583d4
Merge pull request #9718 from hashicorp/oss/dnephin/ent-meta-in-state-store-3
state: convert all table name constants to the new prefix pattern
2021-02-05 14:02:07 -05:00
Daniel Nephin eb5d71fd19
Merge pull request #9665 from hashicorp/dnephin/state-store-indexes-2
state: move config-entries table definition to config_entries_schema.go
2021-02-05 14:01:08 -05:00
Daniel Nephin 9beadc578b
Merge pull request #9664 from hashicorp/dnephin/state-store-indexes
state: move ACL schema and index definitions to acl_schema.go
2021-02-05 13:38:31 -05:00
Daniel Nephin b747b27afd state: remove the need for registerSchema
registerSchema creates some indirection which is not necessary in this
case. newDBSchema can call each of the tables.

Enterprise tables can be added from the existing withEnterpriseSchema
shim.
2021-02-05 12:19:56 -05:00
Daniel Nephin 33621706ac state: rename table name constants to use pattern
the 'table' prefix is shorter, and also reads better in queries.
2021-02-05 12:12:19 -05:00
Daniel Nephin 8569295116 state: rename connect constants 2021-02-05 12:12:19 -05:00
Daniel Nephin afdbf2a8ef state: rename table name constants to new pattern
Using Apps Hungarian Notation for these constants makes the memdb queries more readable.
2021-02-05 12:12:18 -05:00
Pierre Souchay c466b08481 Streaming filter tags + case insensitive lookups for Service Names
Will fix:
 * https://github.com/hashicorp/consul/issues/9695
 * https://github.com/hashicorp/consul/issues/9702
2021-02-04 11:00:51 +01:00
Daniel Nephin f929a7117e state: Remove unnecessary entMeta arg to EnsureConfigEntry 2021-02-03 18:10:38 -05:00
Kyle Havlovitz 1dee4173c1 connect/ca: Allow ForceWithoutCrossSigning for all providers
This allows setting ForceWithoutCrossSigning when reconfiguring the CA
for any provider, in order to forcibly move to a new root in cases where
the old provider isn't reachable or able to cross-sign for whatever
reason.
2021-01-29 13:38:11 -08:00
Daniel Nephin 09425b22a1 state: rename config-entries table const to match new pattern 2021-01-28 20:34:34 -05:00
Daniel Nephin 7d17e20270 state: move config-entries table to new pattern 2021-01-28 20:34:15 -05:00
Daniel Nephin 825b8ade39 state: use indexID
this change was already made to enterprise, so backporting it.
2021-01-28 20:30:08 -05:00
Daniel Nephin 2a262f07fc state: Move ACL schema indexes to match Ent
and use constants for table and index names.
2021-01-28 20:05:09 -05:00
Matt Keeler 1379b5f7d6
Upgrade raft-autopilot and wait for autopilot it to stop when revoking leadership (#9644)
Fixes: 9626
2021-01-27 11:14:52 -05:00
Hans Hasselberg 623aab5880
Add flags to support CA generation for Connect (#9585) 2021-01-27 08:52:15 +01:00
R.B. Boyer 5777fa1f59
server: initialize mgw-wanfed to use local gateways more on startup (#9528)
Fixes #9342
2021-01-25 17:30:38 -06:00
Daniel Nephin d7d081f402
Merge pull request #9420 from hashicorp/dnephin/reduce-duplicate-in-catalog-schema
state: reduce interface for Enterprise schema
2021-01-25 17:04:25 -05:00
R.B. Boyer 6622185d64
server: use the presense of stored federation state data as a sign that we already activated the federation state feature flag (#9519)
This way we only have to wait for the serf barrier to pass once before
we can make use of federation state APIs Without this patch every
restart needs to re-compute the change.
2021-01-25 13:24:32 -06:00
R.B. Boyer 0247f409a0
server: when wan federating via mesh gateways only do heuristic primary DC bypass on the leader (#9366)
Fixes #9341
2021-01-22 10:03:24 -06:00
Freddy 5519051c84
Update topology mapping Refs on all proxy instance deletions (#9589)
* Insert new upstream/downstream mapping to persist new Refs

* Avoid upserting mapping copy if it's a no-op

* Add test with panic repro

* Avoid deleting up/downstreams from inside memdb iterator

* Avoid deleting gateway mappings from inside memdb iterator

* Add CHANGELOG entry

* Tweak changelog entry

Co-authored-by: Paul Banks <banks@banksco.de>
2021-01-20 15:17:26 +00:00
Daniel Nephin 979749d86e state: do not delete from inside an iteration
Deleting from memdb inside an interation can cause a panic from Iterator.Next. This
case is technically safe (for now) because the iterator is using the root radix tree
not a modified one.

However this could break at any time if someone adds an insert or delete to the coordinates table
before this place in the function.

It also sets a bad example, because generally deletes in an interator are not safe. So this
commit uses the pattern we have in other places to move the deletes out of the iteration.
2021-01-19 17:00:07 -05:00
Matt Keeler 2d2ce1fb0c
Ensure that CA initialization does not block leader election.
After fixing that bug I uncovered a couple more:

Fix an issue where we might try to cross sign a cert when we never had a valid root.
Fix a potential issue where reconfiguring the CA could cause either the Vault or AWS PCA CA providers to delete resources that are still required by the new incarnation of the CA.
2021-01-19 15:27:48 -05:00
Daniel Nephin 52a1d78e39 state: add a regression test for state store schema
To allow the index to be refactored without accidental changes.

To update the expected value run: 'go test ./agent/consul/state -update'
2021-01-15 18:49:55 -05:00
Daniel Nephin aa21c1ea04 state: reduce interface for Enterprise schema
Using withEnterpriseSchema() we can apply any enterprise schema changes
with a single shim, removing the need to duplicate all of the table
definitions.

Also move all the catalog schemas to a new file to shrink catalog.go a bit.
2021-01-15 18:49:55 -05:00
Daniel Nephin e8427a48ab agent/consuk: Rename RPCRate -> RPCRateLimit
so that the field name is consistent across config structs.
2021-01-14 17:26:00 -05:00
Daniel Nephin e5320c2db6 agent/consul: make Client/Server config reloading more obvious
I believe this commit also fixes a bug. Previously RPCMaxConnsPerClient was not being re-read from the RuntimeConfig, so passing it to Server.ReloadConfig was never changing the value.

Also improve the test runtime by not doing a lot of unnecessary work.
2021-01-14 17:21:10 -05:00
Daniel Nephin f2b504873a
Merge pull request #9460 from hashicorp/dnephin/fix-data-races
Fix a couple data races in tests
2021-01-14 17:07:01 -05:00
Chris Piraino baad708929
Fix bug in usage metrics when multiple service instances are changed in a single transaction (#9440)
* Fix bug in usage metrics that caused a negative count to occur

There were a couple of instances were usage metrics would do the wrong
thing and result in incorrect counts, causing the count to attempt to
decrement below zero and return an error. The usage metrics did not
account for various places where a single transaction could
delete/update/add multiple service instances at once.

We also remove the error when attempting to decrement below zero, and
instead just make sure we do not accidentally underflow the unsigned
integer. This is a more graceful failure than returning an error and not
allowing a transaction to commit.

* Add changelog
2021-01-12 15:31:47 -06:00
Chris Piraino 2eac571276
Log replication warnings when no error suppression is defined (#9320)
* Log replication warnings when no error suppression is defined

* Add changelog file
2021-01-08 14:03:06 -06:00
Daniel Nephin 45f0afcbf4 structs: Fix printing of IDs
These types are used as values (not pointers) in other structs. Using a pointer receiver causes
problems when the value is printed. fmt will not call the String method if it is passed a value
and the String method has a pointer receiver. By using a value receiver the correct string is printed.

Also remove some unused methods.
2021-01-07 18:47:38 -05:00
Daniel Nephin 27c38bfebb
Merge pull request #9213 from hashicorp/dnephin/resolve-tokens-take-2
acl: Remove some unused things and document delegate method
2021-01-06 18:51:51 -05:00
R.B. Boyer db62541676
acl: use the presence of a management policy in the state store as a sign that we already migrated to v2 acls (#9505)
This way we only have to wait for the serf barrier to pass once before
we can upgrade to v2 acls. Without this patch every restart needs to
re-compute the change, and potentially if a stray older node joins after
a migration it might regress back to v1 mode which would be problematic.
2021-01-05 17:04:27 -06:00
Matt Keeler 3a79b559f9
Special case the error returned when we have a Raft leader but are not tracking it in the ServerLookup (#9487)
This can happen when one other node in the cluster such as a client is unable to communicate with the leader server and sees it as failed. When that happens its failing status eventually gets propagated to the other servers in the cluster and eventually this can result in RPCs returning “No cluster leader” error.

That error is misleading and unhelpful for determing the root cause of the issue as its not raft stability but rather and client -> server networking issue. Therefore this commit will add a new error that will be returned in that case to differentiate between the two cases.
2021-01-04 14:05:23 -05:00
R.B. Boyer 42dea6f01e
server: deletions of intentions by name using the intention API is now idempotent (#9278)
Restoring a behavior inadvertently changed while fixing #9254
2021-01-04 11:27:00 -06:00
Daniel Nephin 088831c91e Maybe fix another data race in a test 2020-12-22 18:53:54 -05:00
Daniel Nephin d0f2eca8de Fix one race caused by t.Parallel 2020-12-22 18:27:18 -05:00
Daniel Nephin c66a63275f
Merge pull request #9340 from hashicorp/dnephin/skip-slow-tests-with-short
testing: skip slow tests with -short
2020-12-11 13:33:44 -05:00
R.B. Boyer f9dcaf7f6b
acl: global tokens created by auth methods now correctly replicate to secondary datacenters (#9351)
Previously the tokens would fail to insert into the secondary's state
store because the AuthMethod field of the ACLToken did not point to a
known auth method from the primary.
2020-12-09 15:22:29 -06:00
Daniel Nephin ef0999547a testing: skip slow tests with -short
Add a skip condition to all tests slower than 100ms.

This change was made using `gotestsum tool slowest` with data from the
last 3 CI runs of master.
See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests

With this change:

```
$ time go test -count=1 -short ./agent
ok      github.com/hashicorp/consul/agent       0.743s

real    0m4.791s

$ time go test -count=1 -short ./agent/consul
ok      github.com/hashicorp/consul/agent/consul        4.229s

real    0m8.769s
```
2020-12-07 13:42:55 -05:00
Kyle Havlovitz 57210a59c3 connect: Fix a case where the active root would get unset even when there wasn't a new one 2020-12-02 11:42:23 -08:00
Kyle Havlovitz 91d5d6c586
Merge pull request #9009 from hashicorp/update-secondary-ca
connect: Fix an issue with updating CA config in a secondary datacenter
2020-11-30 14:49:28 -08:00
Kyle Havlovitz c5167cf9c4 Use a buffered channel for CA intermediate renew func 2020-11-30 14:37:24 -08:00
R.B. Boyer 6d6b6c15c6
server: fix panic when deleting a non existent intention (#9254)
* server: fix panic when deleting a non existent intention

* add changelog

* Always return an error when deleting non-existent ixn

Co-authored-by: freddygv <gh@freddygv.xyz>
2020-11-24 13:44:20 -05:00
Hans Hasselberg 25f9e232af add missing descriptions for metrics 2020-11-23 22:06:30 +01:00
Kit Patella 7a8844ccce add entries for missing fsm operations and mark duplicated metrics prefixes as deprecated 2020-11-23 12:42:51 -08:00
Kyle Havlovitz a01f853aa5 Clean up the logic in persistNewRootAndConfig 2020-11-20 15:54:44 -08:00
Kyle Havlovitz 26a9c985c5 Add CA server delegate interface for testing 2020-11-19 20:08:06 -08:00
Kit Patella 4ad076207e add telemetry and definition help entries for missing catalog and acl metrics 2020-11-19 13:29:44 -08:00
Kit Patella 46205bbf27 remove stale entries and rename/define acl.resolveToken 2020-11-19 13:06:28 -08:00
Freddy e4e306210a
Require operator:write to get Connect CA config (#9240)
A vulnerability was identified in Consul and Consul Enterprise (“Consul”) such that operators with `operator:read` ACL permissions are able to read the Consul Connect CA configuration when explicitly configured with the `/v1/connect/ca/configuration` endpoint, including the private key. This allows the user to effectively privilege escalate by enabling the ability to mint certificates for any Consul Connect services. This would potentially allow them to masquerade (receive/send traffic) as any service in the mesh.

--

This PR increases the permissions required to read the Connect CA's private key when it was configured via the `/connect/ca/configuration` endpoint. They are now `operator:write`.
2020-11-19 10:14:48 -07:00
Kyle Havlovitz c8d4a40a87 connect: update some function comments in CA manager 2020-11-17 16:00:19 -08:00
Daniel Nephin b9306d8827 acl: remove a test-only method 2020-11-17 18:16:34 -05:00
Daniel Nephin 9e7c8dd19d Remove two unused delegate methods 2020-11-17 18:16:26 -05:00
Matt Keeler 4bca029be9
Refactor to call non-voting servers read replicas (#9191)
Co-authored-by: Kit Patella <kit@jepsen.io>
2020-11-17 10:53:57 -05:00
Kit Patella 4dfcdbab26
Merge pull request #9198 from hashicorp/mkcp/telemetry/add-all-metric-definitions
Add metric definitions for all metrics known at Consul start
2020-11-16 15:54:50 -08:00
Matt Keeler 197a37a860
Prevent panic if autopilot health is requested prior to leader establishment finishing. (#9204) 2020-11-16 17:08:17 -05:00
Daniel Nephin de88ceed1c
Merge pull request #9114 from hashicorp/dnephin/filtering-in-stream
stream: improve naming of Payload methods
2020-11-16 14:20:07 -05:00
Kit Patella 0b18f5612e trim help strings to save a few bytes 2020-11-16 11:02:11 -08:00
Kit Patella 374748dafc merge master 2020-11-16 10:46:53 -08:00
Kit Patella af719981f3 finish adding static server metrics 2020-11-13 16:26:08 -08:00
Kyle Havlovitz 0a86533e20 Reorganize some CA manager code for correctness/readability 2020-11-13 14:46:01 -08:00
Kyle Havlovitz 5de81c1375 connect: Add CAManager for synchronizing CA operations 2020-11-13 14:33:44 -08:00
Kyle Havlovitz 0b4876f906 connect: Add logic for updating secondary DC intermediate on config set 2020-11-13 14:33:44 -08:00
R.B. Boyer db1184c094
server: intentions CRUD requires connect to be enabled (#9194)
Fixes #9123
2020-11-13 16:19:12 -06:00
Kit Patella b486c1bce8 add the service name in the agent rather than in the definitions themselves 2020-11-13 13:18:04 -08:00
R.B. Boyer e323014faf
server: remove config entry CAS in legacy intention API bridge code (#9151)
Change so line-item intention edits via the API are handled via the state store instead of via CAS operations.

Fixes #9143
2020-11-13 14:42:21 -06:00
R.B. Boyer 6300abed18
server: skip deleted and deleting namespaces when migrating intentions to config entries (#9186) 2020-11-13 13:56:41 -06:00
Mike Morris a343365da7
ci: update to Go 1.15.4 and alpine:3.12 (#9036)
* ci: stop building darwin/386 binaries

Go 1.15 drops support for 32-bit binaries on Darwin https://golang.org/doc/go1.15#darwin

* tls: ConnectionState::NegotiatedProtocolIsMutual is deprecated in Go 1.15, this value is always true

* correct error messages that changed slightly

* Completely regenerate some TLS test data

Co-authored-by: R.B. Boyer <rb@hashicorp.com>
2020-11-13 13:02:59 -05:00
R.B. Boyer 758384893d
server: break up Intention.Apply monolithic method (#9007)
The Intention.Apply RPC is quite large, so this PR attempts to break it down into smaller functions and dissolves the pre-config-entry approach to the breakdown as it only confused things.
2020-11-13 09:15:39 -06:00
Kit Patella 9533372ded first pass on agent-configured prometheusDefs and adding defs for every consul metric 2020-11-12 18:12:12 -08:00
R.B. Boyer a5bd1ba323
agent: return the default ACL policy to callers as a header (#9101)
Header is: X-Consul-Default-ACL-Policy=<allow|deny>

This is of particular utility when fetching matching intentions, as the
fallthrough for a request that doesn't match any intentions is to
enforce using the default acl policy.
2020-11-12 10:38:32 -06:00
Matt Keeler 2badb01d30
Add a paramter in state store methods to indicate whether a resource insertion is from a snapshot restoration (#9156)
The Catalog, Config Entry, KV and Session resources potentially re-validate the input as its coming in. We need to prevent snapshot restoration failures due to missing namespaces or namespaces that are being deleted in enterprise.
2020-11-11 11:21:42 -05:00
Matt Keeler 1f40f51a58
Fix a bunch of linter warnings 2020-11-09 09:22:12 -05:00
Matt Keeler 755fb72994
Switch to using the external autopilot module 2020-11-09 09:22:11 -05:00
Daniel Nephin e4a78c977d stream: document that Payload must be immutable
If they are sent to EventPublisher.Publish.

Also document that PayloadEvents is expected to come from a subscription and that it is
not immutable.
2020-11-06 13:00:33 -05:00
Daniel Nephin 4fc073b1f4 stream: rename FilterByKey 2020-11-05 19:21:16 -05:00
Daniel Nephin d4cd2fa6a8 stream: Add HasReadPermission to Payload
Required now that filter is a method on PayloadEvents instead of Event
2020-11-05 19:17:18 -05:00
Daniel Nephin 8a26bca020 stream: move event filtering to PayloadEvents
Removes the weirdness around PayloadEvents.FilterByKey
2020-11-05 17:50:17 -05:00
Daniel Nephin dcacfd3548 stream: Remove unused method 2020-11-05 16:49:59 -05:00
Daniel Nephin 621f1db766
Merge pull request #9073 from hashicorp/dnephin/backport-streaming-namespaces
streaming: backport namespace changes
2020-11-05 14:19:10 -05:00
Daniel Nephin cd220e5d6c
Merge pull request #9061 from hashicorp/dnephin/event-fields
stream: support filtering by namespace
2020-11-05 14:18:35 -05:00
Daniel Nephin f6b629852f state: test EventPayloadCheckServiceNode.FilterByKey
Also fix a bug in that function when only one of key or namespace were the empty string.
2020-10-30 14:35:57 -04:00
Daniel Nephin 60df44df4f stream: Add tests for filterByKey with namespace
And fix a bug where a request with a Namespace but no Key would not be properly filtered
2020-10-30 14:35:42 -04:00
Daniel Nephin 318dfbe6e4 stream: Move FilterByKey events to a table
In preparation for adding new tests.
2020-10-30 14:35:28 -04:00
Daniel Nephin 2d0030da39 state: use enterprise meta for creating events 2020-10-30 14:34:04 -04:00
Daniel Nephin b57c7afcbb stream: include the namespace in the snap cache key
Otherwise the wrong snapshot could be returned when the same key is used in different namespaces
2020-10-30 14:34:04 -04:00
Daniel Nephin 8da30fcb9a subscribe: set the request namespace 2020-10-30 14:34:04 -04:00
R.B. Boyer 67a0d0c426
state: ensure we unblock intentions queries upon the upgrade to config entries (#9062)
1. do a state store query to list intentions as the agent would do over in `agent/proxycfg` backing `agent/xds`
2. upgrade the database and do a fresh `service-intentions` config entry write
3. the blocking query inside of the agent cache in (1) doesn't notice (2)
2020-10-29 15:28:31 -05:00
R.B. Boyer 78014653b3 restore prior signature of test helper so enterprise compiles 2020-10-29 13:52:15 -05:00
Daniel Nephin 61ce0964a4 stream: remove Event.Key
Makes Payload a type with FilterByKey so that Payloads can implement
filtering by key. With this approach we don't need to expose a Namespace
field on Event, and we don't need to invest micro formats or require a
bunch of code to be aware of exactly how the key field is encoded.
2020-10-28 16:48:04 -04:00
Daniel Nephin 8ef4c0fcc5 state: use go-cmp for comparison
The output of the previous assertions made it impossible to debug the tests without code changes.

With go-cmp comparing the entire slice we can see the full diffs making it easier to debug failures.
2020-10-28 16:33:00 -04:00
Daniel Nephin 44da869ed4 stream: Use a no-op event publisher if streaming is disabled 2020-10-28 13:54:19 -04:00
Daniel Nephin eea87e1acf store: use a ReadDB for snapshots
to remove the cyclic dependency between the snapshot handlers and the state.Store
2020-10-28 13:07:42 -04:00
Daniel Nephin cfe0ffde15
Merge pull request #9026 from hashicorp/dnephin/streaming-without-cache-query-param
streaming: rename config and remove requirement for cache=1
2020-10-28 12:33:25 -04:00
Daniel Nephin 03d2be03e7
Merge pull request #8618 from hashicorp/dnephin/remove-txn-readtxn
state: Use ReadTxn everywhere
2020-10-28 12:32:47 -04:00
Daniel Nephin abd8cfcfe9 state: disable streaming connect topic 2020-10-26 11:49:47 -04:00
R.B. Boyer 0a80e82f21
server: config entry replication now correctly uses namespaces in comparisons (#9024)
Previously config entries sharing a kind & name but in different
namespaces could occasionally cause "stuck states" in replication
because the namespace fields were ignored during the differential
comparison phase.

Example:

Two config entries written to the primary:

    kind=A,name=web,namespace=bar
    kind=A,name=web,namespace=foo

Under the covers these both get saved to memdb, so they are sorted by
all 3 components (kind,name,namespace) during natural iteration. This
means that before the replication code does it's own incomplete sort,
the underlying data IS sorted by namespace ascending (bar comes before
foo).

After one pass of replication the primary and secondary datacenters have
the same set of config entries present. If
"kind=A,name=web,namespace=bar" were to be deleted, then things get
weird. Before replication the two sides look like:

primary: [
    kind=A,name=web,namespace=foo
]
secondary: [
    kind=A,name=web,namespace=bar
    kind=A,name=web,namespace=foo
]

The differential comparison phase walks these two lists in sorted order
and first compares "kind=A,name=web,namespace=foo" vs
"kind=A,name=web,namespace=bar" and falsely determines they are the SAME
and are thus cause an update of "kind=A,name=web,namespace=foo". Then it
compares "<nothing>" with "kind=A,name=web,namespace=foo" and falsely
determines that the latter should be DELETED.

During reconciliation the deletes are processed before updates, and so
for a brief moment in the secondary "kind=A,name=web,namespace=foo" is
erroneously deleted and then immediately restored.

Unfortunately after this replication phase the final state is identical
to the initial state, so when it loops around again (rate limited) it
repeats the same set of operations indefinitely.
2020-10-23 13:41:54 -05:00
Daniel Nephin f9b2834171 state: convert the remaining functions to ReadTxn
Required also converting some of the transaction functions to WriteTxn
because TxnRO() called the same helper as TxnRW.

This change allows us to return a memdb.Txn for read-only txn instead of
wrapping them with state.txn.
2020-10-23 14:29:22 -04:00
Daniel Nephin 26387cdc0e
Merge pull request #8975 from hashicorp/dnephin/stream-close-on-unsub
stream: close the subscription on Unsubscribe
2020-10-23 12:58:12 -04:00
Freddy d23038f94f
Add HasExact to topology endpoint (#9010) 2020-10-23 10:45:41 -06:00
Daniel Nephin fb8b68a6ec stream: close the subscription on Unsubscribe 2020-10-22 13:39:27 -04:00
Pierre Souchay 54f9f247f8
Consul Service meta wrongly computes and exposes non_voter meta (#8731)
* Consul Service meta wrongly computes and exposes non_voter meta

In Serf Tags, entreprise members being non-voters use the tag
`nonvoter=1`, not `non_voter = false`, so non-voters in members
were wrongly displayed as voter.

Demonstration:

```
consul members -detailed|grep voter
consul20-hk5 10.200.100.110:8301   alive   acls=1,build=1.8.4+ent,dc=hk5,expect=3,ft_fs=1,ft_ns=1,id=xxxxxxxx-5629-08f2-3a79-10a1ab3849d5,nonvoter=1,port=8300,raft_vsn=3,role=consul,segment=<all>,use_tls=1,vsn=2,vsn_max=3,vsn_min=2,wan_join_port=8302
```

* Added changelog

* Added changelog entry
2020-10-09 17:18:24 -04:00
s-christoff a62705101f
Enhance the output of consul snapshot inspect (#8787) 2020-10-09 14:57:29 -05:00
Kyle Havlovitz 707f4a8d26 Stop intermediate renew routine on leader stop 2020-10-09 12:30:57 -07:00
Kyle Havlovitz 926a393a5c
Merge pull request #8784 from hashicorp/renew-intermediate-primary
connect: Enable renewing the intermediate cert in the primary DC
2020-10-09 12:18:59 -07:00
Daniel Nephin dd0e8d42c4
Merge pull request #8825 from hashicorp/streaming/add-config
streaming: add config and docs
2020-10-09 14:33:58 -04:00
Chris Piraino 4f77f87065
Emit service usage metrics with correct labeling strategy (#8856)
Previously, we would emit service usage metrics both with and without a
namespace label attached. This is problematic in the case when you want
to aggregate metrics together, i.e. "sum(consul.state.services)". This
would cause services to be counted twice in that aggregate, once via the
metric emitted with a namespace label, and once in the metric emited
without any namespace label.
2020-10-09 11:01:45 -05:00
Kyle Havlovitz 50543d678e Fix intermediate refresh test comments 2020-10-09 08:53:33 -07:00
R.B. Boyer d2f09ca306
upstream some differences from enterprise (#8902) 2020-10-09 09:42:53 -05:00
Kyle Havlovitz 968fd8660d Update CI for leader renew CA test using Vault 2020-10-09 05:48:15 -07:00
Kyle Havlovitz 62270c3f9a
Merge branch 'master' into renew-intermediate-primary 2020-10-09 04:40:34 -07:00
Kyle Havlovitz b78f618beb connect: Check for expired root cert when cross-signing 2020-10-09 04:35:56 -07:00
Freddy 89d52f41c4
Add protocol to the topology endpoint response (#8868) 2020-10-08 17:31:54 -06:00
Matt Keeler 141eb60f06
Add per-agent reconnect timeouts (#8781)
This allows for client agent to be run in a more stateless manner where they may be abruptly terminated and not expected to come back. If advertising a per-agent reconnect timeout using the advertise_reconnect_timeout configuration when that agent leaves, other agents will wait only that amount of time for the agent to come back before reaping it.

This has the advantageous side effect of causing servers to deregister the node/services/checks for that agent sooner than if the global reconnect_timeout was used.
2020-10-08 15:02:19 -04:00
Daniel Nephin 05df7b18a9 config: add field for enabling streaming RPC endpoint 2020-10-08 12:11:20 -04:00
Freddy de4af766f3
Support ingress gateways in mesh viz endpoint (#8864)
Co-authored-by: R.B. Boyer <rb@hashicorp.com>
2020-10-08 09:47:09 -06:00
Daniel Nephin a94fe054f0
Merge pull request #8809 from hashicorp/streaming/materialize-view
Add StreamingHealthServices cache-type
2020-10-07 21:26:38 -04:00
Daniel Nephin e0236b5a9f
Merge pull request #8818 from hashicorp/streaming/add-subscribe-service-batch-events
stream: handle batch events as a special case of Event
2020-10-07 21:25:32 -04:00
Daniel Nephin 783627aeef
Merge pull request #8768 from hashicorp/streaming/add-subscribe-service
subscribe: add subscribe service for streaming change events
2020-10-07 21:24:03 -04:00
Freddy 7d1f50d2e6
Return intention info in svc topology endpoint (#8853) 2020-10-07 18:35:34 -06:00
R.B. Boyer 35c4efd220
connect: support defining intentions using layer 7 criteria (#8839)
Extend Consul’s intentions model to allow for request-based access control enforcement for HTTP-like protocols in addition to the existing connection-based enforcement for unspecified protocols (e.g. tcp).
2020-10-06 17:09:13 -05:00
R.B. Boyer d6dce2332a
connect: intentions are now managed as a new config entry kind "service-intentions" (#8834)
- Upgrade the ConfigEntry.ListAll RPC to be kind-aware so that older
copies of consul will not see new config entries it doesn't understand
replicate down.

- Add shim conversion code so that the old API/CLI method of interacting
with intentions will continue to work so long as none of these are
edited via config entry endpoints. Almost all of the read-only APIs will
continue to function indefinitely.

- Add new APIs that operate on individual intentions without IDs so that
the UI doesn't need to implement CAS operations.

- Add a new serf feature flag indicating support for
intentions-as-config-entries.

- The old line-item intentions way of interacting with the state store
will transparently flip between the legacy memdb table and the config
entry representations so that readers will never see a hiccup during
migration where the results are incomplete. It uses a piece of system
metadata to control the flip.

- The primary datacenter will begin migrating intentions into config
entries on startup once all servers in the datacenter are on a version
of Consul with the intentions-as-config-entries feature flag. When it is
complete the old state store representations will be cleared. We also
record a piece of system metadata indicating this has occurred. We use
this metadata to skip ALL of this code the next time the leader starts
up.

- The secondary datacenters continue to run the old intentions
replicator until all servers in the secondary DC and primary DC support
intentions-as-config-entries (via serf flag). Once this condition it met
the old intentions replicator ceases.

- The secondary datacenters replicate the new config entries as they are
migrated in the primary. When they detect that the primary has zeroed
it's old state store table it waits until all config entries up to that
point are replicated and then zeroes its own copy of the old state store
table. We also record a piece of system metadata indicating this has
occurred. We use this metadata to skip ALL of this code the next time
the leader starts up.
2020-10-06 13:24:05 -05:00
Daniel Nephin 83401194ab streaming: improve godoc for cache-type
And fix a bug where any error that implemented the temporary interface was considered
a temporary error, even when the method would return false.
2020-10-06 13:52:02 -04:00
Daniel Nephin f857aef4a8 submatview: add a test for handling of NewSnapshotToFollow
Also add some godoc
Rename some vars and functions
Fix a data race in the new cache test for entry closing.
2020-10-06 13:22:02 -04:00
Daniel Nephin ad29cf4f94 stream: Return a single event from a subscription.Next
Handle batch events as a single event
2020-10-06 13:18:20 -04:00
Daniel Nephin fa115c6249 Move agent/subscribe -> agent/rpc/subscribe 2020-10-06 12:49:35 -04:00
Daniel Nephin 011109a6f6 subscirbe: extract streamID and logging from Subscribe
By extracting all of the tracing logic the core logic of the Subscribe
endpoint is much easier to read.
2020-10-06 12:49:35 -04:00
Daniel Nephin 4c4441997a subscribe: add integration test for acl token updates 2020-10-06 12:49:35 -04:00
Daniel Nephin 371ec2d70a subscribe: add a stateless subscribe service for the gRPC server
With a Backend that provides access to the necessary dependencies.
2020-10-06 12:49:35 -04:00
Daniel Nephin ae433947a4
Merge pull request #8799 from hashicorp/streaming/rename-framing-events
stream: remove EndOfEmptySnapshot, add NewSnapshotToFollow
2020-10-06 12:42:58 -04:00
R.B. Boyer a77b518542
server: create new memdb table for storing system metadata (#8703)
This adds a new very tiny memdb table and corresponding raft operation
for updating a very small effective map[string]string collection of
"system metadata". This can persistently record a fact about the Consul
state machine itself.

The first use of this feature will come in a later PR.
2020-10-06 10:08:37 -05:00
Daniel Nephin 2706cf9b2a
Merge pull request #8802 from hashicorp/dnephin/extract-lib-retry
lib/retry - extract a new package from lib/retry.go
2020-10-05 14:22:37 -04:00
freddygv 82a17ccee6 Do not evaluate discovery chain for topology upstreams 2020-10-05 10:24:50 -06:00
freddygv 63c50e15bc Single DB txn for ServiceTopology and other PR comments 2020-10-05 10:24:50 -06:00
freddygv 263bd9dd92 Add topology HTTP endpoint 2020-10-05 10:24:50 -06:00
freddygv 7c11580e93 Add topology RPC endpoint 2020-10-05 10:24:50 -06:00
freddygv 21c4708fe9 Add topology ACL filter 2020-10-05 10:24:50 -06:00
freddygv ac54bf99b3 Add func to combine up+downstream queries 2020-10-05 10:24:50 -06:00
freddygv 160a6539d1 factor in discovery chain when querying up/downstreams 2020-10-05 10:24:50 -06:00
freddygv 214b25919f support querying upstreams/downstreams from registrations 2020-10-05 10:24:50 -06:00
freddygv 3653045cb0 Add method for downstreams from disco chain 2020-10-05 10:24:50 -06:00
Daniel Nephin 40aac46cf4 lib/retry: Refactor to reduce the interface surface
Reduce Jitter to one function

Rename NewRetryWaiter

Fix a bug in calculateWait where maxWait was applied before jitter, which would make it
possible to wait longer than maxWait.
2020-10-04 18:12:42 -04:00
Daniel Nephin 0c7f9c72d7 lib/retry: extract a new package from lib 2020-10-04 17:43:01 -04:00
Daniel Nephin 9c5181c897 stream: full test coverage for EventPublisher.Subscribe 2020-10-02 13:46:24 -04:00
Daniel Nephin 0769f54fe1 stream: refactor to support change in framing events
Removing EndOfEmptySnapshot, add NewSnapshotToFollow
2020-10-02 13:41:31 -04:00
Daniel Nephin 5ef630f664
Merge pull request #8769 from hashicorp/streaming/prep-for-subscribe-service
state: use protobuf Topic and and export payload type
2020-10-02 13:30:06 -04:00
R.B. Boyer e84d52ba3a
ensure these tests work fine with namespaces in enterprise (#8794) 2020-10-01 09:54:46 -05:00
R.B. Boyer ccd0200bd9
server: ensure that we also shutdown network segment serf instances on server shutdown (#8786)
This really only matters for unit tests, since typically if an agent shuts down its server, it follows that up by exiting the process, which would also clean up all of the networking anyway.
2020-09-30 16:23:43 -05:00
Kyle Havlovitz 2956313f2d connect: Enable renewing the intermediate cert in the primary DC 2020-09-30 12:31:21 -07:00
freddygv ec6e8021c0 Resolve conflicts 2020-09-29 08:59:18 -06:00
Daniel Nephin d192b0a080 stream: move goroutine out of New
This change will make it easier to manage goroutine lifecycle from the caller.

Also expose EventPublisher from state.Store
2020-09-28 18:40:10 -04:00
Daniel Nephin e345c8d8a6 state: use pbsubscribe.Topic for topic values 2020-09-28 18:40:10 -04:00
Daniel Nephin 6e592ec485 state: rename and export EventPayload
The subscribe endpoint needs to be able to inspect the payload to filter
events, and convert them into the protobuf types.

Use the protobuf CatalogOp type for the operation field, for now. In the
future if we end up with multiple interfaces we should be able to remove
the protobuf dependency by changing this to an int32 and adding a test
for the mapping between the values.

Make the value of the payload a concrete type instead of interface{}. We
can create other payloads for other event types.
2020-09-28 18:34:30 -04:00
R.B. Boyer 45609fccdf
server: make sure that the various replication loggers use consistent logging (#8745) 2020-09-24 15:49:38 -05:00
Daniel Nephin 4b041a018d grpc: redeuce dependencies, unexport, and add godoc
Rename GRPCClient to ClientConnPool. This type appears to be more of a
conn pool than a client. The clients receive the connections from this
pool.

Reduce some dependencies by adjusting the interface baoundaries.

Remove the need to create a second slice of Servers, just to pick one and throw the rest away.

Unexport serverResolver, it is not used outside the package.

Use a RWMutex for ServerResolverBuilder, some locking is read-only.

Add more godoc.
2020-09-24 12:53:10 -04:00
Daniel Nephin 4b24470887 grpc: move client conn pool to grpc package 2020-09-24 12:48:12 -04:00
Daniel Nephin fad15171ec grpc: client conn pool and resolver
Extracted from 936522a13c07e8b732b6fde61bba23d05f7b9a70

Co-authored-by: Paul Banks <banks@banksco.de>
2020-09-24 12:46:22 -04:00
Daniel Nephin e0119a6e92
Merge pull request #8680 from hashicorp/dnephin/replace-consul-opts-with-base-deps
agent: Repalce ConsulOptions with a new struct from agent.BaseDeps
2020-09-24 12:45:54 -04:00
Paul Banks 0594667c3a
Fix bad int -> string conversions caught by go vet changes in 1.15 (#8739) 2020-09-24 11:14:07 +01:00
Hans Hasselberg c6fa758d6f
fix TestLeader_SecondaryCA_IntermediateRenew (#8702)
* fix lessThanHalfTime
* get lock for CAProvider()
* make a var to relate both vars
* rename to getCAProviderWithLock
* move CertificateTimeDriftBuffer to agent/connect/ca
2020-09-18 10:13:29 +02:00
Mike Morris fe984b3ee3
test: update tags for database service registrations and queries (#8693) 2020-09-16 14:05:01 -04:00
Kyle Havlovitz c8fd61abc7 Merge branch 'master' into vault-ca-renew-token 2020-09-15 14:39:04 -07:00
Daniel Nephin c621b4a420 agent/consul: pass dependencies directly from agent
In an upcoming change we will need to pass a grpc.ClientConnPool from
BaseDeps into Server. While looking at that change I noticed all of the
existing consulOption fields are already on BaseDeps.

Instead of duplicating the fields, we can create a struct used by
agent/consul, and use that struct in BaseDeps. This allows us to pass
along dependencies without translating them into different
representations.

I also looked at moving all of BaseDeps in agent/consul, however that
created some circular imports. Resolving those cycles wouldn't be too
bad (it was only an error in agent/consul being imported from
cache-types), however this change seems a little better by starting to
introduce some structure to BaseDeps.

This change is also a small step in reducing the scope of Agent.

Also remove some constants that were only used by tests, and move the
relevant comment to where the live configuration is set.

Removed some validation from NewServer and NewClient, as these are not
really runtime errors. They would be code errors, which will cause a
panic anyway, so no reason to handle them specially here.
2020-09-15 17:29:32 -04:00
Daniel Nephin 0536b2047e agent/consul: make router required 2020-09-15 17:26:26 -04:00
Kyle Havlovitz 63d3a5fc1f Clean up CA shutdown logic and error 2020-09-15 12:28:58 -07:00
freddygv 43efb4809c Merge master 2020-09-14 16:17:43 -06:00
Daniel Nephin 75515f3431
Merge pull request #8587 from hashicorp/streaming/add-grpc-server
streaming: add gRPC server for handling connections
2020-09-14 15:24:54 -04:00
freddygv 33af8dab9a Resolve conflicts against master 2020-09-11 18:41:58 -06:00
Kyle Havlovitz 1595add842 Clean up Vault renew tests and shutdown 2020-09-11 08:41:05 -07:00
freddygv 5871b667a5 Revert EnvoyConfig nesting 2020-09-11 09:21:43 -06:00
Kyle Havlovitz 7588e22739 Add a stop function to make sure the renewer is shut down on leader change 2020-09-10 06:12:48 -07:00
Kyle Havlovitz 1c57b72a9f Add a test for token renewal 2020-09-09 16:36:37 -07:00
Daniel Nephin 863a9df951 server: add gRPC server for streaming events
Includes a stats handler and stream interceptor for grpc metrics.

Co-authored-by: Paul Banks <banks@banksco.de>
2020-09-08 12:10:41 -04:00
Hans Hasselberg 51f079dcdd
secondaryIntermediateCertRenewalWatch abort on success (#8588)
secondaryIntermediateCertRenewalWatch was using `retryLoopBackoff` to
renew the intermediate certificate. Once it entered the inner loop and
started `retryLoopBackoff` it would never leave that.
`retryLoopBackoffAbortOnSuccess` will return when renewing is
successful, like it was intended originally.
2020-09-04 11:47:16 +02:00
Daniel Nephin c17a5b0628 state: handle terminating gateways in service health events 2020-09-03 16:58:05 -04:00
Daniel Nephin b241debee7 state: improve comments in catalog_events.go
Co-authored-by: Paul Banks <banks@banksco.de>
2020-09-03 16:58:05 -04:00
Daniel Nephin 870823e8ed state: use changeType in serviceChanges
To be a little more explicit, instead of nil implying an indirect change
2020-09-03 16:58:05 -04:00
Daniel Nephin 68682e7e83 don't over allocate slice 2020-09-03 16:58:04 -04:00
Daniel Nephin 5f52220f53 state: fix a bug in building service health events
The nodeCheck slice was being used as the first arg in append, which in some cases will modify the array backing the slice. This would lead to service checks for other services in the wrong event.

Also refactor some things to reduce the arguments to functions.
2020-09-03 16:58:04 -04:00
Daniel Nephin c61313b78a state: Remove unused args and return values
Also rename some functions to identify them as constructors for events
2020-09-03 16:58:04 -04:00
Daniel Nephin 668b98bcce state: use an enum for tracking node changes 2020-09-03 16:58:04 -04:00
Daniel Nephin 7c3c627028 state: serviceHealthSnapshot
refactored to remove unused return value and remove duplication
2020-09-03 16:58:04 -04:00
Daniel Nephin fdfe176deb state: Add Change processor and snapshotter for service health
Co-authored-by: Paul Banks <banks@banksco.de>
2020-09-03 16:58:04 -04:00
Daniel Nephin 6a1a43721d state: fix bug in changeTrackerDB.publish
Creating a new readTxn does not work because it will not see the newly created objects that are about to be committed. Instead use the active write Txn.
2020-09-03 16:58:01 -04:00
Daniel Nephin 81cc3daf69 stream: have SnapshotFunc accept a non-pointer SubscribeRequest
The value is not expected to be modified. Passing a value makes that explicit.
2020-09-03 16:54:02 -04:00
freddygv 56fdae9ace Update resolver defaulting 2020-09-03 13:08:44 -06:00
freddygv 02d6acd8fc Ensure resolver node with LB isn't considered default 2020-09-03 08:55:57 -06:00
freddygv daad3b9210 Remove LB infix and move injection to xds 2020-09-02 15:13:50 -06:00
Chris Piraino df1381f77f
Merge pull request #8603 from hashicorp/feature/usage-metrics
Track node and service counts in the state store and emit them periodically as metrics
2020-09-02 13:23:39 -05:00
R.B. Boyer 4197bed23b
connect: fix bug in preventing some namespaced config entry modifications (#8601)
Whenever an upsert/deletion of a config entry happens, within the open
state store transaction we speculatively test compile all discovery
chains that may be affected by the pending modification to verify that
the write would not create an erroneous scenario (such as splitting
traffic to a subset that did not exist).

If a single discovery chain evaluation references two config entries
with the same kind and name in different namespaces then sometimes the
upsert/deletion would be falsely rejected. It does not appear as though
this bug would've let invalid writes through to the state store so the
correction does not require a cleanup phase.
2020-09-02 10:47:19 -05:00
Chris Piraino b245d60200 Set metrics reporting interval to 9 seconds
This is below the 10 second interval that lib/telemetry.go implements as
its aggregation interval, ensuring that we always report these metrics.
2020-09-02 10:24:23 -05:00
Chris Piraino e9b397005c Update godoc string for memdb wrapper functions/structs 2020-09-02 10:24:22 -05:00
Chris Piraino 80f923a47a Refactor state store usage to track unique service names
This commit refactors the state store usage code to track unique service
name changes on transaction commit. This means we only need to lookup
usage entries when reading the information, as opposed to iterating over
a large number of service indices.

- Take into account a service instance's name being changed
- Do not iterate through entire list of service instances, we only care
about whether there is 0, 1, or more than 1.
2020-09-02 10:24:21 -05:00
Chris Piraino 79e6534345 Use ReadTxn interface in state store helper functions 2020-09-02 10:24:20 -05:00
Chris Piraino d90d95421d Add WriteTxn interface and convert more functions to ReadTxn
We add a WriteTxn interface for use in updating the usage memdb table,
with the forward-looking prospect of incrementally converting other
functions to accept interfaces.

As well, we use the ReadTxn in new usage code, and as a side effect
convert a couple of existing functions to use that interface as well.
2020-09-02 10:24:19 -05:00
Chris Piraino 45a4057f60 Report node/service usage metrics from every server
Using the newly provided state store methods, we periodically emit usage
metrics from the servers.

We decided to emit these metrics from all servers, not just the leader,
because that means we do not have to care about leader election flapping
causing metrics turbulence, and it seems reasonable for each server to
emit its own view of the state, even if they should always converge
rapidly.
2020-09-02 10:24:17 -05:00
Chris Piraino 3af96930eb Add new usage memdb table that tracks usage counts of various elements
We update the usage table on Commit() by using the TrackedChanges() API
of memdb.

Track memdb changes on restore so that usage data can be compiled
2020-09-02 10:24:16 -05:00
freddygv d7bda050e0 Restructure structs and other PR comments 2020-09-02 09:10:50 -06:00
Matt Keeler 335c604ced
Merge of auto-config and auto-encrypt code (#8523)
auto-encrypt is now handled as a special case of auto-config.

This also is moving all the cert-monitor code into the auto-config package.
2020-08-31 13:12:17 -04:00
freddygv afb14b6705 Compile down LB policy to disco chain nodes 2020-08-28 13:11:04 -06:00
Daniel Nephin 845661c8af
Merge pull request #8548 from edevil/fix_flake
Fix flaky TestACLResolver_Client/Concurrent-Token-Resolve
2020-08-28 15:10:55 -04:00
R.B. Boyer f2b8bf109c
xds: use envoy's rbac filter to handle intentions entirely within envoy (#8569) 2020-08-27 12:20:58 -05:00
Matt Keeler 106e1d50bd
Move RPC router from Client/Server and into BaseDeps (#8559)
This will allow it to be a shared component which is needed for AutoConfig
2020-08-27 11:23:52 -04:00
André Cruz 673bd69f36
Decrease test flakiness
Fix flaky TestACLResolver_Client/Concurrent-Token-Resolve and TestCacheNotifyPolling
2020-08-24 20:30:02 +01:00
André Cruz a64686fab6
testing: Fix govet errors 2020-08-21 18:01:55 +01:00
Hans Hasselberg 02de4c8b76
add primary keys to list keyring (#8522)
During gossip encryption key rotation it would be nice to be able to see if all nodes are using the same key. This PR adds another field to the json response from `GET v1/operator/keyring` which lists the primary keys in use per dc. That way an operator can tell when a key was successfully setup as primary key.

Based on https://github.com/hashicorp/serf/pull/611 to add primary key to list keyring output:

```json
[
  {
    "WAN": true,
    "Datacenter": "dc2",
    "Segment": "",
    "Keys": {
      "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 6,
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 6
    },
    "PrimaryKeys": {
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 6
    },
    "NumNodes": 6
  },
  {
    "WAN": false,
    "Datacenter": "dc2",
    "Segment": "",
    "Keys": {
      "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 8,
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "PrimaryKeys": {
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "NumNodes": 8
  },
  {
    "WAN": false,
    "Datacenter": "dc1",
    "Segment": "",
    "Keys": {
      "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 3,
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "PrimaryKeys": {
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "NumNodes": 8
  }
]
```

I intentionally did not change the CLI output because I didn't find a good way of displaying this information. There are a couple of options that we could implement later:
* add a flag to show the primary keys
* add a flag to show json output

Fixes #3393.
2020-08-18 09:50:24 +02:00
Daniel Nephin 8d35e37b3c testing: Remove all the defer os.Removeall
Now that testutil uses t.Cleanup to remove the directory the caller no longer has to manage
the removal
2020-08-14 19:58:53 -04:00
Daniel Nephin 629c34085d state: remove unused Store method receiver
And use ReadTxn interface where appropriate.
2020-08-13 11:25:22 -04:00
Daniel Nephin fc797a279a
Merge pull request #8461 from hashicorp/dnephin/remove-notify-shutdown
agent/consul: Remove NotifyShutdown
2020-08-13 11:16:48 -04:00
Daniel Nephin d8ffcd5686
Merge pull request #8365 from hashicorp/dnephin/fix-service-by-node-meta-flake
state: speed up tests that use watchLimit
2020-08-13 11:16:12 -04:00
R.B. Boyer 63422ca9c5
connect: use stronger validation that ingress gateways have compatible protocols defined for their upstreams (#8470)
Fixes #8466

Since Consul 1.8.0 there was a bug in how ingress gateway protocol
compatibility was enforced. At the point in time that an ingress-gateway
config entry was modified the discovery chain for each upstream was
checked to ensure the ingress gateway protocol matched. Unfortunately
future modifications of other config entries were not validated against
existing ingress-gateway definitions, such as:

1. create tcp ingress-gateway pointing to 'api' (ok)
2. create service-defaults for 'api' setting protocol=http (worked, but not ok)
3. create service-splitter or service-router for 'api' (worked, but caused an agent panic)

If you were to do these in a different order, it would fail without a
crash:

1. create service-defaults for 'api' setting protocol=http (ok)
2. create service-splitter or service-router for 'api' (ok)
3. create tcp ingress-gateway pointing to 'api' (fail with message about
   protocol mismatch)

This PR introduces the missing validation. The two new behaviors are:

1. create tcp ingress-gateway pointing to 'api' (ok)
2. (NEW) create service-defaults for 'api' setting protocol=http ("ok" for back compat)
3. (NEW) create service-splitter or service-router for 'api' (fail with
   message about protocol mismatch)

In consideration for any existing users that may be inadvertently be
falling into item (2) above, that is now officiall a valid configuration
to be in. For anyone falling into item (3) above while you cannot use
the API to manufacture that scenario anymore, anyone that has old (now
bad) data will still be able to have the agent use them just enough to
generate a new agent/proxycfg error message rather than a panic.
Unfortunately we just don't have enough information to properly fix the
config entries.
2020-08-12 11:19:20 -05:00
Hans Hasselberg 7a6d916ddc
Merge pull request #8471 from hashicorp/local_only
thread local-only through the layers
2020-08-12 08:54:51 +02:00
Freddy 50fee12d62
Internal endpoint to query intentions associated with a gateway (#8400) 2020-08-11 17:20:41 -06:00
Kyle Havlovitz 8118e3db40 Fix a state store comment about version 2020-08-11 13:46:12 -07:00
Kyle Havlovitz 2601585017 fsm: Fix snapshot bug with restoring node/service/check indexes 2020-08-11 11:49:52 -07:00
Hans Hasselberg e0297b6e99 Refactor keyring ops:
* changes some functions to return data instead of modifying pointer
  arguments
* renames globalRPC() to keyringRPCs() to make its purpose more clear
* restructures KeyringOperation() to make it more understandable
2020-08-11 13:42:03 +02:00
freddygv 6dcfa11c21 Update error handling 2020-08-10 17:48:22 -06:00
Daniel Nephin bef9348ca8 testing: remove unnecessary defers in tests
The data directory is now removed by the test helper that created it.
2020-08-07 17:28:16 -04:00
Daniel Nephin f3b63514d5 testing: Remove NotifyShutdown
NotifyShutdown was only used for testing. Now that t.Cleanup exists, we
can use that instead of attaching cleanup to the Server shutdown.

The Autopilot test which used NotifyShutdown doesn't need this
notification because Shutdown is synchronous. Waiting for the function
to return is equivalent.
2020-08-07 17:14:44 -04:00
Hans Hasselberg fdceb24323
auto_config implies connect (#8433) 2020-08-07 12:02:02 +02:00
freddygv 83f4e32376 PR comments and addtl tests 2020-08-05 16:07:11 -06:00
Daniel Nephin 061ae94c63 Rename NewClient/NewServer
Now that duplicate constructors have been removed we can use the shorter names for the single constructor.
2020-08-05 14:00:55 -04:00
Daniel Nephin e6c94c1411 Remove LogOutput from Server 2020-08-05 14:00:44 -04:00
Daniel Nephin fdf966896f Remove LogOutput from Client 2020-08-05 14:00:42 -04:00
Daniel Nephin 73493ca01b Pass a logger to ConnPool and yamux, instead of an io.Writer
Allowing us to remove the LogOutput field from config.
2020-08-05 13:25:08 -04:00
Daniel Nephin c7c941811d config: Remove unused field 2020-08-05 13:25:08 -04:00
freddygv c87af29506 collect GatewayServices from iter in a function 2020-07-31 13:30:40 -06:00
Freddy 7c2c8815d7
Avoid panics during shutdown routine (#8412) 2020-07-30 11:11:10 -06:00
freddygv 94d1f0a310 end to end changes to pass gatewayservices to /ui/services/ 2020-07-30 10:21:11 -06:00
Matt Keeler 76add4f24c
Allow setting verify_incoming* when using auto_encrypt or auto_config (#8394)
Ensure that enabling AutoConfig sets the tls configurator properly

This also refactors the TLS configurator a bit so the naming doesn’t imply only AutoEncrypt as the source of the automatically setup TLS cert info.
2020-07-30 10:15:12 -04:00
Matt Keeler dad0f189a2
Agent Auto Config: Implement Certificate Generation (#8360)
Most of the groundwork was laid in previous PRs between adding the cert-monitor package to extracting the logic of signing certificates out of the connect_ca_endpoint.go code and into a method on the server.

This also refactors the auto-config package a bit to split things out into multiple files.
2020-07-28 15:31:48 -04:00
Matt Keeler 3a1058a06b
Move connect root retrieval and cert signing logic out of the RPC endpoints (#8364)
The code now lives on the Server type itself. This was done so that all of this could be shared with auto config certificate signing.
2020-07-24 10:00:51 -04:00
Matt Keeler e7d8a02ae8
Move generation of the CA Configuration from the agent code into a method on the RuntimeConfig (#8363)
This allows this to be reused elsewhere.
2020-07-23 16:05:28 -04:00
Daniel Nephin 597dcf2bfb
Merge pull request #8323 from hashicorp/dnephin/add-event-publisher-2
stream: close subscriptions on shutdown
2020-07-23 13:12:50 -04:00
Matt Keeler c3e7d689b7
Refactor the agentpb package (#8362)
First move the whole thing to the top-level proto package name.

Secondly change some things around internally to have sub-packages.
2020-07-23 11:24:20 -04:00
Daniel Nephin decba06b7d stream: close all subs when EventProcessor is shutdown. 2020-07-22 19:04:10 -04:00
Daniel Nephin e802689bbe stream: fix overallocation in filter
And add tests
2020-07-22 19:04:10 -04:00
Daniel Nephin f64725f7aa state: speed up TestStateStore_ServicesByNodeMeta
Make watchLimit a var so that we can patch it in tests and reduce the time spent creating state.
2020-07-22 16:57:06 -04:00
Daniel Nephin a44ddea9ba state: Use subtests in TestStateStore_ServicesByNodeMeta
These subtests make it much easier to identify the slow part of the test, but they also help enumerate all the different cases which are being tested.
2020-07-22 16:39:09 -04:00
Daniel Nephin 6d3b042872
Merge pull request #7948 from hashicorp/dnephin/buffer-test-logs
testutil: NewLogBuffer - buffer logs until a test fails
2020-07-21 15:21:52 -04:00
Matt Keeler 8ea8a939f0
Merge pull request #8311 from hashicorp/bugfix/auto-encrypt-token-update 2020-07-21 13:15:27 -04:00
Daniel Nephin 80ff174880 testutil: NewLogBuffer - buffer logs until a test fails
Replaces #7559

Running tests in parallel, with background goroutines, results in test output not being associated with the correct test. `go test` does not make any guarantees about output from goroutines being attributed to the correct test case.

Attaching log output from background goroutines also cause data races.  If the goroutine outlives the test, it will race with the test being marked done. Previously this was noticed as a panic when logging, but with the race detector enabled it is shown as a data race.

The previous solution did not address the problem of correct test attribution because test output could still be hidden when it was associated with a test that did not fail. You would have to look at all of the log output to find the relevant lines. It also made debugging test failures more difficult because each log line was very long.

This commit attempts a new approach. Instead of printing all the logs, only print when a test fails. This should work well when there are a small number of failures, but may not work well when there are many test failures at the same time. In those cases the failures are unlikely a result of a specific test, and the log output is likely less useful.

All of the logs are printed from the test goroutine, so they should be associated with the correct test.

Also removes some test helpers that were not used, or only had a single caller. Packages which expose many functions with similar names can be difficult to use correctly.

Related:
https://github.com/golang/go/issues/38458 (may be fixed in go1.15)
https://github.com/golang/go/issues/38382#issuecomment-612940030
2020-07-21 12:50:40 -04:00
Matt Keeler 133a6d99f2
Fix issue with changing the agent token causing failure to renew the auto-encrypt certificate
The fallback method would still work but it would get into a state where it would let the certificate expire for 10s before getting a new one. And the new one used the less secure RPC endpoint.

This is also a pretty large refactoring of the auto encrypt code. I was going to write some tests around the certificate monitoring but it was going to be impossible to get a TestAgent configured in such a way that I could write a test that ran in less than an hour or two to exercise the functionality.

Moving the certificate monitoring into its own package will allow for dependency injection and in particular mocking the cache types to control how it hands back certificates and how long those certificates should live. This will allow for exercising the main loop more than would be possible with it coupled so tightly with the Agent.
2020-07-21 12:19:25 -04:00
Daniel Nephin 7599e280de stream: handle empty event in TestEventSnapshot
When the race detector is enabled we see this test fail occasionally. The reordering of execution seems to make it possible for the snapshot splice to happen before any events are published to the topicBuffers.

We can handle this case in the test the same way it is handled by a subscription, by proceeding to the next event.
2020-07-20 18:20:02 -04:00
Daniel Nephin 75f10fb191 state: update calls that are no longer state methods
In a previous commit these methods were changed to functions, so remove the Store paramter.
2020-07-16 15:46:10 -04:00
Daniel Nephin 3fcb2e16f4 state: un-method funcs that don't use their receiver
This change was mostly automated with the following

First generate a list of functions with:

  git grep -o 'Store) \([^(]\+\)(tx \*txn' ./agent/consul/state | awk '{print $2}' | grep -o '^[^(]\+'

Then the list was curated a bit with trial/error to remove and add funcs
as necessary.

Finally the replacement was done with:

  dir=agent/consul/state
  file=${1-funcnames}

  while read fn; do
    echo "$fn"
    sed -i -e "s/(s \*Store) $fn(/$fn(/" $dir/*.go
    sed -i -e "s/s\.$fn(/$fn(/" $dir/*.go
    sed -i -e "s/s\.store\.$fn(/$fn(/" $dir/*.go
  done < $file
2020-07-16 15:30:39 -04:00
Daniel Nephin edb0a4f1f8 store: convert methods that don't use their receiver to functions
Making these functions allows them to be used without introducing
an artificial dependency on the struct. Many of these will be called
from streaming Event processors, which do not have a store.

This change is being made ahead of the streaming work to get to reduce
the size of the streaming diff.
2020-07-16 15:30:10 -04:00
Daniel Nephin a2f8605c66 stream: Add forceClose and refactor subscription filtering
Move the subscription context to Next. context.Context should generally
never be stored in a struct because it makes that struct only valid
while the context is valid. This is rarely obvious from the caller.
Adds a forceClosed channel in place of the old context, and uses the new
context as a way for the caller to stop the Subscription blocking.

Remove some recursion out of bufferImte.Next. The caller is already looping so we can continue
in that loop instead of recursing. This ensures currentItem is updated immediately (which probably
does not matter in practice), and also removes the chance that we overflow the stack.

NextNoBlock and FollowAfter do not need to handle bufferItem.Err, the caller already
handles it.

Moves filter to a method to simplify Next, and more explicitly separate filtering from looping.

Also improve some godoc

Only unwrap itemBuffer.Err when necessary
2020-07-14 15:57:47 -04:00
Daniel Nephin 2595436f62 stream: Improve docstrings
Also rename ResumeStrema to EndOfEmptySnapshot to be more consistent with other framing events

Co-authored-by: Paul Banks <banks@banksco.de>
2020-07-14 15:57:47 -04:00
Daniel Nephin 16a2b3fafc stream: change Topic to an interface
Consumers of the package can decide on which type to use for the Topic. In the future we may
use a gRPC type for the topic.
2020-07-14 15:57:47 -04:00
Daniel Nephin aa571bd0ce state: Move change processing out of EventPublisher
EventPublisher was receiving TopicHandlers, which had a couple of
problems:

- ChangeProcessors were being grouped by Topic, but they completely
  ignored the topic and were performed on every change
- ChangeProcessors required EventPublisher to be aware of database
  changes

By moving ChangeProcesors out of EventPublisher, and having Publish
accept events instead of changes, EventPublisher no longer needs to
be aware of these things.

Handlers is now only SnapshotHandlers, which are still mapped by Topic.

Also allows us to remove the small 'db' package that had only two types.
They can now be unexported types in state.
2020-07-14 15:57:47 -04:00
Daniel Nephin 23a940daad server: Abandom state store to shutdown EventPublisher
So that we don't leak goroutines
2020-07-14 15:57:47 -04:00
Daniel Nephin e1305fe80c stream: unexport identifiers
Now that EventPublisher is part of stream a lot of the internals can be hidden
2020-07-14 15:57:47 -04:00
Daniel Nephin 9e37894778 stream: Move EventPublisher to stream package
The EventPublisher is the central hub of the PubSub system. It is toughly coupled with much of
stream. Some stream internals were exported exclusively for EventPublisher.

The two Subscribe cases (with or without index) were also awkwardly split between two packages. By
moving EventPublisher into stream they are now both in the same package (although still in different files).
2020-07-14 15:57:47 -04:00
Daniel Nephin 6e87e83d77 state: Make handleACLUpdate async once again
So that we keep as much as possible out of the FSM commit hot path.
2020-07-14 15:57:47 -04:00
Daniel Nephin a92dab724d state: Use interface for Txn
Also store the index in Changes instead of the Txn.

This change is in preparation for movinng EventPublisher to the stream package, and
making handleACLUpdates async once again.
2020-07-14 15:57:46 -04:00
Daniel Nephin c778d61b6a stream.Subscription unexport fields and additiona docstrings 2020-07-14 15:57:46 -04:00
Daniel Nephin 37a38629d7 Add a context for stopping EventPublisher goroutine 2020-07-14 15:57:46 -04:00
Daniel Nephin 02bc5a26e4 EventPublisher: Make Unsubscribe a function on Subscription
It is critical that Unsubscribe be called with the same pointer to a
SubscriptionRequest that was used to create the Subscription. The
docstring made that clear, but it sill allowed a caler to get it wrong by
creating a new SubscriptionRequest.

By hiding this detail from the caller, and only exposing an Unsubscribe
method, it should be impossible to fail to Unsubscribe.

Also update some godoc strings.
2020-07-14 15:57:46 -04:00
Daniel Nephin 86976cf23c EventPublisher: handleACL changes synchronously
Use a separate lock for subscriptions.ByToken to allow it to happen synchronously
in the commit flow.
This removes the need to create a new txn for the goroutine, and removes
the need for EventPublisher to contain a reference to DB.
2020-07-14 15:57:46 -04:00
Daniel Nephin 606121fae6 stream.EventSnapshot: reduce the fields on the struct
Many of the fields are only needed in one place, and by using a closure
they can be removed from the struct. This reduces the scope of the variables
making it esier to see how they are used.
2020-07-14 15:57:45 -04:00
Daniel Nephin 7196917051 stream.EventBuffer: Seed the fuzz test with time.Now()
Otherwise the test will run with exactly the same values each time.

By printing the seed we can attempt to reproduce the test by adding an env var to override the seed
2020-07-14 15:57:45 -04:00
Daniel Nephin 525b275a52 state: memdb_wrapper.go -> memdb.go
Renaming in a separate commit so that git can merge changes to the file.
2020-07-14 15:57:45 -04:00
Daniel Nephin b5d2bea770 state: publish changes from Commit
Make topicRegistry use functions instead of unbound methods
Use a regular memDB in EventPublisher to remove a reference cycle
Removes the need for EventPublisher to use a store
2020-07-14 15:57:45 -04:00
Daniel Nephin f626c3d6c5 EventPublisher: docstrings and getTopicBuffer
also rename commitCh -> publishCh
2020-07-14 15:57:45 -04:00
Daniel Nephin 2020e9c7c7 ProcessChanges: use stream.Event
Also remove secretHash, which was used to hash tokens. We don't expose
these tokens anywhere, so we can use the string itself instead of a
Hash.

Fix acl_events_test.go for storing a structs type.
2020-07-14 15:57:45 -04:00
Daniel Nephin 2e45bbbb3e stream: Use local types for Event Topic SubscriptionRequest 2020-07-14 15:57:45 -04:00
Daniel Nephin 3d62013062 Rename stream_publisher.go -> event_publisher.go 2020-07-14 15:57:44 -04:00
Daniel Nephin 526fb53f85 Add streaming package with Subscription and Snapshot components.
The remaining files from 7965767de0bd62ab07669b85d6879bd5f815d157

Co-authored-by: Paul Banks <banks@banksco.de>
2020-07-14 15:57:44 -04:00
Chris Piraino b80cbb499f
Set enterprise metadata after resolving the token (#8302)
The token can encode enterprise metadata information, and we must make
sure we set that on the reply so that we can correct filter ACLs.
2020-07-13 13:39:57 -05:00
Daniel Nephin 13e0d258b5
Merge pull request #8237 from hashicorp/dnephin/remove-acls-enabled-from-delegate
Remove ACLsEnabled from delegate interface
2020-07-09 16:35:43 -04:00
Matt Keeler 39d9babab3
Pass the Config and TLS Configurator into the AutoConfig constructor
This is instead of having the AutoConfigBackend interface provide functions for retrieving them.

NOTE: the config is not reloadable. For now this is fine as we don’t look at any reloadable fields. If that changes then we should provide a way to make it reloadable.
2020-07-08 12:36:11 -04:00
Matt Keeler a77ed471c8
Rename (*Server).forward to (*Server).ForwardRPC
Also get rid of the preexisting shim in server.go that existed before to have this name just call the unexported one.
2020-07-08 11:05:44 -04:00
Matt Keeler 386ec3a2a2
Refactor AutoConfig RPC to not have a direct dependency on the Server type
Instead it has an interface which can be mocked for better unit testing that is deterministic and not prone to flakiness.
2020-07-08 11:05:44 -04:00
Daniel Nephin 8b6036c077 Remove ACLsEnabled from delegate interface
In all cases (oss/ent, client/server) this method was returning a value from config. Since the
value is consistent, it doesn't need to be part of the delegate interface.
2020-07-03 17:00:20 -04:00
Daniel Nephin dfa8856e5f agent/consul: Add support for NotModified to two endpoints
A query made with AllowNotModifiedResponse and a MinIndex, where the
result has the same Index as MinIndex, will return an empty response
with QueryMeta.NotModified set to true.

Co-authored-by: Pierre Souchay <pierresouchay@users.noreply.github.com>
2020-07-02 17:05:46 -04:00
Matt Keeler 87764e5bfb
Merge pull request #8211 from hashicorp/bugfix/auto-encrypt-various 2020-07-02 09:49:49 -04:00
Yury Evtikhov dbf3c05fa5 DNS: add IsErrQueryNotFound function for easier error evaluation 2020-07-01 03:41:44 +01:00
Matt Keeler a97f9ff386
Overwrite agent leaf cert trust domain on the servers 2020-06-30 09:59:08 -04:00
Matt Keeler 5600069d69
Store the Connect CA rate limiter on the server
This fixes a bug where auto_encrypt was operating without utilizing a common rate limiter.
2020-06-30 09:59:07 -04:00
Matt Keeler fa42d9b34f
Fix auto_encrypt IP/DNS SANs
The initial auto encrypt CSR wasn’t containing the user supplied IP and DNS SANs. This fixes that. Also We were configuring a default :: IP SAN. This should be ::1 instead and was fixed.
2020-06-30 09:59:07 -04:00
R.B. Boyer 72a515f5ec
connect: various changes to make namespaces for intentions work more like for other subsystems (#8194)
Highlights:

- add new endpoint to query for intentions by exact match

- using this endpoint from the CLI instead of the dump+filter approach

- enforcing that OSS can only read/write intentions with a SourceNS or
  DestinationNS field of "default".

- preexisting OSS intentions with now-invalid namespace fields will
  delete those intentions on initial election or for wildcard namespaces
  an attempt will be made to downgrade them to "default" unless one
  exists.

- also allow the '-namespace' CLI arg on all of the intention subcommands

- update lots of docs
2020-06-26 16:59:15 -05:00
Daniel Nephin 7d5f1ba6bd
Merge pull request #8176 from hashicorp/dnephin/add-linter-unparam-1
lint: add unparam linter and fix some of the issues
2020-06-25 15:34:48 -04:00
Matt Keeler d471977f62
Fix go routine leak in auto encrypt ca roots tracking 2020-06-24 17:09:50 -04:00
Matt Keeler 90e741c6d2
Allow cancelling blocking queries in response to shutting down. 2020-06-24 17:09:50 -04:00
Daniel Nephin 07c1081d39 Fix a bunch of unparam lint issues 2020-06-24 13:00:14 -04:00
Matt Keeler 341aedbce9
Ensure that retryLoopBackoff can be cancelled
We needed to pass a cancellable context into the limiter.Wait instead of context.Background. So I made the func take a context instead of a chan as most places were just passing through a Done chan from a context anyways.

Fix go routine leak in the gateway locator
2020-06-24 12:41:08 -04:00
Matt Keeler 9dc9f7df15
Allow cancelling startup when performing auto-config (#8157)
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2020-06-19 15:16:00 -04:00
Daniel Nephin b5ef9b7ea9 Remove bytesToUint64 from agent/consul 2020-06-18 12:45:43 -04:00
Daniel Nephin 81bc082b63 Remove unused private IP code from agent/consul 2020-06-18 12:40:38 -04:00
Matt Keeler 2c7844d220
Implement Client Agent Auto Config
There are a couple of things in here.

First, just like auto encrypt, any Cluster.AutoConfig RPC will implicitly use the less secure RPC mechanism.

This drastically modifies how the Consul Agent starts up and moves most of the responsibilities (other than signal handling) from the cli command and into the Agent.
2020-06-17 16:49:46 -04:00
Matt Keeler f5d57ccd48
Allow the Agent its its child Client/Server to share a connection pool
This is needed so that we can make an AutoConfig RPC at the Agent level prior to creating the Client/Server.
2020-06-17 16:19:33 -04:00
Matt Keeler 8c601ad8db
Merge pull request #8035 from hashicorp/feature/auto-config/server-rpc 2020-06-17 16:07:25 -04:00
Chris Piraino 79d003d395
Remove ACLEnforceVersion8 from tests (#8138)
The field had been deprecated for a while and was recently removed,
however a PR which added these tests prior to removal was merged.
2020-06-17 14:58:01 -05:00
Daniel Nephin 1ef8279ac9
Merge pull request #8034 from hashicorp/dnephin/add-linter-staticcheck-4
ci: enable SA4006 staticcheck check and add ineffassign
2020-06-17 12:16:02 -04:00
Matt Keeler eda8cb39fd
Implement the insecure version of the Cluster.AutoConfig RPC endpoint
Right now this is only hooked into the insecure RPC server and requires JWT authorization. If no JWT authorizer is setup in the configuration then we inject a disabled “authorizer” to always report that JWT authorization is disabled.
2020-06-17 11:25:29 -04:00
Pierre Souchay f7a1189dba
gossip: Ensure that metadata of Consul Service is updated (#7903)
While upgrading servers to a new version, I saw that metadata of
existing servers are not upgraded, so the version and raft meta
is not up to date in catalog.

The only way to do it was to:
 * update Consul server
 * make it leave the cluster, then metadata is accurate

That's because the optimization to avoid updating catalog does
not take into account metadata, so no update on catalog is performed.
2020-06-17 12:16:13 +02:00
Daniel Nephin 8753d1f1ba ci: Add ineffsign linter
And fix an additional ineffective assignment that was not caught by staticcheck
2020-06-16 17:32:50 -04:00
Daniel Nephin 97342de262
Merge pull request #8070 from hashicorp/dnephin/add-gofmt-simplify
ci: Enable gofmt simplify
2020-06-16 17:18:38 -04:00
Matt Keeler d994dc7b35
Agent Auto Configuration: Configuration Syntax Updates (#8003) 2020-06-16 15:03:22 -04:00
Daniel Nephin 89d95561df Enable gofmt simplify
Code changes done automatically with 'gofmt -s -w'
2020-06-16 13:21:11 -04:00
Daniel Nephin 5f24171f13 ci: enable SA4006 staticcheck check
And fix the 'value not used' issues.

Many of these are not bugs, but a few are tests not checking errors, and
one appears to be a missed error in non-test code.
2020-06-16 13:10:11 -04:00
Daniel Nephin 71e6534061 Rename txnWrapper to txn 2020-06-16 13:06:02 -04:00
Daniel Nephin 537ae1fd46 Rename db 2020-06-16 13:04:31 -04:00
Daniel Nephin 78c76f0773 Handle return value from txn.Commit 2020-06-16 13:04:31 -04:00
Daniel Nephin 50db8f409a state: Update docstrings for changeTrackerDB and txn
And un-embed memdb.DB to prevent accidental access to underlying
methods.
2020-06-16 13:04:31 -04:00
Paul Banks f9a6386c4a state: track changes so that they may be used to produce change events 2020-06-16 13:04:29 -04:00
Matt Keeler cdc4b20afa
ACL Node Identities (#7970)
A Node Identity is very similar to a service identity. Its main targeted use is to allow creating tokens for use by Consul agents that will grant the necessary permissions for all the typical agent operations (node registration, coordinate updates, anti-entropy).

Half of this commit is for golden file based tests of the acl token and role cli output. Another big updates was to refactor many of the tests in agent/consul/acl_endpoint_test.go to use the same style of tests and the same helpers. Besides being less boiler plate in the tests it also uses a common way of starting a test server with ACLs that should operate without any warnings regarding deprecated non-uuid master tokens etc.
2020-06-16 12:54:27 -04:00
freddygv cc4ff3ae02 Fixup stray sid references 2020-06-12 13:47:43 -06:00
freddygv 1e7e716742 Move compound service names to use ServiceName type 2020-06-12 13:47:43 -06:00
freddygv 806b1fb608 Move GatewayServices out of Internal 2020-06-12 13:46:47 -06:00
Daniel Nephin 6719f1a6fa
Merge pull request #7900 from hashicorp/dnephin/add-linter-staticcheck-2
intentions: fix a bug in Intention.SetHash
2020-06-09 15:40:20 -04:00
Daniel Nephin 5f14eb124c
Merge pull request #8037 from hashicorp/dnephin/add-linter-staticcheck-5
ci: Enabled SA2002 staticcheck check
2020-06-09 15:31:24 -04:00
Hans Hasselberg 7404712854
acl: do not resolve local tokens from remote dcs (#8068) 2020-06-09 21:13:09 +02:00
Hans Hasselberg bec21c849d
Tokens converted from legacy ACLs get their Hash computed (#8047)
* Fixes #5606: Tokens converted from legacy ACLs get their Hash computed

This allows new style token replication to work for legacy tokens as well when they change.

* tests: fix timestamp comparison

Co-authored-by: Matt Keeler <mjkeeler7@gmail.com>
2020-06-08 21:44:06 +02:00
Daniel Nephin 1cdfc4f290 ci: Enabled SA2002 staticcheck check
And handle errors in the main test goroutine
2020-06-05 17:50:11 -04:00
Daniel Nephin b9e4544ec3 intentions: fix a bug in Intention.SetHash
Found using staticcheck.

binary.Write does not accept int types without a size. The error from binary.Write was ignored, so we never saw this error. Casting the data to uint64 produces a correct hash.

Also deprecate the Default{Addr,Port} fields, and prevent them from being encoded. These fields will always be empty and are not used.
Removing these would break backwards compatibility, so they are left in place for now.

Co-authored-by: Hans Hasselberg <me@hans.io>
2020-06-05 14:51:43 -04:00
R.B. Boyer 3ad570ba99
server: don't activate federation state replication or anti-entropy until all servers are running 1.8.0+ (#8014) 2020-06-04 16:05:27 -05:00
Hans Hasselberg dd8cd9bc24
Merge pull request #7966 from hashicorp/pool_improvements
Agent connection pool cleanup
2020-06-04 08:56:26 +02:00
Matt Keeler 2c615807af
Fix legacy management tokens in unupgraded secondary dcs (#7908)
The ACL.GetPolicy RPC endpoint was supposed to return the “parent” policy and not always the default policy. In the case of legacy management tokens the parent policy was supposed to be “manage”. The result of us not sending this properly was that operations that required specifically a management token such as saving a snapshot would not work in secondary DCs until they were upgraded.
2020-06-03 11:22:22 -04:00
Matt Keeler 9fa9ec4ba0
Fix segfault due to race condition for checking server versions (#7957)
The ACL monitoring routine uses c.routers to check for server version updates. Therefore it needs to be started after initializing the routers.
2020-06-03 10:36:32 -04:00
Daniel Nephin e8a883e829
Replace goe/verify.Values with testify/require.Equal (#7993)
* testing: replace most goe/verify.Values with require.Equal

One difference between these two comparisons is that go/verify considers
nil slices/maps to be equal to empty slices/maps, where as testify/require
does not, and does not appear to provide any way to enable that behaviour.

Because of this difference some expected values were changed from empty
slices to nil slices, and some calls to verify.Values were left.

* Remove github.com/pascaldekloe/goe/verify

Reduce the number of assertion packages we use from 2 to 1
2020-06-02 12:41:25 -04:00
R.B. Boyer 7bd7895047
acl: allow auth methods created in the primary datacenter to optionally create global tokens (#7899) 2020-06-01 11:44:47 -05:00
R.B. Boyer 16db20b1f3
acl: remove the deprecated `acl_enforce_version_8` option (#7991)
Fixes #7292
2020-05-29 16:16:03 -05:00
Jono Sosulska 7a13c96a2a
Replace whitelist/blacklist terminology with allowlist/denylist (#7971)
* Replace whitelist/blacklist terminology with allowlist/denylist
2020-05-29 14:19:16 -04:00
Hans Hasselberg 1ed91cbdf6 pool: remove timeout parameter
Timeout was never used in a meaningful way by callers, which is why it
is now entirely internal to the pool.
2020-05-29 08:21:28 +02:00
Hans Hasselberg 5cda505495 pool: remove useTLS and ForceTLS
In the past TLS usage was enforced with these variables, but these days
this decision is made by TLSConfigurator and there is no reason to keep
using the variables.
2020-05-29 08:21:24 +02:00
Hans Hasselberg 9ef44ec3da pool: remove version
The version field has been used to decide which multiplexing to use. It
was introduced in 2457293dceec95ecd12ef4f01442e13710ea131a. But this is
6y ago and there is no need for this differentiation anymore.
2020-05-28 23:06:01 +02:00
Daniel Nephin ea6c2b2adc ci: Add staticcheck and fix most errors
Three of the checks are temporarily disabled to limit the size of the
diff, and allow us to enable all the other checks in CI.

In a follow up we can fix the issues reported by the other checks one
at a time, and enable them.
2020-05-28 11:59:58 -04:00
R.B. Boyer 54c7f825d6
create lib/stringslice package (#7934) 2020-05-27 11:47:32 -05:00
R.B. Boyer 813d69622e
agent: handle re-bootstrapping in a secondary datacenter when WAN federation via mesh gateways is configured (#7931)
The main fix here is to always union the `primary-gateways` list with
the list of mesh gateways in the primary returned from the replicated
federation states list. This will allow any replicated (incorrect) state
to be supplemented with user-configured (correct) state in the config
file. Eventually the game of random selection whack-a-mole will pick a
winning entry and re-replicate the latest federation states from the
primary. If the user-configured state is actually the incorrect one,
then the same eventual correct selection process will work in that case,
too.

The secondary fix is actually to finish making wanfed-via-mgws actually
work as originally designed. Once a secondary datacenter has replicated
federation states for the primary AND managed to stand up its own local
mesh gateways then all of the RPCs from a secondary to the primary
SHOULD go through two sets of mesh gateways to arrive in the consul
servers in the primary (one hop for the secondary datacenter's mesh
gateway, and one hop through the primary datacenter's mesh gateway).
This was neglected in the initial implementation. While everything
works, ideally we should treat communications that go around the mesh
gateways as just provided for bootstrapping purposes.

Now we heuristically use the success/failure history of the federation
state replicator goroutine loop to determine if our current mesh gateway
route is working as intended. If it is, we try using the local gateways,
and if those don't work we fall back on trying the primary via the union
of the replicated state and the go-discover configuration flags.

This can be improved slightly in the future by possibly initializing the
gateway choice to local on startup if we already have replicated state.
This PR does not address that improvement.

Fixes #7339
2020-05-27 11:31:10 -05:00
R.B. Boyer 7e42819a71
connect: ensure proxy-defaults protocol is used for upstreams (#7938) 2020-05-21 16:08:39 -05:00
Daniel Nephin f9a89db86e
Update agent/consul/state/catalog.go
Co-authored-by: Hans Hasselberg <me@hans.io>
2020-05-20 16:34:14 -04:00
Daniel Nephin e1e1c13b35 state: use an error to indicate compare failed
Errors are values. We can use the error value to identify the 'comparison failed' case which makes the function easier to use and should make it harder to miss handle the error case
2020-05-20 12:43:33 -04:00
Pierre Souchay 3b548f0d77
Allow to restrict servers that can join a given Serf Consul cluster. (#7628)
Based on work done in https://github.com/hashicorp/memberlist/pull/196
this allows to restrict the IP ranges that can join a given Serf cluster
and be a member of the cluster.

Restrictions on IPs can be done separatly using 2 new differents flags
and config options to restrict IPs for LAN and WAN Serf.
2020-05-20 11:31:19 +02:00
Daniel Nephin 6e3a7b0aa8 consul/state: refactor tnxService to avoid missed cases
Handling errors at the end of a log switch/case block is somewhat
brittle. This block included a couple cases where errors were ignored,
but it was not obvious the way it was written.

This change moves all error handling into each case block. There is
still potentially one case where err is ignored, which will be handled
in a follow up.
2020-05-19 16:50:14 -04:00
Daniel Nephin 545bd766e7 Fix a number of problems found by staticcheck
Some of these problems are minor (unused vars), but others are real bugs (ignored errors).

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
2020-05-19 16:50:14 -04:00
Daniel Nephin aa8009ee45 Remove unused var
The usage of this var was removed in b92f895c233bf8a99ae35361117a90416fea29b5.

Found by using staticcheck
2020-05-19 16:50:14 -04:00
Chris Piraino ce099c9aca Do not return an error if requested service is not a gateway
This commit converts the previous error into just a Warn-level log
message. By returning an error when the requested service was not a
gateway, we did not appropriately update envoy because the cache Fetch
returned an error and thus did not propagate the update through proxycfg
and xds packages.
2020-05-18 09:08:04 -05:00
Aleksandr Zagaevskiy 75f0607d3b
Preserve ModifyIndex for unchanged entry in KVS TXN (#7832) 2020-05-14 13:25:04 -06:00
Matt Keeler 849eedd142
Fix identity resolution on clients and in secondary dcs (#7862)
Previously this happened to be using the method on the Server/Client that was meant to allow the ACLResolver to locally resolve tokens. On Servers that had tokens (primary or secondary dc + token replication) this function would lookup the token from raft and return the ACLIdentity. On clients this was always a noop. We inadvertently used this function instead of creating a new one when we added logging accessor ids for permission denied RPC requests. 

With this commit, a new method is used for resolving the identity properly via the ACLResolver which may still resolve locally in the case of being on a server with tokens but also supports remote token resolution.
2020-05-13 13:00:08 -04:00
Chris Piraino 3eb4e4012a
Make new gateway tests compatible with enterprise (#7856) 2020-05-12 13:48:20 -05:00
Daniel Nephin 2e0f750f1a Add unconvert linter
To find unnecessary type convertions
2020-05-12 13:47:25 -04:00
Drew Bailey 7e1734b1c6 Value is already an int, remove type cast 2020-05-12 13:13:09 -04:00
R.B. Boyer 940e5ad160
acl: add auth method for JWTs (#7846) 2020-05-11 20:59:29 -05:00
Chris Piraino 1173b31949
Return early from updateGatewayServices if nothing to update (#7838)
* Return early from updateGatewayServices if nothing to update

Previously, we returned an empty slice of gatewayServices, which caused
us to accidentally delete everything in the memdb table

* PR comment and better formatting
2020-05-11 14:46:48 -05:00
Chris Piraino 1a7c99cf31
Fix TestInternal_GatewayServiceDump_Ingress (#7840)
Protocol was added as a field on GatewayServices after
GatewayServiceDump PR branch was created.
2020-05-11 14:46:31 -05:00
R.B. Boyer c54211ad52
cli: ensure 'acl auth-method update' doesn't deep merge the Config field (#7839) 2020-05-11 14:21:17 -05:00
Chris Piraino 107c7a9ca7 PR comment and better formatting 2020-05-11 14:04:59 -05:00
Chris Piraino 9f924400e0 Return early from updateGatewayServices if nothing to update
Previously, we returned an empty slice of gatewayServices, which caused
us to accidentally delete everything in the memdb table
2020-05-11 12:38:04 -05:00
Freddy ebbb234ecb
Gateway Services Nodes UI Endpoint (#7685)
The endpoint supports queries for both Ingress Gateways and Terminating Gateways. Used to display a gateway's linked services in the UI.
2020-05-11 11:35:17 -06:00
Chris Piraino a635e23f86
Restoring config entries updates the gateway-services table (#7811)
- Adds a new validateConfigEntryEnterprise function
- Also fixes some state store tests that were failing in enterprise
2020-05-08 13:24:33 -05:00
Freddy a37d7a42c9
Fix up enterprise compatibility for gateways (#7813) 2020-05-08 09:44:34 -06:00
Jono Sosulska 44011c81f2
Fix spelling of deregister (#7804) 2020-05-08 10:03:45 -04:00
Chris Piraino ad8a0544f2
Require individual services in ingress entry to match protocols (#7774)
We require any non-wildcard services to match the protocol defined in
the listener on write, so that we can maintain a consistent experience
through ingress gateways. This also helps guard against accidental
misconfiguration by a user.

- Update tests that require an updated protocol for ingress gateways
2020-05-06 16:09:24 -05:00
Chris Piraino ac115e39b2 A proxy-default config entry only exists in the default namespace 2020-05-06 15:06:14 -05:00
Chris Piraino 9a130f2ccc Remove outdated comment 2020-05-06 15:06:14 -05:00
Kyle Havlovitz 04b6bd637a Filter wildcard gateway services to match listener protocol
This now requires some type of protocol setting in ingress gateway tests
to ensure the services are not filtered out.

- small refactor to add a max(x, y) function
- Use internal configEntryTxn function and add MaxUint64 to lib
2020-05-06 15:06:13 -05:00
Chris Piraino 210dda5682 Allow Hosts field to be set on an ingress config entry
- Validate that this cannot be set on a 'tcp' listener nor on a wildcard
service.
- Add Hosts field to api and test in consul config write CLI
- xds: Configure envoy with user-provided hosts from ingress gateways
2020-05-06 15:06:13 -05:00
Kyle Havlovitz e4268c8b7f Support multiple listeners referencing the same service in gateway definitions 2020-05-06 15:06:13 -05:00
Kyle Havlovitz b21cd112e5 Allow ingress gateways to route traffic based on Host header
This commit adds the necessary changes to allow an ingress gateway to
route traffic from a single defined port to multiple different upstream
services in the Consul mesh.

To do this, we now require all HTTP requests coming into the ingress
gateway to specify a Host header that matches "<service-name>.*" in
order to correctly route traffic to the correct service.

- Differentiate multiple listener's route names by port
- Adds a case in xds for allowing default discovery chains to create a
  route configuration when on an ingress gateway. This allows default
  services to easily use host header routing
- ingress-gateways have a single route config for each listener
  that utilizes domain matching to route to different services.
2020-05-06 15:06:13 -05:00
R.B. Boyer 1187d7288e
acl: oss plumbing to support auth method namespace rules in enterprise (#7794)
This includes website docs updates.
2020-05-06 13:48:04 -05:00
R.B. Boyer b6cc92020d
test: make the kube auth method test helper use freeport (#7788) 2020-05-05 16:55:21 -05:00
Hans Hasselberg e3e2b82a00 network_segments: stop advertising segment tags 2020-05-05 21:32:05 +02:00
Hans Hasselberg 854aac510f agent: refactor to use a single addrFn 2020-05-05 21:08:10 +02:00
Hans Hasselberg 0f2e189012 agent: rename local/global to src/dst 2020-05-05 21:07:34 +02:00
Chris Piraino 837bd6f558
Construct a default destination if one does not exist for service-router (#7783) 2020-05-05 10:49:50 -05:00
R.B. Boyer c9c557477b
acl: add MaxTokenTTL field to auth methods (#7779)
When set to a non zero value it will limit the ExpirationTime of all
tokens created via the auth method.
2020-05-04 17:02:57 -05:00
R.B. Boyer 265d2ea9e1
acl: add DisplayName field to auth methods (#7769)
Also add a few missing acl fields in the api.
2020-05-04 15:18:25 -05:00
Hans Hasselberg 1be90e0fa1
agent: don't let left nodes hold onto their node-id (#7747) 2020-05-04 18:39:08 +02:00
Matt Keeler 669d22933e
Merge pull request #7714 from hashicorp/oss-sync/msp-agent-token 2020-05-04 11:33:50 -04:00
R.B. Boyer 3ac5a841ec
acl: refactor the authmethod.Validator interface (#7760)
This is a collection of refactors that make upcoming PRs easier to digest.

The main change is the introduction of the authmethod.Identity struct.
In the one and only current auth method (type=kubernetes) all of the
trusted identity attributes are both selectable and projectable, so they
were just passed around as a map[string]string.

When namespaces were added, this was slightly changed so that the
enterprise metadata can also come back from the login operation, so
login now returned two fields.

Now with some upcoming auth methods it won't be true that all identity
attributes will be both selectable and projectable, so rather than
update the login function to return 3 pieces of data it seemed worth it
to wrap those fields up and give them a proper name.
2020-05-01 17:35:28 -05:00
R.B. Boyer 4cd1d62e40
acl: change authmethod.Validator to take a logger (#7758) 2020-05-01 15:55:26 -05:00
R.B. Boyer 9faf8c42d1
sdk: extracting testutil.RequireErrorContains from various places it was duplicated (#7753) 2020-05-01 11:56:34 -05:00
Hans Hasselberg 6626cb69d6
rpc: oss changes for network area connection pooling (#7735) 2020-04-30 22:12:17 +02:00
Freddy c34ee5d339
Watch fallback channel for gateways that do not exist (#7715)
Also ensure that WatchSets in tests are reset between calls to watchFired. 
Any time a watch fires, subsequent calls to watchFired on the same WatchSet
will also return true even if there were no changes.
2020-04-29 16:52:27 -06:00
Matt Keeler 901d6739ad
Some boilerplate to allow for ACL Bootstrap disabling configurability 2020-04-28 09:42:46 -04:00
Freddy f5c1e5268b
TLS Origination for Terminating Gateways (#7671) 2020-04-27 16:25:37 -06:00
freddygv e751b83a3f Clean up dead code, issue addressed by passing ws to serviceGatewayNodes 2020-04-27 11:08:41 -06:00
freddygv 75e737b0f2 Fix internal endpoint test 2020-04-27 11:08:41 -06:00
freddygv 7667567688 Avoid deleting mappings for services linked to other gateways on dereg 2020-04-27 11:08:41 -06:00
freddygv 28fe6920fe Re-fix bug in CheckConnectServiceNodes 2020-04-27 11:08:41 -06:00
freddygv bab101107c Fix ConnectQueryBlocking test 2020-04-27 11:08:40 -06:00
freddygv 65e60d02f1 Fix bug in CheckConnectServiceNodes
Previously, if a blocking query called CheckConnectServiceNodes
before the gateway-services memdb table had any entries,
a nil watchCh would be returned when calling serviceTerminatingGatewayNodes.
This means that the blocking query would not fire if a gateway config entry
was added after the watch started.

In cases where the blocking query started on proxy registration,
the proxy could potentially never become aware of an upstream endpoint
if that upstream was going to be represented by a gateway.
2020-04-27 11:08:40 -06:00
Matt Keeler 4b1b42cef5
A couple testing helper updates (#7694) 2020-04-27 12:17:38 -04:00
Kit Patella 2b95bd7ca9
Merge pull request #7656 from hashicorp/feature/audit/oss-merge
agent: stub out auditing functionality in OSS
2020-04-17 13:33:06 -07:00
Chris Piraino c5ab43ebbc
Fix bug where non-typical services are associated with gateways (#7662)
On every service registration, we check to see if a service should be
assassociated to a wildcard gateway-service. This fixes an issue where
we did not correctly check to see if the service being registered was a
"typical" service or not.
2020-04-17 11:24:34 -05:00
Kit Patella c3d24d7c3e agent: stub out auditing functionality in OSS 2020-04-16 15:07:52 -07:00
Kyle Havlovitz 6a5eba63ab
Ingress Gateways for TCP services (#7509)
* Implements a simple, tcp ingress gateway workflow

This adds a new type of gateway for allowing Ingress traffic into Connect from external services.

Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
2020-04-16 14:00:48 -07:00
Matt Keeler 1e70ffee76
Update the Client code to use the common version checking infra… (#7558)
Also reduce the log level of some version checking messages on the server as they can be pretty noisy during upgrades and really are more for debugging purposes.
2020-04-14 11:54:27 -04:00
Matt Keeler 1332628b67
Allow the bootstrap endpoint to be disabled in enterprise. (#7614) 2020-04-14 11:45:39 -04:00
Pierre Souchay 2e6cd9e11a
fix flaky TestReplication_FederationStates test due to race conditions (#7612)
The test had two racy bugs related to memdb references.

The first was when we initially populated data and retained the FederationState objects in a slice. Due to how the `inmemCodec` works these were actually the identical objects passed into memdb.

The second was that the `checkSame` assertion function was reading from memdb and setting the RaftIndexes to zeros to aid in equality checks. This was mutating the contents of memdb which is a no-no.

With this fix, the command:
```
i=0; while /usr/local/bin/go test -count=1 -timeout 30s github.com/hashicorp/consul/agent/consul -run '^(TestReplication_FederationStates)$'; do i=$((i + 1)); printf "$i "; done
```
That used to break on my machine in less than 20 runs is now running 150+ times without any issue.

Might also fix #7575
2020-04-09 15:42:41 -05:00
Freddy c1f79c6b3c
Terminating gateway discovery (#7571)
* Enable discovering terminating gateways

* Add TerminatingGatewayServices to state store

* Use GatewayServices RPC endpoint for ingress/terminating
2020-04-08 12:37:24 -06:00
Matt Keeler 42f02e80c3
Enable filtering language support for the v1/connect/intentions… (#7593)
* Enable filtering language support for the v1/connect/intentions listing API

* Update website for filtering of Intentions

* Update website/source/api/connect/intentions.html.md
2020-04-07 11:48:44 -04:00
Matt Keeler 5d0e661203
Ensure that token clone copies the roles (#7577) 2020-04-02 12:09:35 -04:00
Emre Savcı 7a99f29adc
agent: add len, cap while initializing arrays 2020-04-01 10:54:51 +02:00
Freddy 8a1e53754e
Add config entry for terminating gateways (#7545)
This config entry will be used to configure terminating gateways.

It accepts the name of the gateway and a list of services the gateway will represent.

For each service users will be able to specify: its name, namespace, and additional options for TLS origination.

Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
2020-03-31 13:27:32 -06:00
Kyle Havlovitz 01a23b8eb4
Add config entry/state for Ingress Gateways (#7483)
* Add Ingress gateway config entry and other relevant structs

* Add api package tests for ingress gateways

* Embed EnterpriseMeta into ingress service struct

* Add namespace fields to api module and test consul config write decoding

* Don't require a port for ingress gateways

* Add snakeJSON and camelJSON cases in command test

* Run Normalize on service's ent metadata

Sadly cannot think of a way to test this in OSS.

* Every protocol requires at least 1 service

* Validate ingress protocols

* Update agent/structs/config_entry_gateways.go

Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2020-03-31 11:59:10 -05:00
Matt Keeler 35c8e996c3
Ensure server requirements checks are done against ALL known se… (#7491)
Co-authored-by: Paul Banks <banks@banksco.de>
2020-03-27 12:31:43 -04:00
Daniel Nephin a2eb66963c
Merge pull request #7516 from hashicorp/dnephin/remove-unused-method
agent: Remove unused method Encrypted from delegate interface
2020-03-26 14:17:58 -04:00
Daniel Nephin ebb851f32d agent: Remove unused Encrypted from interface
It appears to be unused. It looks like it has been around a while,
I geuss at some point we stopped using this method.
2020-03-26 12:34:31 -04:00
Freddy cb55fa3742
Enable CLI to register terminating gateways (#7500)
* Enable CLI to register terminating gateways

* Centralize gateway proxy configuration
2020-03-26 10:20:56 -06:00
Alejandro Baez 7d68d7eaa6
Add PolicyReadByName for API (#6615) 2020-03-25 10:34:24 -04:00
Matt Keeler 58e2969fc1
Fix ACL mode advertisement and detection (#7451)
These changes are necessary to ensure advertisement happens correctly even when datacenters are connected via network areas in Consul enterprise.

This also changes how we check if ACLs can be upgraded within the local datacenter. Previously we would iterate through all LAN members. Now we just use the ServerLookup type to iterate through all known servers in the DC.
2020-03-16 12:54:45 -04:00
Freddy 8a7ff69b19
Update MSP token and filtering (#7431) 2020-03-11 12:08:49 -06:00
R.B. Boyer 10d3ff9a4f
server: strip local ACL tokens from RPCs during forwarding if crossing datacenters (#7419)
Fixes #7414
2020-03-10 11:15:22 -05:00
Kyle Havlovitz 520d464c85
Merge pull request #7373 from hashicorp/acl-segments-fix
Add stub methods for ACL/segment bug fix from enterprise
2020-03-09 14:25:49 -07:00
R.B. Boyer a7fb26f50f
wan federation via mesh gateways (#6884)
This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch:

There are several distinct chunks of code that are affected:

* new flags and config options for the server

* retry join WAN is slightly different

* retry join code is shared to discover primary mesh gateways from secondary datacenters

* because retry join logic runs in the *agent* and the results of that
  operation for primary mesh gateways are needed in the *server* there are
  some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur
  at multiple layers of abstraction just to pass the data down to the right
  layer.

* new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers

* the function signature for RPC dialing picked up a new required field (the
  node name of the destination)

* several new RPCs for manipulating a FederationState object:
  `FederationState:{Apply,Get,List,ListMeshGateways}`

* 3 read-only internal APIs for debugging use to invoke those RPCs from curl

* raft and fsm changes to persist these FederationStates

* replication for FederationStates as they are canonically stored in the
  Primary and replicated to the Secondaries.

* a special derivative of anti-entropy that runs in secondaries to snapshot
  their local mesh gateway `CheckServiceNodes` and sync them into their upstream
  FederationState in the primary (this works in conjunction with the
  replication to distribute addresses for all mesh gateways in all DCs to all
  other DCs)

* a "gateway locator" convenience object to make use of this data to choose
  the addresses of gateways to use for any given RPC or gossip operation to a
  remote DC. This gets data from the "retry join" logic in the agent and also
  directly calls into the FSM.

* RPC (`:8300`) on the server sniffs the first byte of a new connection to
  determine if it's actually doing native TLS. If so it checks the ALPN header
  for protocol determination (just like how the existing system uses the
  type-byte marker).

* 2 new kinds of protocols are exclusively decoded via this native TLS
  mechanism: one for ferrying "packet" operations (udp-like) from the gossip
  layer and one for "stream" operations (tcp-like). The packet operations
  re-use sockets (using length-prefixing) to cut down on TLS re-negotiation
  overhead.

* the server instances specially wrap the `memberlist.NetTransport` when running
  with gateway federation enabled (in a `wanfed.Transport`). The general gist is
  that if it tries to dial a node in the SAME datacenter (deduced by looking
  at the suffix of the node name) there is no change. If dialing a DIFFERENT
  datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh
  gateways to eventually end up in a server's :8300 port.

* a new flag when launching a mesh gateway via `consul connect envoy` to
  indicate that the servers are to be exposed. This sets a special service
  meta when registering the gateway into the catalog.

* `proxycfg/xds` notice this metadata blob to activate additional watches for
  the FederationState objects as well as the location of all of the consul
  servers in that datacenter.

* `xds:` if the extra metadata is in place additional clusters are defined in a
  DC to bulk sink all traffic to another DC's gateways. For the current
  datacenter we listen on a wildcard name (`server.<dc>.consul`) that load
  balances all servers as well as one mini-cluster per node
  (`<node>.server.<dc>.consul`)

* the `consul tls cert create` command got a new flag (`-node`) to help create
  an additional SAN in certs that can be used with this flavor of federation.
2020-03-09 15:59:02 -05:00
Matt Keeler b684138882 Fix session backwards incompatibility with 1.6.x and earlier. 2020-03-05 15:34:55 -05:00
Kyle Havlovitz b05ebe2507 Add stub methods for ACL/segment bug fix from enterprise 2020-03-02 10:30:23 -08:00
rerorero b366a25179
fix: Destroying a session that doesn't exist returns status cod… (#6905)
fix #6840
2020-02-18 11:13:15 -05:00
Matt Keeler be0d6efac9
Allow the PolicyResolve and RoleResolve endpoints to process na… (#7296) 2020-02-13 14:55:27 -05:00
R.B. Boyer 919741838d
fix use of hclog logger (#7264) 2020-02-12 09:37:16 -06:00
ShimmerGlass a27ccc7248
agent: add server raft.{last,applied}_index gauges (#6694)
These metrics are useful for :
* Tracking the rate of update to the db
* Allow to have a rough idea of when an index originated
2020-02-11 10:50:18 +01:00
Hans Hasselberg 71ce832990
connect: add validations around intermediate cert ttl (#7213) 2020-02-11 00:05:49 +01:00
R.B. Boyer c37d00791c
make the TestRPC_RPCMaxConnsPerClient test less flaky (#7255) 2020-02-10 15:13:53 -06:00
Sarah Christoff 85d2714c76
Fix flaky TestAutopilot_BootstrapExpect (#7242) 2020-02-10 14:52:58 -06:00
Kit Patella d28bc1acbe
rpc: measure blocking queries (#7224)
* agent: measure blocking queries

* agent.rpc: update docs to mention we only record blocking queries

* agent.rpc: make go fmt happy

* agent.rpc: fix non-atomic read and decrement with bitwise xor of uint64 0

* agent.rpc: clarify review question

* agent.rpc: today I learned that one must declare all variables before interacting with goto labels

* Update agent/consul/server.go

agent.rpc: more precise comment on `Server.queriesBlocking`

Co-Authored-By: Paul Banks <banks@banksco.de>

* Update website/source/docs/agent/telemetry.html.md

agent.rpc: improve queries_blocking description

Co-Authored-By: Paul Banks <banks@banksco.de>

* agent.rpc: fix some bugs found in review

* add a note about the updated counter behavior to telemetry.md

* docs: add upgrade-specific note on consul.rpc.quer{y,ies_blocking} behavior

Co-authored-by: Paul Banks <banks@banksco.de>
2020-02-10 10:01:15 -08:00
Matt Keeler 966d085066
Catalog + Namespace OSS changes. (#7219)
* Various Prepared Query + Namespace things

* Last round of OSS changes for a namespaced catalog
2020-02-10 10:40:44 -05:00
R.B. Boyer b4325dfbce
agent: ensure that we always use the same settings for msgpack (#7245)
We set RawToString=true so that []uint8 => string when decoding an interface{}.
We set the MapType so that map[interface{}]interface{} decodes to map[string]interface{}.

Add tests to ensure that this doesn't break existing usages.

Fixes #7223
2020-02-07 15:50:24 -06:00
Freddy aca8b85440
Remove outdated TODO (#7244) 2020-02-07 13:14:48 -07:00
Matt Keeler f610d1d791
Fix a bug with ACL enforcement of reads on namespaced config entries. (#7239) 2020-02-07 08:30:40 -05:00
Kit Patella aa9db3f903
agent/consul server: fix LeaderTest_ChangeNodeID (#7236)
* fix LeaderTest_ChangeNodeID to use StatusLeft and add waitForAnyLANLeave

* unextract the waitFor... fn, simplify, and provide a more descriptive error
2020-02-06 16:37:53 -08:00
Matt Keeler 2524a028ea
OSS Changes for various config entry namespacing bugs (#7226) 2020-02-06 10:52:25 -05:00
R.B. Boyer a67001aa22
agent: differentiate wan vs lan loggers in memberlist and serf (#7205)
This should be a helpful change until memberlist and serf can be
properly switched to native hclog.
2020-02-05 09:52:43 -06:00
Matt Keeler 119168203b
Fix disco chain graph validation for namespaces (#7217)
Previously this happened to be validating only the chains in the default namespace. Now it will validate all chains in all namespaces when the global proxy-defaults is changed.
2020-02-05 10:06:27 -05:00
Matt Keeler 3621f7090b
Minor Non-Functional Updates (#7215)
* Cleanup the discovery chain compilation route handling

Nothing functionally should be different here. The real difference is that when creating new targets or handling route destinations we use the router config entries name and namespace instead of that of the top level request. Today they SHOULD always be the same but that may not always be the case. This hopefully also makes it easier to understand how the router entries are handled.

* Refactor a small bit of the service manager tests in oss

We used to use the stringHash function to compute part of the filename where things would get persisted to. This has been changed in the core code to calling the StringHash method on the ServiceID type. It just so happens that the new method will output the same value for anything in the default namespace (by design actually). However, logically this filename computation in the test should do the same thing as the core code itself so I updated it here.

Also of note is that newer enterprise-only tests for the service manager cannot use the old stringHash function at all because it will produce incorrect results for non-default namespaces.
2020-02-05 10:06:11 -05:00
Freddy 67e02a0752
Add managed service provider token (#7218)
Stubs for enterprise-only ACL token to be used by managed service providers.
2020-02-04 13:58:56 -07:00
Hans Hasselberg a9f9ed83cb
agent: increase watchLimit to 8192. (#7200)
The previous value was too conservative and users with many instances
were having problems because of it. This change increases the limit to
8192 which reportedly fixed most of the issues with that.

Related: #4984, #4986, #5050.
2020-02-04 13:11:30 +01:00
Davor Kapsa c280dd8549
auto_encrypt: check previously ignored error (#6604) 2020-02-03 10:35:11 +01:00
Hans Hasselberg 50281032e0
Security fixes (#7182)
* Mitigate HTTP/RPC Services Allow Unbounded Resource Usage

Fixes #7159.

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: Paul Banks <banks@banksco.de>
2020-01-31 11:19:37 -05:00
Matt Keeler 26bb1584c1
Updates to the Txn API for namespaces (#7172)
* Updates to the Txn API for namespaces

* Update agent/consul/txn_endpoint.go

Co-Authored-By: R.B. Boyer <rb@hashicorp.com>

Co-authored-by: R.B. Boyer <public@richardboyer.net>
2020-01-30 13:12:26 -05:00
Matt Keeler 3f253080a2
Sync some feature flag support from enterprise (#7167) 2020-01-29 13:21:38 -05:00
R.B. Boyer 01ebdff2a9
various tweaks on top of the hclog work (#7165) 2020-01-29 11:16:08 -06:00
Chris Piraino 3dd0b59793
Allow users to configure either unstructured or JSON logging (#7130)
* hclog Allow users to choose between unstructured and JSON logging
2020-01-28 17:50:41 -06:00
Kit Patella 49e9bbbdf9
Add accessorID of token when ops are denied by ACL system (#7117)
* agent: add and edit doc comments

* agent: add ACL token accessorID to debugging traces

* agent: polish acl debugging

* agent: minor fix + string fmt over value interp

* agent: undo export & fix logging field names

* agent: remove note and migrate up to code review

* Update agent/consul/acl.go

Co-Authored-By: Matt Keeler <mkeeler@users.noreply.github.com>

* agent: incorporate review feedback

* Update agent/acl.go

Co-Authored-By: R.B. Boyer <public@richardboyer.net>

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: R.B. Boyer <public@richardboyer.net>
2020-01-27 11:54:32 -08:00
Matt Keeler 485a0a65ea
Updates to Config Entries and Connect for Namespaces (#7116) 2020-01-24 10:04:58 -05:00
Matt Keeler 90b9f87160
Add the v1/catalog/node-services/:node endpoint (#7115)
The backing RPC already existed but the endpoint will be useful for other service syncing processes such as consul-k8s as this endpoint can return all services registered with a node regardless of namespacing.
2020-01-24 09:27:25 -05:00
Hans Hasselberg 5379cf7c67
raft: increase raft notify buffer. (#6863)
* Increase raft notify buffer.

Fixes https://github.com/hashicorp/consul/issues/6852.

Increasing the buffer helps recovering from leader flapping. It lowers
the chances of the flapping leader to get into a deadlock situation like
described in #6852.
2020-01-22 16:15:59 +01:00
Hans Hasselberg d52a4e3b82
tests: fix autopilot test (#7092) 2020-01-21 14:09:51 +01:00
Hans Hasselberg 43392d5db3
raft: update raft to v1.1.2 (#7079)
* update raft
* use hclogger for raft.
2020-01-20 13:58:02 +01:00
Hans Hasselberg 315ba7d6ad
connect: check if intermediate cert needs to be renewed. (#6835)
Currently when using the built-in CA provider for Connect, root certificates are valid for 10 years, however secondary DCs get intermediates that are valid for only 1 year. There is no mechanism currently short of rotating the root in the primary that will cause the secondary DCs to renew their intermediates.
This PR adds a check that renews the cert if it is half way through its validity period.

In order to be able to test these changes, a new configuration option was added: IntermediateCertTTL which is set extremely low in the tests.
2020-01-17 23:27:13 +01:00
Hans Hasselberg b6c83e06d5
auto_encrypt: set dns and ip san for k8s and provide configuration (#6944)
* Add CreateCSRWithSAN
* Use CreateCSRWithSAN in auto_encrypt and cache
* Copy DNSNames and IPAddresses to cert
* Verify auto_encrypt.sign returns cert with SAN
* provide configuration options for auto_encrypt dnssan and ipsan
* rename CreateCSRWithSAN to CreateCSR
2020-01-17 23:25:26 +01:00
Matej Urbas d877e091d6 agent: configurable MaxQueryTime and DefaultQueryTime. (#3777) 2020-01-17 14:20:57 +01:00
Matt Keeler c8294b8595
AuthMethod updates to support alternate namespace logins (#7029) 2020-01-14 10:09:29 -05:00
Matt Keeler baa89c7c65
Intentions ACL enforcement updates (#7028)
* Renamed structs.IntentionWildcard to structs.WildcardSpecifier

* Refactor ACL Config

Get rid of remnants of enterprise only renaming.

Add a WildcardName field for specifying what string should be used to indicate a wildcard.

* Add wildcard support in the ACL package

For read operations they can call anyAllowed to determine if any read access to the given resource would be granted.

For write operations they can call allAllowed to ensure that write access is granted to everything.

* Make v1/agent/connect/authorize namespace aware

* Update intention ACL enforcement

This also changes how intention:read is granted. Before the Intention.List RPC would allow viewing an intention if the token had intention:read on the destination. However Intention.Match allowed viewing if access was allowed for either the source or dest side. Now Intention.List and Intention.Get fall in line with Intention.Matches previous behavior.

Due to this being done a few different places ACL enforcement for a singular intention is now done with the CanRead and CanWrite methods on the intention itself.

* Refactor Intention.Apply to make things easier to follow.
2020-01-13 15:51:40 -05:00
Pierre Souchay 61fc4f8253 rpc: log method when a server/server RPC call fails (#4548)
Sometimes, we have lots of errors in cross calls between DCs (several hundreds / sec)
Enrich the log in order to help diagnose the root cause of issue.
2020-01-13 19:55:29 +01:00
R.B. Boyer 20f51f9181 connect: derive connect certificate serial numbers from a memdb index instead of the provider table max index (#7011) 2020-01-09 16:32:19 +01:00
R.B. Boyer 446f0533cd connect: ensure that updates to the secondary root CA configuration use the correct signing key ID values for comparison (#7012)
Fixes #6886
2020-01-09 16:28:16 +01:00
R.B. Boyer 42f80367be
Restore a few more service-kind index updates so blocking in ServiceDump works in more cases (#6948)
Restore a few more service-kind index updates so blocking in ServiceDump works in more cases

Namely one omission was that check updates for dumped services were not
unblocking.

Also adds a ServiceDump state store test and also fix a watch bug with the
normal dump.

Follow-on from #6916
2019-12-19 10:15:37 -06:00
Matt Keeler 6de4eb8569
OSS changes for implementing token based namespace inferencing
remove debug log
2019-12-18 14:07:08 -05:00
Matt Keeler 185654b075
Unflake the TestACLEndpoint_TokenList test
In order to do this I added a waitForLeaderEstablishment helper which does the right thing to ensure that leader establishment has finished.

fixup
2019-12-18 14:07:07 -05:00
Matt Keeler 8af12bf4f4
Miscellaneous acl package cleanup
• Renamed EnterpriseACLConfig to just Config
• Removed chained_authorizer_oss.go as it was empty
• Renamed acl.go to errors.go to more closely describe its contents
2019-12-18 13:44:32 -05:00
Matt Keeler bdf025a758
Rename EnterpriseAuthorizerContext -> AuthorizerContext 2019-12-18 13:43:24 -05:00
Preetha f607a00138 autopilot: fix dead server removal condition to use correct failure tolerance (#4017)
* Make dead server removal condition in autopilot use correct failure tolerance rules
* Introduce func with explanation
2019-12-16 23:35:13 +01:00
Matt Keeler 9812b32155
Fix blocking for ServiceDumping by kind (#6919) 2019-12-10 13:58:30 -05:00
Matt Keeler 442924c35a
Sync of OSS changes to support namespaces (#6909) 2019-12-09 21:26:41 -05:00
Hans Hasselberg a36e58c964
agent: fewer file local differences between enterprise and oss (#6820) (#6898)
* Increase number to test ignore. Consul Enterprise has more flags and since we are trying to reduce the differences between both code bases, we are increasing the number in oss. The semantics don't change, it is just a cosmetic thing.
* Introduce agent.initEnterprise for enterprise related hooks.
* Sync test with ent version.
* Fix import order.
* revert error wording.
2019-12-06 21:35:58 +01:00
Matt Keeler 609c9dab02
Miscellaneous Fixes (#6896)
Ensure we close the Sentinel Evaluator so as not to leak go routines

Fix a bunch of test logging so that various warnings when starting a test agent go to the ltest logger and not straight to stdout.

Various canned ent meta types always return a valid pointer (no more nils). This allows us to blindly deref + assign in various places.

Update ACL index tracking to ensure oss -> ent upgrades will work as expected.

Update ent meta parsing to include function to disallow wildcarding.
2019-12-06 14:01:34 -05:00
Matt Keeler c15c81a7ed
[Feature] API: Add a internal endpoint to query for ACL authori… (#6888)
* Implement endpoint to query whether the given token is authorized for a set of operations

* Updates to allow for remote ACL authorization via RPC

This is only used when making an authorization request to a different datacenter.
2019-12-06 09:25:26 -05:00
Matt Keeler f30af37d11
Fix the TestLeader_SecondaryCA_IntermediateRefresh test flakiness 2019-12-04 19:19:55 -05:00
Matt Keeler 90ae4a1f1e
OSS KV Modifications to Support Namespaces 2019-11-25 12:57:35 -05:00
Matt Keeler 68d79142c4
OSS Modifications necessary for sessions namespacing 2019-11-25 12:07:04 -05:00
Paul Banks a84b82b3df
connect: Add AWS PCA provider (#6795)
* Update AWS SDK to use PCA features.

* Add AWS PCA provider

* Add plumbing for config, config validation tests, add test for inheriting existing CA resources created by user

* Unparallel the tests so we don't exhaust PCA limits

* Merge updates

* More aggressive polling; rate limit pass through on sign; Timeout on Sign and CA create

* Add AWS PCA docs

* Fix Vault doc typo too

* Doc typo

* Apply suggestions from code review

Co-Authored-By: R.B. Boyer <rb@hashicorp.com>
Co-Authored-By: kaitlincarter-hc <43049322+kaitlincarter-hc@users.noreply.github.com>

* Doc fixes; tests for erroring if State is modified via API

* More review cleanup

* Uncomment tests!

* Minor suggested clean ups
2019-11-21 17:40:29 +00:00
Paul Banks 9e17aa3b41
Change CA Configure struct to pass Datacenter through (#6775)
* Change CA Configure struct to pass Datacenter through

* Remove connect/ca/plugin as we don't have immediate plans to use it.

We still intend to one day but there are likely to be several changes to the CA provider interface before we do so it's better to rebuild from history when we do that work properly.

* Rename PrimaryDC; fix endpoint in secondary DCs
2019-11-18 14:22:19 +00:00
Paul Banks 1197b43c7b
Support Connect CAs that can't cross sign (#6726)
* Support Connect CAs that can't cross sign

* revert spurios mod changes from make tools

* Add log warning when forcing CA rotation

* Fixup SupportsCrossSigning to report errors and work with Plugin interface (fixes tests)

* Fix failing snake_case test

* Remove misleading comment

* Revert "Remove misleading comment"

This reverts commit bc4db9cabed8ad5d0e39b30e1fe79196d248349c.

* Remove misleading comment

* Regen proto files messed up by rebase
2019-11-11 21:36:22 +00:00
Paul Banks ca96d5fa72
connect: Allow CA Providers to store small amount of state (#6751)
* pass logger through to provider

* test for proper operation of NeedsLogger

* remove public testServer function

* Ooops actually set the logger in all the places we need it - CA config set wasn't and causing segfault

* Fix all the other places in tests where we set the logger

* Allow CA Providers to persist some state

* Update CA provider plugin interface

* Fix plugin stubs to match provider changes

* Update agent/connect/ca/provider.go

Co-Authored-By: R.B. Boyer <rb@hashicorp.com>

* Cleanup review comments
2019-11-11 20:57:16 +00:00
Todd Radel 19a3892f71 connect: Implement NeedsLogger interface for CA providers (#6556)
* add NeedsLogger to Provider interface

* implements NeedsLogger in default provider

* pass logger through to provider

* test for proper operation of NeedsLogger

* remove public testServer function

* Switch test to actually assert on logging output rather than reflection.

--amend

* Ooops actually set the logger in all the places we need it - CA config set wasn't and causing segfault

* Fix all the other places in tests where we set the logger

* Add TODO comment
2019-11-11 20:30:01 +00:00
Todd Radel e100fda218 Make all Connect Cert Common Names valid FQDNs (#6423) 2019-11-11 17:11:54 +00:00
Matt Keeler 7081643191
Fill the Authz Context with a Sentinel Scope (#6729) 2019-11-01 17:05:22 -04:00
Matt Keeler c71ea7056f
Miscellaneous fixes (#6727) 2019-11-01 16:11:44 -04:00
Paul Banks 5f405c3277
Fix support for RSA CA keys in Connect. (#6638)
* Allow RSA CA certs for consul and vault providers to correctly sign EC leaf certs.

* Ensure key type ad bits are populated from CA cert and clean up tests

* Add integration test and fix error when initializing secondary CA with RSA key.

* Add more tests, fix review feedback

* Update docs with key type config and output

* Apply suggestions from code review

Co-Authored-By: R.B. Boyer <rb@hashicorp.com>
2019-11-01 13:20:26 +00:00
Matt Keeler 21f98f426e
Add hook for validating the enterprise meta attached to a reque… (#6695) 2019-10-30 12:42:39 -04:00
Matt Keeler c2d9041c0f
PreVerify acl:read access for listing endpoints (#6696)
We still will need to filter results based on the authorizer too but this helps to give an early 403.
2019-10-30 09:10:11 -04:00
Sarah Christoff 86b30bbfbe
Set MinQuorum variable in Autopilot (#6654)
* Add MinQuorum to Autopilot
2019-10-29 09:04:41 -05:00
Matt Keeler 0fc2c95255
More Replication Abstractions (#6689)
Also updated ACL replication to use a function to fill in the desired enterprise meta for all remote listing RPCs.
2019-10-28 13:49:57 -04:00
Matt Keeler 87c44a3b8d
Ensure that cache entries for tokens are prefixed “token-secret… (#6688)
This will be necessary once we store other types of identities in here.
2019-10-25 13:05:43 -04:00
Matt Keeler a688ea952d
Update the ACL Resolver to allow for Consul Enterprise specific hooks. (#6687) 2019-10-25 11:06:16 -04:00
Matt Keeler 1270a93274
Updates to allow for Namespacing ACL resources in Consul Enterp… (#6675)
Main Changes:

• method signature updates everywhere to account for passing around enterprise meta.
• populate the EnterpriseAuthorizerContext for all ACL related authorizations.
• ACL resource listings now operate like the catalog or kv listings in that the returned entries are filtered down to what the token is allowed to see. With Namespaces its no longer all or nothing.
• Modified the acl.Policy parsing to abstract away basic decoding so that enterprise can do it slightly differently. Also updated method signatures so that when parsing a policy it can take extra ent metadata to use during rules validation and policy creation.

Secondary Changes:

• Moved protobuf encoding functions out of the agentpb package to eliminate circular dependencies.
• Added custom JSON unmarshalers for a few ACL resource types (to support snake case and to get rid of mapstructure)
• AuthMethod validator cache is now an interface as these will be cached per-namespace for Consul Enterprise.
• Added checks for policy/role link existence at the RPC API so we don’t push the request through raft to have it fail internally.
• Forward ACL token delete request to the primary datacenter when the secondary DC doesn’t have the token.
• Added a bunch of ACL test helpers for inserting ACL resource test data.
2019-10-24 14:38:09 -04:00
Freddy caf658d0d3
Store check type in catalog (#6561) 2019-10-17 20:33:11 +02:00
R.B. Boyer e74a6c44f1
server: ensure the primary dc and ACL dc match (#6634)
This is mostly a sanity check for server tests that skip the normal
config builder equivalent fixup.
2019-10-17 10:57:17 -05:00
R.B. Boyer bc22eb8090
unflake TestLeader_SecondaryCA_Initialize (#6631) 2019-10-16 16:49:01 -05:00
R.B. Boyer 3ae748c7a4
fix flaky multidc acl tests that failed to wait for token replication (#6628)
If acls have not yet replicated to the secondary then authz requests
will be remotely resolved by the primary. Now these tests explicitly
wait until replication has caught up first.
2019-10-16 12:24:29 -05:00
R.B. Boyer a4c5b8e85c
appease the retry linter (#6629) 2019-10-16 11:39:22 -05:00
Paul Banks 979ad7fecb
Allow time for secondary CA to initialize (#6627) 2019-10-16 17:03:31 +01:00
Matt Keeler f9a43a1e2d
ACL Authorizer overhaul (#6620)
* ACL Authorizer overhaul

To account for upcoming features every Authorization function can now take an extra *acl.EnterpriseAuthorizerContext. These are unused in OSS and will always be nil.

Additionally the acl package has received some thorough refactoring to enable all of the extra Consul Enterprise specific authorizations including moving sentinel enforcement into the stubbed structs. The Authorizer funcs now return an acl.EnforcementDecision instead of a boolean. This improves the overall interface as it makes multiple Authorizers easily chainable as they now indicate whether they had an authoritative decision or should use some other defaults. A ChainedAuthorizer was added to handle this Authorizer enforcement chain and will never itself return a non-authoritative decision.

* Include stub for extra enterprise rules in the global management policy

* Allow for an upgrade of the global-management policy
2019-10-15 16:58:50 -04:00
R.B. Boyer 9a51ecc98b
agent: clients should only attempt to remove pruned nodes once per call (#6591) 2019-10-07 16:15:23 -05:00
Sarah Christoff 9b93dd93c9
Prune Unhealthy Agents (#6571)
* Add -prune flag to ForceLeave
2019-10-04 16:10:02 -05:00
Matt Keeler b0b57588d1
Implement Leader Routine Management (#6580)
* Implement leader routine manager

Switch over the following to use it for go routine management:

• Config entry Replication
• ACL replication - tokens, policies, roles and legacy tokens
• ACL legacy token upgrade
• ACL token reaping
• Intention Replication
• Secondary CA Roots Watching
• CA Root Pruning

Also added the StopAll call into the Server Shutdown method to ensure all leader routines get killed off when shutting down.

This should be mostly unnecessary as `revokeLeadership` should manually stop each one but just in case we really want these to go away (eventually).
2019-10-04 13:08:45 -04:00
Matt Keeler 9bd378a95c
Add EnterpriseConfig stubs (#6566) 2019-10-01 14:34:55 -04:00
R.B. Boyer 8433ef02a8
connect: connect CA Roots in secondary datacenters should use a SigningKeyID derived from their local intermediate (#6513)
This fixes an issue where leaf certificates issued in secondary
datacenters would be reissued very frequently (every ~20 seconds)
because the logic meant to detect root rotation was errantly triggering
because a hash of the ultimate root (in the primary) was being compared
against a hash of the local intermediate root (in the secondary) and
always failing.
2019-09-26 11:54:14 -05:00
Matt Keeler 5b83f589da
Expand the QueryOptions and QueryMeta interfaces (#6545)
In a previous PR I made it so that we had interfaces that would work enough to allow blockingQueries to work. However to complete this we need all fields to be settable and gettable.

Notes:
   • If Go ever gets contracts/generics then we could get rid of all the Getters/Setters
   • protoc / protoc-gen-gogo are going to generate all the getters for us.
   • I copied all the getters/setters from the protobuf funcs into agent/structs/protobuf_compat.go
   • Also added JSON marshaling funcs that use jsonpb for protobuf types.
2019-09-26 09:55:02 -04:00
Freddy 5eace88ce2
Expose HTTP-based paths through Connect proxy (#6446)
Fixes: #5396

This PR adds a proxy configuration stanza called expose. These flags register
listeners in Connect sidecar proxies to allow requests to specific HTTP paths from outside of the node. This allows services to protect themselves by only
listening on the loopback interface, while still accepting traffic from non
Connect-enabled services.

Under expose there is a boolean checks flag that would automatically expose all
registered HTTP and gRPC check paths.

This stanza also accepts a paths list to expose individual paths. The primary
use case for this functionality would be to expose paths for third parties like
Prometheus or the kubelet.

Listeners for requests to exposed paths are be configured dynamically at run
time. Any time a proxy, or check can be registered, a listener can also be
created.

In this initial implementation requests to these paths are not
authenticated/encrypted.
2019-09-25 20:55:52 -06:00
Matt Keeler 8885c8d318
Allow for enterprise only leader routines (#6533)
Eventually I am thinking we may need a way to register these at different priority levels but for now sticking this here is fine
2019-09-23 20:09:56 -04:00
R.B. Boyer cc889443a5
connect: don't colon-hex-encode the AuthorityKeyId and SubjectKeyId fields in connect certs (#6492)
The fields in the certs are meant to hold the original binary
representation of this data, not some ascii-encoded version.

The only time we should be colon-hex-encoding fields is for display
purposes or marshaling through non-TLS mediums (like RPC).
2019-09-23 12:52:35 -05:00
Matt Keeler 8431c5f533
Add support for implementing new requests with protobufs instea… (#6502)
* Add build system support for protobuf generation

This is done generically so that we don’t have to keep updating the makefile to add another proto generation.

Note: anything not in the vendor directory and with a .proto extension will be run through protoc if the corresponding namespace.pb.go file is not up to date.

If you want to rebuild just a single proto file you can do so with: make proto-rebuild PROTOFILES=<list of proto files to rebuild>

Providing the PROTOFILES var will override the default behavior of finding all the .proto files.

* Start adding types to the agent/proto package

These will be needed for some other work and are by no means comprehensive.

* Add ability to resolve/fixup the agentpb.ACLLinks structure in the state store.

* Use protobuf marshalling of raft requests instead of msgpack for protoc generated types.

This does not change any encoding of existing types.

* Removed structs package automatically encoding with protobuf marshalling

Instead the caller of raftApply that wants to opt-in to protobuf encoding will have to call `raftApplyProtobuf`

* Run update-vendor to fixup modules.txt

Nothing changed as far as dependencies go but the ordering of modules in that file depends on the time they are first seen and its not alphabetical.

* Rename some things and implement the structs.RPCInfo interface bits

agentpb.QueryOptions and agentpb.WriteRequest implement 3 of the 4 RPCInfo funcs and the new TargetDatacenter message type implements the fourth.

* Use the right encoding function.

* Renamed agent/proto package to agent/agentpb to prevent package name conflicts

* Update modules.txt to fix ordering

* Change blockingQuery to take in interfaces for the query options and meta

* Add %T to error output.

* Add/Update some comments
2019-09-20 14:37:22 -04:00
R.B. Boyer 5c5f21088c sdk: add freelist tracking and ephemeral port range skipping to freeport
This should cut down on test flakiness.

Problems handled:

- If you had enough parallel test cases running, the former circular
approach to handling the port block could hand out the same port to
multiple cases before they each had a chance to bind them, leading to
one of the two tests to fail.

- The freeport library would allocate out of the ephemeral port range.
This has been corrected for Linux (which should cover CI).

- The library now waits until a formerly-in-use port is verified to be
free before putting it back into circulation.
2019-09-17 14:30:43 -05:00
R.B. Boyer edf5347d3c fix typo of 'unknown' in log messages 2019-09-13 15:59:49 -05:00
Hans Hasselberg f025a7440d
agent: handleEnterpriseLeave (#6453) 2019-09-11 11:01:37 +02:00
Pierre Souchay 6d13efa828 Distinguish between DC not existing and not being available (#6399) 2019-09-03 09:46:24 -06:00
Matt Keeler 31d9d2e557
Store primaries root in secondary after intermediate signature (#6333)
* Store primaries root in secondary after intermediate signature

This ensures that the intermediate exists within the CA root stored in raft and not just in the CA provider state. This has the very nice benefit of actually outputting the intermediate cert within the ca roots HTTP/RPC endpoints.

This change means that if signing the intermediate fails it will not set the root within raft. So far I have not come up with a reason why that is bad. The secondary CA roots watch will pull the root again and go through all the motions. So as soon as getting an intermediate CA works the root will get set.

* Make TestAgentAntiEntropy_Check_DeferSync less flaky

I am not sure this is the full fix but it seems to help for me.
2019-08-30 11:38:46 -04:00
Pierre Souchay 35d90fc899 Display IPs of machines when node names conflict to ease troubleshooting
When there is an node name conflicts, such messages are displayed within Consul:

`consul.fsm: EnsureRegistration failed: failed inserting node: Error while renaming Node ID: "e1d456bc-f72d-98e5-ebb3-26ae80d785cf": Node name node001 is reserved by node 05f10209-1b9c-b90c-e3e2-059e64556d4a with name node001`

While it is easy to find the node that has reserved the name, it is hard to find
the node trying to aquire the name since it is not registered, because it
is not part of `consul members` output

This PR will display the IP of the offender and solve far more easily those issues.
2019-08-28 15:57:05 -04:00
Alvin Huang e4e9381851
revert commits on master (#6413) 2019-08-27 17:45:58 -04:00
tradel 2838a1550a update tests to match new method signatures 2019-08-27 14:16:39 -07:00
tradel 93c839b76c confi\gure providers with DC and domain 2019-08-27 14:16:25 -07:00
tradel 1acde6e30a create a common name for autoTLS agent certs 2019-08-27 14:15:53 -07:00
Alvin Huang 9662b7c01a
add nil pointer check for pointer to ACLToken struct (#6407) 2019-08-27 11:23:28 -04:00
Hans Hasselberg 4f7a3e8fa8 make sure auto_encrypt has private key type and bits 2019-08-26 13:09:50 +02:00
R.B. Boyer 2d4a3b51d0
Merge pull request #6388 from hashicorp/release/1-6
merging release/1-6 into master
2019-08-23 13:44:46 -05:00
Matt Keeler 89ac998e8b
Secondary CA `establishLeadership` fix (#6383)
This prevents ACL issues (or other issues) during intermediate CA cert signing from failing leader establishment.
2019-08-23 11:32:37 -04:00
Hans Hasselberg aada537d87
auto_encrypt: use server-port (#6287)
AutoEncrypt needs the server-port because it wants to talk via RPC. Information from gossip might not be available at that point and thats why the server-port is being used.
2019-08-23 10:18:46 +02:00
Matt Keeler 8cb0560f52
Ensure that config entry writes are forwarded to the primary DC (#6339) 2019-08-20 12:01:13 -04:00
R.B. Boyer 0675e0606e
connect: generate the full SNI names for discovery targets in the compiler rather than in the xds package (#6340) 2019-08-19 13:03:03 -05:00
R.B. Boyer d6456fddeb
connect: introduce ExternalSNI field on service-defaults (#6324)
Compiling this will set an optional SNI field on each DiscoveryTarget.
When set this value should be used for TLS connections to the instances
of the target. If not set the default should be used.

Setting ExternalSNI will disable mesh gateway use for that target. It also 
disables several service-resolver features that do not make sense for an 
external service.
2019-08-19 12:19:44 -05:00
R.B. Boyer f84f509ce4
connect: updating a service-defaults config entry should leave an unset protocol alone (#6342)
If the entry is updated for reasons other than protocol it is surprising
that the value is explicitly persisted as 'tcp' rather than leaving it
empty and letting it fall back dynamically on the proxy-defaults value.
2019-08-19 10:44:06 -05:00
Matt Keeler 73888eed36
Filter out left/leaving serf members when determining if new AC… (#6332) 2019-08-16 10:34:18 -04:00
R.B. Boyer 22ee60d1ba
agent: blocking central config RPCs iterations should not interfere with each other (#6316) 2019-08-14 09:08:46 -05:00
hashicorp-ci 29767157ed Merge Consul OSS branch 'master' at commit 8f7586b339dbb518eff3a2eec27d7b8eae7a3fbb 2019-08-13 02:00:43 +00:00
Sarah Adams 2f7a90bc52
add flag to allow /operator/keyring requests to only hit local servers (#6279)
Add parameter local-only to operator keyring list requests to force queries to only hit local servers (no WAN traffic).

HTTP API: GET /operator/keyring?local-only=true
CLI: consul keyring -list --local-only

Sending the local-only flag with any non-GET/list request will result in an error.
2019-08-12 11:11:11 -07:00
Mike Morris 88df658243
connect: remove managed proxies (#6220)
* connect: remove managed proxies implementation and all supporting config options and structs

* connect: remove deprecated ProxyDestination

* command: remove CONNECT_PROXY_TOKEN env var

* agent: remove entire proxyprocess proxy manager

* test: remove all managed proxy tests

* test: remove irrelevant managed proxy note from TestService_ServerTLSConfig

* test: update ContentHash to reflect managed proxy removal

* test: remove deprecated ProxyDestination test

* telemetry: remove managed proxy note

* http: remove /v1/agent/connect/proxy endpoint

* ci: remove deprecated test exclusion

* website: update managed proxies deprecation page to note removal

* website: remove managed proxy configuration API docs

* website: remove managed proxy note from built-in proxy config

* website: add note on removing proxy subdirectory of data_dir
2019-08-09 15:19:30 -04:00
R.B. Boyer 357ca39868
connect: ensure intention replication continues to work when the replication ACL token changes (#6288) 2019-08-07 11:34:09 -05:00
hashicorp-ci 3ac803da5e Merge Consul OSS branch 'master' at commit d84863799deca45ccf4bec5ab9f645ccae6b3aeb 2019-08-06 02:00:30 +00:00
Sarah Adams 9ed3e64510
fallback to proxy config global protocol when upstream services' protocol is unset (#6277)
fallback to proxy config global protocol when upstream services' protocol is unset

Fixes #5857
2019-08-05 12:52:35 -07:00
R.B. Boyer 64fc002e03
connect: fix failover through a mesh gateway to a remote datacenter (#6259)
Failover is pushed entirely down to the data plane by creating envoy
clusters and putting each successive destination in a different load
assignment priority band. For example this shows that normally requests
go to 1.2.3.4:8080 but when that fails they go to 6.7.8.9:8080:

- name: foo
  load_assignment:
    cluster_name: foo
    policy:
      overprovisioning_factor: 100000
    endpoints:
    - priority: 0
      lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 1.2.3.4
              port_value: 8080
    - priority: 1
      lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 6.7.8.9
              port_value: 8080

Mesh gateways route requests based solely on the SNI header tacked onto
the TLS layer. Envoy currently only lets you configure the outbound SNI
header at the cluster layer.

If you try to failover through a mesh gateway you ideally would
configure the SNI value per endpoint, but that's not possible in envoy
today.

This PR introduces a simpler way around the problem for now:

1. We identify any target of failover that will use mesh gateway mode local or
   remote and then further isolate any resolver node in the compiled discovery
   chain that has a failover destination set to one of those targets.

2. For each of these resolvers we will perform a small measurement of
   comparative healths of the endpoints that come back from the health API for the
   set of primary target and serial failover targets. We walk the list of targets
   in order and if any endpoint is healthy we return that target, otherwise we
   move on to the next target.

3. The CDS and EDS endpoints both perform the measurements in (2) for the
   affected resolver nodes.

4. For CDS this measurement selects which TLS SNI field to use for the cluster
   (note the cluster is always going to be named for the primary target)

5. For EDS this measurement selects which set of endpoints will populate the
   cluster. Priority tiered failover is ignored.

One of the big downsides to this approach to failover is that the failover
detection and correction is going to be controlled by consul rather than
deferring that entirely to the data plane as with the prior version. This also
means that we are bound to only failover using official health signals and
cannot make use of data plane signals like outlier detection to affect
failover.

In this specific scenario the lack of data plane signals is ok because the
effectiveness is already muted by the fact that the ultimate destination
endpoints will have their data plane signals scrambled when they pass through
the mesh gateway wrapper anyway so we're not losing much.

Another related fix is that we now use the endpoint health from the
underlying service, not the health of the gateway (regardless of
failover mode).
2019-08-05 13:30:35 -05:00
R.B. Boyer 0165e93517
connect: expose an API endpoint to compile the discovery chain (#6248)
In addition to exposing compilation over the API cleaned up the structures that would be exchanged to be cleaner and easier to support and understand.

Also removed ability to configure the envoy OverprovisioningFactor.
2019-08-02 15:34:54 -05:00
Todd Radel 295abd82c3
connect: generate intermediate at same time as root (#6272)
Generate intermediate at same time as root
Co-Authored-By: Freddy <freddygv@users.noreply.github.com>
2019-08-02 15:36:03 -04:00
R.B. Boyer 4e2fb5730c
connect: detect and prevent circular discovery chain references (#6246) 2019-08-02 09:18:45 -05:00
R.B. Boyer 6c9edb17c2
server: if inserting bootstrap config entries fails don't silence the errors (#6256) 2019-08-01 23:07:11 -05:00
R.B. Boyer 782c647bf4
connect: simplify the compiled discovery chain data structures (#6242)
This should make them better for sending over RPC or the API.

Instead of a chain implemented explicitly like a linked list (nodes
holding pointers to other nodes) instead switch to a flat map of named
nodes with nodes linking other other nodes by name. The shipped
structure is just a map and a string to indicate which key to start
from.

Other changes:

* inline the compiler option InferDefaults as true

* introduce compiled target config to avoid needing to send back
  additional maps of Resolvers; future target-specific compiled state
  can go here

* move compiled MeshGateway out of the Resolver and into the
  TargetConfig where it makes more sense.
2019-08-01 22:44:05 -05:00
R.B. Boyer 4666599e18
connect: reconcile how upstream configuration works with discovery chains (#6225)
* connect: reconcile how upstream configuration works with discovery chains

The following upstream config fields for connect sidecars sanely
integrate into discovery chain resolution:

- Destination Namespace/Datacenter: Compilation occurs locally but using
different default values for namespaces and datacenters. The xDS
clusters that are created are named as they normally would be.

- Mesh Gateway Mode (single upstream): If set this value overrides any
value computed for any resolver for the entire discovery chain. The xDS
clusters that are created may be named differently (see below).

- Mesh Gateway Mode (whole sidecar): If set this value overrides any
value computed for any resolver for the entire discovery chain. If this
is specifically overridden for a single upstream this value is ignored
in that case. The xDS clusters that are created may be named differently
(see below).

- Protocol (in opaque config): If set this value overrides the value
computed when evaluating the entire discovery chain. If the normal chain
would be TCP or if this override is set to TCP then the result is that
we explicitly disable L7 Routing and Splitting. The xDS clusters that
are created may be named differently (see below).

- Connect Timeout (in opaque config): If set this value overrides the
value for any resolver in the entire discovery chain. The xDS clusters
that are created may be named differently (see below).

If any of the above overrides affect the actual result of compiling the
discovery chain (i.e. "tcp" becomes "grpc" instead of being a no-op
override to "tcp") then the relevant parameters are hashed and provided
to the xDS layer as a prefix for use in naming the Clusters. This is to
ensure that if one Upstream discovery chain has no overrides and
tangentially needs a cluster named "api.default.XXX", and another
Upstream does have overrides for "api.default.XXX" that they won't
cross-pollinate against the operator's wishes.

Fixes #6159
2019-08-01 22:03:34 -05:00
Paul Banks a5c70d79d0 Revert "connect: support AWS PCA as a CA provider" (#6251)
This reverts commit 3497b7c00d49c4acbbf951d84f2bba93f3da7510.
2019-07-31 09:08:10 -04:00
Todd Radel d3b7fd83fe
connect: support AWS PCA as a CA provider (#6189)
Port AWS PCA provider from consul-ent
2019-07-30 22:57:51 -04:00
Todd Radel 1b14d6595e
connect: Support RSA keys in addition to ECDSA (#6055)
Support RSA keys in addition to ECDSA
2019-07-30 17:47:39 -04:00
Matt Keeler a7c4b7af7c
Fix CA Replication when ACLs are enabled (#6201)
Secondary CA initialization steps are:

• Wait until the primary will be capable of signing intermediate certs. We use serf metadata to check the versions of servers in the primary which avoids needing a token like the previous implementation that used RPCs. We require at least one alive server in the primary and the all alive servers meet the version requirement.
• Initialize the secondary CA by getting the primary to sign an intermediate

When a primary dc is configured, if no existing CA is initialized and for whatever reason we cannot initialize a secondary CA the secondary DC will remain without a CA. As soon as it can it will initialize the secondary CA by pulling the primaries roots and getting the primary to sign an intermediate.

This also fixes a segfault that can happen during leadership revocation. There was a spot in the secondaryCARootsWatch that was getting the CA Provider and executing methods on it without nil checking. Under normal circumstances it wont be nil but during leadership revocation it gets nil'ed out. Therefore there is a period of time between closing the stop chan and when the go routine is actually stopped where it could read a nil provider and cause a segfault.
2019-07-26 15:57:57 -04:00
R.B. Boyer 1b95d2e5e3 Merge Consul OSS branch master at commit b3541c4f34d43ab92fe52256420759f17ea0ed73 2019-07-26 10:34:24 -05:00
Matt Keeler c4a34602b6
Allow forwarding of some status RPCs (#6198)
* Allow forwarding of some status RPCs

* Update docs

* add comments about not using the regular forward
2019-07-25 14:26:22 -04:00
Jeff Mitchell e266b038cc Make the chunking test multidimensional (#6212)
This ensures that it's not just a single operation we restores
successfully, but many. It's the same foundation, just with multiple
going on at once.
2019-07-25 11:40:09 +01:00
Freddy 7dbbe7e55a
auto-encrypt: Fix port resolution and fallback to default port (#6205)
Auto-encrypt meant to fallback to the default port when it wasn't provided, but it hadn't been because of an issue with the error handling. We were checking against an incomplete error value:
"missing port in address" vs "address $HOST: missing port in address"

Additionally, all RPCs to AutoEncrypt.Sign were using a.config.ServerPort, so those were updated to use ports resolved by resolveAddrs, if they are available.
2019-07-24 16:49:37 -07:00
Jeff Mitchell e0068431f5 Chunking support (#6172)
* Initial chunk support

This uses the go-raft-middleware library to allow for chunked commits to the KV
2019-07-24 17:06:39 -04:00
Freddy 1b97d65873
Make new config when retrying testServer creation (#6204) 2019-07-24 08:41:00 -06:00
Alvin Huang 5b6fa58453 resolve circleci config conflicts 2019-07-23 20:18:36 -04:00
Freddy c19f46639b
Restore NotifyListen to avoid panic in newServer retry (#6200) 2019-07-23 14:33:00 -06:00
Christian Muehlhaeuser 2602f6907e Simplified code in various places (#6176)
All these changes should have no side-effects or change behavior:

- Use bytes.Buffer's String() instead of a conversion
- Use time.Since and time.Until where fitting
- Drop unnecessary returns and assignment
2019-07-20 09:37:19 -04:00
hashicorp-ci 8b109e5f9f Merge Consul OSS branch 'master' at commit ef257b084d2e2a474889518440515e360d0cd990 2019-07-20 02:00:29 +00:00
Christian Muehlhaeuser 26f9368567 Fixed typos in comments (#6175)
Just a few nitpicky typo fixes.
2019-07-19 07:54:53 -04:00
Christian Muehlhaeuser 877bfd280b Fixed a few tautological condition mistakes (#6177)
None of these changes should have any side-effects. They're merely
fixing tautological mistakes.
2019-07-19 07:53:42 -04:00
Christian Muehlhaeuser d1426767f6 Fixed nil check for token (#6179)
I can only assume we want to check for the retrieved `updatedToken` to not be
nil, before accessing it below.

`token` can't possibly be nil at this point, as we accessed `token.AccessorID`
just before.
2019-07-19 07:48:11 -04:00
Alvin Huang 17654c6292 Merge branch 'master' into release/1-6 2019-07-17 15:43:30 -04:00
Freddy f59e6db9b1
Reduce number of servers in TestServer_Expect_NonVoters (#6155) 2019-07-17 11:35:33 -06:00
Freddy 476a4b95a5
More flaky test fixes (#6151)
* Add retry to TestAPI_ClientTxn

* Add retry to TestLeader_RegisterMember

* Account for empty watch result in ConnectRootsWatch
2019-07-17 09:33:38 -06:00
hashicorp-ci 022483aff0 Merge Consul OSS branch 'master' at commit 95dbb7f2f1b9fc3528a16335201e2324f1b388bd 2019-07-17 02:00:21 +00:00
Freddy 99601aa3a7
Update retries that weren't using retry.R (#6146) 2019-07-16 14:47:45 -06:00
R.B. Boyer 1cc6d07d0f
add test for discovery chain agent cache-type (#6130) 2019-07-15 10:09:52 -05:00
Jack Pearkes fa15914813 Merge branch 'master' into release/1-6 2019-07-12 14:51:25 -07:00
Matt Keeler 3914ec5c62
Various Gateway Fixes (#6093)
* Ensure the mesh gateway configuration comes back in the api within each upstream

* Add a test for the MeshGatewayConfig in the ToAPI functions

* Ensure we don’t use gateways for dc local connections

* Update the svc kind index for deletions

* Replace the proxycfg.state cache with an interface for testing

Also start implementing proxycfg state testing.

* Update the state tests to verify some gateway watches for upstream-targets of a discovery chain.
2019-07-12 17:19:37 -04:00