Commit Graph

4004 Commits

Author SHA1 Message Date
kevinkengne 18a12d6517
add completeness test for types with CacheInfo method (#9480)
include all fields when fuzzing in tests
split tests by struct type

Ensure the new value for the field is different

fuzzer.Fuzz could produce the same value again in some cases.

Use a custom fuzz function for QueryOptions. That type is an embedded struct in the request types
but only one of the fields is important to include in the cache key.

Move enterpriseMetaField to an oss file so that we can change it in enterprise.
2021-01-12 19:45:46 -05:00
Chris Piraino baad708929
Fix bug in usage metrics when multiple service instances are changed in a single transaction (#9440)
* Fix bug in usage metrics that caused a negative count to occur

There were a couple of instances were usage metrics would do the wrong
thing and result in incorrect counts, causing the count to attempt to
decrement below zero and return an error. The usage metrics did not
account for various places where a single transaction could
delete/update/add multiple service instances at once.

We also remove the error when attempting to decrement below zero, and
instead just make sure we do not accidentally underflow the unsigned
integer. This is a more graceful failure than returning an error and not
allowing a transaction to commit.

* Add changelog
2021-01-12 15:31:47 -06:00
hashicorp-ci 5b92255cf5 auto-updated agent/uiserver/bindata_assetfs.go from commit b86eea4fb 2021-01-12 14:57:52 +00:00
Daniel Nephin 51464763f4 config: remove Builder receiver from funcs that dont use it
This change allows us to re-use these functions in other places without the Builder, and makes it
more explicit about which functions can warn/error and which can not.
2021-01-11 17:41:58 -05:00
Daniel Nephin 78b02663e9 config: Use golden for TestRuntimeConfig_Sanitize
A golden file makes the expected value easier to work with. This change also
removes a number of shims for enterprise and replaces them with a single one
for the golden filename.
2021-01-11 14:34:03 -05:00
Pierre Souchay 70d2da7582
Display a warning when rpc.enable_streaming = true is set on a client (#9530)
* Display a warning when rpc.enable_streaming = true is set on a client

This option has no effect when running as an agent

* Added warning when server starts with use_streaming_backend but without rpc.enable_streaming

* Added unit test
2021-01-08 15:23:23 -05:00
Chris Piraino 2eac571276
Log replication warnings when no error suppression is defined (#9320)
* Log replication warnings when no error suppression is defined

* Add changelog file
2021-01-08 14:03:06 -06:00
Daniel Nephin 6094b73d89
Merge pull request #9250 from hashicorp/dnephin/config-tags-2
config: remove unnecessary json/hcl struct tags
2021-01-08 14:08:34 -05:00
Daniel Nephin d5bdc2f539
Merge pull request #9249 from hashicorp/dnephin/config-tags
config: use fields to detect enterprise-only settings
2021-01-07 19:49:29 -05:00
Daniel Nephin f3b30a7ed6 config: remove unnecessary json/hcl struct tags
Using find/replace with regex

        json:"[^"]+"
        hcl:"[^"]+"
2021-01-07 19:42:56 -05:00
Daniel Nephin ee00d7668c config: Use config fields to warn about enterprise settings
It is no safe to assumes that the mapstructure keys will contain all the keys because some config can be specified
with command line flags or literals.

This change allows us to remove the json marshal/unmarshal cycle for command line flags, which will allow
us to remove all of the hcl/json struct tags on config fields.
2021-01-07 19:31:39 -05:00
Daniel Nephin 0679a23921
Merge pull request #9251 from hashicorp/dnephin/config-cleanup-flags-and-opts
config: move testing shims and cleanup some defaults
2021-01-07 19:11:22 -05:00
Daniel Nephin 942ffa7e02 config: use LiteralConfig for flag values 2021-01-07 18:57:53 -05:00
Daniel Nephin 2d0f33591e config: remove t.Parallel and some unused fields from tests
Use go-cmp for better diffs in tests
2021-01-07 18:57:52 -05:00
Daniel Nephin 31b1addd9e structs: add tests for String() methods
To show that printing one of these IDs works properly now that the String() method
receiver is no longer a pointer.
2021-01-07 18:47:38 -05:00
Daniel Nephin 45f0afcbf4 structs: Fix printing of IDs
These types are used as values (not pointers) in other structs. Using a pointer receiver causes
problems when the value is printed. fmt will not call the String method if it is passed a value
and the String method has a pointer receiver. By using a value receiver the correct string is printed.

Also remove some unused methods.
2021-01-07 18:47:38 -05:00
Daniel Nephin bcfb444a77 Remove an unnecessary else 2021-01-07 18:13:49 -05:00
Daniel Nephin f6543b1651 xds: remove Server.Initialize
Requiring a call to initialize to set a single field is not really substantially different
from having to set that field to a value.
2021-01-07 18:13:48 -05:00
Daniel Nephin bbf1a116f6 xds: Fix data race
TestEnvoy.Close used e.stream.recvCh == nil to indicate the channel had already
been closed, so that TestEnvoy.Close can be called multiple times. The recvCh
was not protected by a lock, so setting it to nil caused a data race with any
goroutine trying to read from the channel.

Instead set the stream to nil. The stream is guarded by a lock, so it does not race.

This change allows us to test the agent/xds package using -race.
2021-01-07 18:13:48 -05:00
Daniel Nephin de226f26e4 xds: Pass in logger
small cleanup in tests
2021-01-07 18:13:48 -05:00
hashicorp-ci 7c5ca27fdd auto-updated agent/uiserver/bindata_assetfs.go from commit e893ba7ea 2021-01-07 19:09:58 +00:00
Daniel Nephin 27c38bfebb
Merge pull request #9213 from hashicorp/dnephin/resolve-tokens-take-2
acl: Remove some unused things and document delegate method
2021-01-06 18:51:51 -05:00
Pierre Souchay c43888c064 Added testing of GRPC with TLS combinations
This ensures that https://github.com/hashicorp/consul/issues/9474 will
not reproduce.
2021-01-06 22:20:23 +01:00
Pierre Souchay 542852786c [Streaming][bugfix] handle TLS signalisation when TLS is disabled on client side
Tnis is an alternative to https://github.com/hashicorp/consul/pull/9494
2021-01-06 17:24:58 +01:00
R.B. Boyer db62541676
acl: use the presence of a management policy in the state store as a sign that we already migrated to v2 acls (#9505)
This way we only have to wait for the serf barrier to pass once before
we can upgrade to v2 acls. Without this patch every restart needs to
re-compute the change, and potentially if a stray older node joins after
a migration it might regress back to v1 mode which would be problematic.
2021-01-05 17:04:27 -06:00
hashicorp-ci 8fc77877be auto-updated agent/uiserver/bindata_assetfs.go from commit 1304dc882 2021-01-05 17:47:53 +00:00
Daniel Nephin 3140c0a343
Merge pull request #9067 from naemono/6074-allow-config-MaxHeaderBytes
Adds option to configure HTTP Server's MaxHeaderBytes
2021-01-05 12:28:27 -05:00
hashicorp-ci e64b90c604 auto-updated agent/uiserver/bindata_assetfs.go from commit a42e844cc 2021-01-05 17:09:19 +00:00
hashicorp-ci 988609fb61 auto-updated agent/uiserver/bindata_assetfs.go from commit 17438020f 2021-01-05 10:11:12 +00:00
Michael Montgomery 5a70c2c7e8 Remove unneeded test 2021-01-04 19:47:13 -06:00
Matt Keeler 3a79b559f9
Special case the error returned when we have a Raft leader but are not tracking it in the ServerLookup (#9487)
This can happen when one other node in the cluster such as a client is unable to communicate with the leader server and sees it as failed. When that happens its failing status eventually gets propagated to the other servers in the cluster and eventually this can result in RPCs returning “No cluster leader” error.

That error is misleading and unhelpful for determing the root cause of the issue as its not raft stability but rather and client -> server networking issue. Therefore this commit will add a new error that will be returned in that case to differentiate between the two cases.
2021-01-04 14:05:23 -05:00
hashicorp-ci 1e58b31098 auto-updated agent/uiserver/bindata_assetfs.go from commit 8c9d5ecc2 2021-01-04 18:36:22 +00:00
R.B. Boyer 42dea6f01e
server: deletions of intentions by name using the intention API is now idempotent (#9278)
Restoring a behavior inadvertently changed while fixing #9254
2021-01-04 11:27:00 -06:00
hashicorp-ci 091963c7ea auto-updated agent/uiserver/bindata_assetfs.go from commit 8c0473a62 2021-01-04 16:52:35 +00:00
Michael Montgomery a1748aa2cb Merge branch 'master' into 6074-allow-config-MaxHeaderBytes 2020-12-30 14:14:05 -06:00
Michael Montgomery 519f537b8b Fixed failing tests
Removed use of `NewTestAgent`, per review comment
Removed CLI flag, per review comment
Updated website documentation
Added changelog entry
2020-12-30 14:09:50 -06:00
Daniel Nephin 088831c91e Maybe fix another data race in a test 2020-12-22 18:53:54 -05:00
Daniel Nephin d0f2eca8de Fix one race caused by t.Parallel 2020-12-22 18:27:18 -05:00
hashicorp-ci 5786746ed5 auto-updated agent/uiserver/bindata_assetfs.go from commit 99f102705 2020-12-18 10:43:59 +00:00
hashicorp-ci 098ca1f567 auto-updated agent/uiserver/bindata_assetfs.go from commit 25d6a1277 2020-12-18 09:07:19 +00:00
hashicorp-ci 87b547cfb3 auto-updated agent/uiserver/bindata_assetfs.go from commit c7d917777 2020-12-17 19:01:42 +00:00
hashicorp-ci d302cf769c auto-updated agent/uiserver/bindata_assetfs.go from commit 2e3a66efb 2020-12-17 16:39:34 +00:00
hashicorp-ci 325bca338b auto-updated agent/uiserver/bindata_assetfs.go from commit 635cf4dc9 2020-12-17 16:08:30 +00:00
hashicorp-ci 33fe41ac33 auto-updated agent/uiserver/bindata_assetfs.go from commit 66cc91c69 2020-12-17 15:30:54 +00:00
hashicorp-ci 9ad738b0bf auto-updated agent/uiserver/bindata_assetfs.go from commit 0ca54c608 2020-12-16 16:46:08 +00:00
hashicorp-ci d600dc85ea auto-updated agent/uiserver/bindata_assetfs.go from commit 4404b4f44 2020-12-16 09:23:22 +00:00
hashicorp-ci fff6a07022 auto-updated agent/uiserver/bindata_assetfs.go from commit a919b60f5 2020-12-15 19:36:10 +00:00
hashicorp-ci b7c34f4b71 auto-updated agent/uiserver/bindata_assetfs.go from commit e921b3cf9 2020-12-15 18:33:36 +00:00
hashicorp-ci d0dffb1697 auto-updated agent/uiserver/bindata_assetfs.go from commit 5e150d7f0 2020-12-15 18:19:30 +00:00
John Cowen d3d4c1452a
api: Ensure the internal/ui/service endpoint responds with an array (#9397)
In some circumstances this endpoint will have no results in it (dues to
ACLs, Namespaces or filtering).

This ensures that the response is at least an empty array (`[]`) rather
than `null`
2020-12-15 16:52:00 +00:00
hashicorp-ci 50bad6e642 auto-updated agent/uiserver/bindata_assetfs.go from commit 9f0787197 2020-12-15 16:37:21 +00:00
hashicorp-ci cc3ebef5df auto-updated agent/uiserver/bindata_assetfs.go from commit f111d6b3e 2020-12-15 15:40:26 +00:00
hashicorp-ci de315911b4 auto-updated agent/uiserver/bindata_assetfs.go from commit 14d043e5f 2020-12-14 15:33:47 +00:00
hashicorp-ci 88b34b3996 auto-updated agent/uiserver/bindata_assetfs.go from commit 4e419b9b3 2020-12-14 14:31:06 +00:00
Daniel Nephin b58401480b http: Check HTTPUseCache in a single place
HTTPUseCache is only used is a gate for allowing QueryOptions.UseCache to be enabled. By
moving it to the place where the query options are set, this behaviour is more obvious.

Also remove parseInternal which was an alias for parse.
2020-12-11 14:03:47 -05:00
Daniel Nephin c66a63275f
Merge pull request #9340 from hashicorp/dnephin/skip-slow-tests-with-short
testing: skip slow tests with -short
2020-12-11 13:33:44 -05:00
hashicorp-ci 91c244a830 auto-updated agent/uiserver/bindata_assetfs.go from commit 514270a41 2020-12-11 11:48:26 +00:00
hashicorp-ci 71cd7c6a16 auto-updated agent/uiserver/bindata_assetfs.go from commit 0f7c909f1 2020-12-11 09:44:56 +00:00
hashicorp-ci f827deb8a7 auto-updated agent/uiserver/bindata_assetfs.go from commit 21e5a8f0f 2020-12-11 09:38:09 +00:00
R.B. Boyer f9dcaf7f6b
acl: global tokens created by auth methods now correctly replicate to secondary datacenters (#9351)
Previously the tokens would fail to insert into the secondary's state
store because the AuthMethod field of the ACLToken did not point to a
known auth method from the primary.
2020-12-09 15:22:29 -06:00
hashicorp-ci 6e08ef263e auto-updated agent/uiserver/bindata_assetfs.go from commit 9d8131907 2020-12-09 19:17:30 +00:00
hashicorp-ci 52ca0f2a2b auto-updated agent/uiserver/bindata_assetfs.go from commit a78566e2d 2020-12-09 18:54:02 +00:00
hashicorp-ci bfaf17fa05 auto-updated agent/uiserver/bindata_assetfs.go from commit d6cb2b0d7 2020-12-09 18:45:39 +00:00
Kenia 8bda36c9f4
Create consul version metric with version label (#9350)
* create consul version metric with version label

* agent/agent.go: add pre-release Version as well as label

Co-Authored-By: Radha13 <kumari.radha3@gmail.com>

* verion and pre-release version labels.

* hyphen/- breaks prometheus

* Add Prometheus gauge defintion for version metric

* Add new metric to telemetry docs

Co-authored-by: Radha Kumari <kumari.radha3@gmail.com>
Co-authored-by: Aestek <thib.gilles@gmail.com>
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2020-12-09 09:16:53 -05:00
hashicorp-ci 3f2199d323 auto-updated agent/uiserver/bindata_assetfs.go from commit 27c74f714 2020-12-09 13:12:32 +00:00
hashicorp-ci cdfae9ce63 auto-updated agent/uiserver/bindata_assetfs.go from commit 613be01f4 2020-12-09 09:29:17 +00:00
hashicorp-ci 37613345f2 auto-updated agent/uiserver/bindata_assetfs.go from commit db5283ee2 2020-12-08 15:53:31 +00:00
hashicorp-ci 535ef33bac auto-updated agent/uiserver/bindata_assetfs.go from commit 3be03029f 2020-12-08 09:32:32 +00:00
Daniel Nephin ef0999547a testing: skip slow tests with -short
Add a skip condition to all tests slower than 100ms.

This change was made using `gotestsum tool slowest` with data from the
last 3 CI runs of master.
See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests

With this change:

```
$ time go test -count=1 -short ./agent
ok      github.com/hashicorp/consul/agent       0.743s

real    0m4.791s

$ time go test -count=1 -short ./agent/consul
ok      github.com/hashicorp/consul/agent/consul        4.229s

real    0m8.769s
```
2020-12-07 13:42:55 -05:00
hashicorp-ci d4478782ae auto-updated agent/uiserver/bindata_assetfs.go from commit 4dfa7622d 2020-12-07 09:24:00 +00:00
hashicorp-ci b734cf54ed auto-updated agent/uiserver/bindata_assetfs.go from commit adbd6c0c8 2020-12-03 09:19:50 +00:00
Kyle Havlovitz 57210a59c3 connect: Fix a case where the active root would get unset even when there wasn't a new one 2020-12-02 11:42:23 -08:00
hashicorp-ci 4ee15914b0 auto-updated agent/uiserver/bindata_assetfs.go from commit e23b5b003 2020-12-02 15:53:16 +00:00
hashicorp-ci a814338374 auto-updated agent/uiserver/bindata_assetfs.go from commit 9ac7bc180 2020-12-02 15:46:59 +00:00
hashicorp-ci 297210ad93 auto-updated agent/uiserver/bindata_assetfs.go from commit a5b9ada9a 2020-12-02 09:49:40 +00:00
hashicorp-ci af9687e0b8 auto-updated agent/uiserver/bindata_assetfs.go from commit cf38309f6 2020-12-01 15:49:06 +00:00
Kyle Havlovitz 91d5d6c586
Merge pull request #9009 from hashicorp/update-secondary-ca
connect: Fix an issue with updating CA config in a secondary datacenter
2020-11-30 14:49:28 -08:00
Kyle Havlovitz c5167cf9c4 Use a buffered channel for CA intermediate renew func 2020-11-30 14:37:24 -08:00
hashicorp-ci 149e1e5f13 auto-updated agent/uiserver/bindata_assetfs.go from commit afe0f2614 2020-11-30 18:47:37 +00:00
hashicorp-ci 45f7de452f auto-updated agent/uiserver/bindata_assetfs.go from commit b5abbf122 2020-11-30 17:33:21 +00:00
hashicorp-ci 58797598dc auto-updated agent/uiserver/bindata_assetfs.go from commit d1ebe8c14 2020-11-30 17:27:35 +00:00
hashicorp-ci c03baa7b57 auto-updated agent/uiserver/bindata_assetfs.go from commit f46ef3e3f 2020-11-30 17:07:25 +00:00
hashicorp-ci 4801228104 auto-updated agent/uiserver/bindata_assetfs.go from commit a59a2f860 2020-11-30 16:57:34 +00:00
hashicorp-ci 2fd62ba8de auto-updated agent/uiserver/bindata_assetfs.go from commit 9cf30e74e 2020-11-30 15:09:43 +00:00
Daniel Nephin 17a86be022
Merge pull request #9284 from hashicorp/dnephin/agent-service-register
local: mark service as InSync when added to local agent state
2020-11-27 15:49:55 -05:00
Daniel Nephin 4c5fab6e00 local: mark service and checks as InSync when added
If the existing service and checks are the same as the new registration.
2020-11-27 15:31:12 -05:00
Hans Hasselberg 8c5c6e77ec
fix serf_wan documentation (#9289)
WAN config is different than LAN config, source of truth is
f72d2042a8/config.go (L315-L326)
and now the docs are correct.
2020-11-27 20:49:43 +01:00
hashicorp-ci b6e469e1e9 auto-updated agent/uiserver/bindata_assetfs.go from commit 408174f3b 2020-11-27 15:45:17 +00:00
Daniel Nephin 813f0d552d
Merge pull request #9247 from pierresouchay/streaming_predictible_order_for_health
[Streaming] Predictable order for results of /health/service/:serviceName to mimic memdb
2020-11-25 15:53:18 -05:00
Pierre Souchay 09673426e3 Applied suggestions from @dnephin
* Renamed `cachedHealResultSorter` into `sortCheckServiceNodes`
* Use `<` instead of `strings.Compare`
* Single line comparison in unit test
2020-11-25 21:40:51 +01:00
R.B. Boyer 6d6b6c15c6
server: fix panic when deleting a non existent intention (#9254)
* server: fix panic when deleting a non existent intention

* add changelog

* Always return an error when deleting non-existent ixn

Co-authored-by: freddygv <gh@freddygv.xyz>
2020-11-24 13:44:20 -05:00
hashicorp-ci 293360339b auto-updated agent/uiserver/bindata_assetfs.go from commit 6f8b5acbe 2020-11-24 17:51:46 +00:00
hashicorp-ci 4039a19ed3 auto-updated agent/uiserver/bindata_assetfs.go from commit 9c3c7bcf3 2020-11-24 14:38:24 +00:00
Hans Hasselberg 25f9e232af add missing descriptions for metrics 2020-11-23 22:06:30 +01:00
Kit Patella 7a8844ccce add entries for missing fsm operations and mark duplicated metrics prefixes as deprecated 2020-11-23 12:42:51 -08:00
Daniel Nephin 1987a1eca0 config: remove unused const 2020-11-20 19:17:12 -05:00
Kyle Havlovitz a01f853aa5 Clean up the logic in persistNewRootAndConfig 2020-11-20 15:54:44 -08:00
Daniel Nephin 8e783fb37b config: move testing shims to BuilderOpts
And remove the devMode field from builder.

This change helps make the Builder state more explicit by moving inputs to the BuilderOps struct,
leaving only fields that can change during Builder.Build on the Builder struct.
2020-11-20 18:31:10 -05:00
Daniel Nephin fcbdfa393e config: Use LiteralSource for some defaults
Using the LiteralSource makes it much easier to find default values, because an IDE reports
the location of a default. With an HCL string they are harder to discover.

Also removes unnecessary mapstructure.Decodes of constant values.
2020-11-20 18:14:17 -05:00
Kit Patella 3ea27d75e4
Merge pull request #9245 from hashicorp/telemetry/fix-missing-and-stale-docs
Telemetry/fix missing and stale docs
2020-11-20 12:54:29 -08:00
Pierre Souchay 9239df6dbd [Streaming] Predictable order for results of /health/service/:serviceName to mimic memdb
This ensures the result is consitent with/witout streaming

Will partially fix #9239
2020-11-20 16:23:35 +01:00
Michael Montgomery ed719c978b Merge branch 'master' into 6074-allow-config-MaxHeaderBytes 2020-11-20 07:43:53 -06:00
Kyle Havlovitz 26a9c985c5 Add CA server delegate interface for testing 2020-11-19 20:08:06 -08:00
Kit Patella 4ad076207e add telemetry and definition help entries for missing catalog and acl metrics 2020-11-19 13:29:44 -08:00
R.B. Boyer 7bcbc59dea
command: when generating envoy bootstrap configs use the datacenter returned from the agent services endpoint (#9229)
Fixes #9215
2020-11-19 15:27:31 -06:00
Kit Patella 46205bbf27 remove stale entries and rename/define acl.resolveToken 2020-11-19 13:06:28 -08:00
hashicorp-ci 22a0ab69ae auto-updated agent/uiserver/bindata_assetfs.go from commit d913af2bb 2020-11-19 18:45:01 +00:00
Freddy e4e306210a
Require operator:write to get Connect CA config (#9240)
A vulnerability was identified in Consul and Consul Enterprise (“Consul”) such that operators with `operator:read` ACL permissions are able to read the Consul Connect CA configuration when explicitly configured with the `/v1/connect/ca/configuration` endpoint, including the private key. This allows the user to effectively privilege escalate by enabling the ability to mint certificates for any Consul Connect services. This would potentially allow them to masquerade (receive/send traffic) as any service in the mesh.

--

This PR increases the permissions required to read the Connect CA's private key when it was configured via the `/connect/ca/configuration` endpoint. They are now `operator:write`.
2020-11-19 10:14:48 -07:00
hashicorp-ci 293ba9e0b5 auto-updated agent/uiserver/bindata_assetfs.go from commit 687ce1f9c 2020-11-19 16:13:04 +00:00
Daniel Nephin 35c5f83ea3
Merge pull request #9224 from hashicorp/dnephin/fix-multiple-http-listeners
agent: fix bug with multiple listeners
2020-11-18 16:52:29 -05:00
Daniel Nephin 8647483605 Use freeport
To prevent other tests which already use freeport from flaking when port 0 steals their reserved port.
2020-11-18 16:07:34 -05:00
hashicorp-ci 75a1727b31 auto-updated agent/uiserver/bindata_assetfs.go from commit 591a96d5b 2020-11-18 19:07:25 +00:00
hashicorp-ci fc07c63974 auto-updated agent/uiserver/bindata_assetfs.go from commit 1edef424a 2020-11-18 19:00:19 +00:00
Daniel Nephin fed2a61dfc agent: fix bug with multiple listeners
Previously the listener was being passed to a closure in a loop without
capturing the loop variable. The result is only the last listener is
used, so the http/https servers only listen on one address.

This problem is fixed by capturing the variable by passing it into a
function.
2020-11-18 13:03:29 -05:00
hashicorp-ci 393d83dfa3 auto-updated agent/uiserver/bindata_assetfs.go from commit 664f1d9aa 2020-11-18 11:17:06 +00:00
Kyle Havlovitz c8d4a40a87 connect: update some function comments in CA manager 2020-11-17 16:00:19 -08:00
Daniel Nephin b9306d8827 acl: remove a test-only method 2020-11-17 18:16:34 -05:00
Daniel Nephin 9e7c8dd19d Remove two unused delegate methods 2020-11-17 18:16:26 -05:00
Daniel Nephin d9af48afce
Merge pull request #9160 from hashicorp/dnephin/go-test-race-in-to-out-list
ci: change go-test-race package list to exclude list
2020-11-17 13:13:38 -05:00
Matt Keeler 4bca029be9
Refactor to call non-voting servers read replicas (#9191)
Co-authored-by: Kit Patella <kit@jepsen.io>
2020-11-17 10:53:57 -05:00
Kit Patella 4dfcdbab26
Merge pull request #9198 from hashicorp/mkcp/telemetry/add-all-metric-definitions
Add metric definitions for all metrics known at Consul start
2020-11-16 15:54:50 -08:00
hashicorp-ci 95fa102195 auto-updated agent/uiserver/bindata_assetfs.go from commit fe728855e 2020-11-16 23:41:31 +00:00
Freddy 2763833d32
Add DC and NS support for Envoy metrics (#9207)
This PR updates the tags that we generate for Envoy stats.

Several of these come with breaking changes, since we can't keep two stats prefixes for a filter.
2020-11-16 16:37:19 -07:00
Kit Patella 36aaf86647 Merge branch 'mkcp/telemetry/add-all-metric-definitions' of ssh://github.com/hashicorp/consul into mkcp/telemetry/add-all-metric-definitions 2020-11-16 15:26:12 -08:00
Kit Patella 4c30ebbb73 fix some tests that were broken from the TelemetryConfig change 2020-11-16 15:22:36 -08:00
Kit Patella 7ec3ad5b73
linting: sort and group import 2020-11-16 14:17:24 -08:00
Kit Patella 1f0b26c9d3 update runtime_test to handle PrometheusOpts expiry field change 2020-11-16 14:16:12 -08:00
Matt Keeler 197a37a860
Prevent panic if autopilot health is requested prior to leader establishment finishing. (#9204) 2020-11-16 17:08:17 -05:00
Kit Patella 6290be054a use the MetricsPrefix to set the service name and provide as slice literal to avoid bugs from append modifying its first arg 2020-11-16 14:01:12 -08:00
Kit Patella 464d13d80b push prometheus sink definiitons into prometheus.PrometheusOpts 2020-11-16 12:44:47 -08:00
Daniel Nephin de88ceed1c
Merge pull request #9114 from hashicorp/dnephin/filtering-in-stream
stream: improve naming of Payload methods
2020-11-16 14:20:07 -05:00
Kit Patella 0b18f5612e trim help strings to save a few bytes 2020-11-16 11:02:11 -08:00
Kit Patella 374748dafc merge master 2020-11-16 10:46:53 -08:00
hashicorp-ci 42641671b3 auto-updated agent/uiserver/bindata_assetfs.go from commit 959974e96 2020-11-16 15:27:40 +00:00
Kit Patella af719981f3 finish adding static server metrics 2020-11-13 16:26:08 -08:00
Kyle Havlovitz 0a86533e20 Reorganize some CA manager code for correctness/readability 2020-11-13 14:46:01 -08:00
Kyle Havlovitz 5de81c1375 connect: Add CAManager for synchronizing CA operations 2020-11-13 14:33:44 -08:00
Kyle Havlovitz 0b4876f906 connect: Add logic for updating secondary DC intermediate on config set 2020-11-13 14:33:44 -08:00
R.B. Boyer db1184c094
server: intentions CRUD requires connect to be enabled (#9194)
Fixes #9123
2020-11-13 16:19:12 -06:00
Kit Patella b486c1bce8 add the service name in the agent rather than in the definitions themselves 2020-11-13 13:18:04 -08:00
R.B. Boyer e323014faf
server: remove config entry CAS in legacy intention API bridge code (#9151)
Change so line-item intention edits via the API are handled via the state store instead of via CAS operations.

Fixes #9143
2020-11-13 14:42:21 -06:00
R.B. Boyer 6300abed18
server: skip deleted and deleting namespaces when migrating intentions to config entries (#9186) 2020-11-13 13:56:41 -06:00
Mike Morris a343365da7
ci: update to Go 1.15.4 and alpine:3.12 (#9036)
* ci: stop building darwin/386 binaries

Go 1.15 drops support for 32-bit binaries on Darwin https://golang.org/doc/go1.15#darwin

* tls: ConnectionState::NegotiatedProtocolIsMutual is deprecated in Go 1.15, this value is always true

* correct error messages that changed slightly

* Completely regenerate some TLS test data

Co-authored-by: R.B. Boyer <rb@hashicorp.com>
2020-11-13 13:02:59 -05:00
hashicorp-ci 45a8bc9472 auto-updated agent/uiserver/bindata_assetfs.go from commit 1059a51a3 2020-11-13 16:00:39 +00:00
hashicorp-ci b8d6e195ed auto-updated agent/uiserver/bindata_assetfs.go from commit 78b704be8 2020-11-13 15:44:14 +00:00
R.B. Boyer 758384893d
server: break up Intention.Apply monolithic method (#9007)
The Intention.Apply RPC is quite large, so this PR attempts to break it down into smaller functions and dissolves the pre-config-entry approach to the breakdown as it only confused things.
2020-11-13 09:15:39 -06:00
Kit Patella 9533372ded first pass on agent-configured prometheusDefs and adding defs for every consul metric 2020-11-12 18:12:12 -08:00
Daniel Nephin 70093be98c
Merge pull request #9162 from hashicorp/dnephin/fix-grpc-metrics
grpc: fix metrics
2020-11-12 17:03:01 -05:00
hashicorp-ci dc2cb412b8 auto-updated agent/uiserver/bindata_assetfs.go from commit 6b2970402 2020-11-12 18:49:48 +00:00
R.B. Boyer a5bd1ba323
agent: return the default ACL policy to callers as a header (#9101)
Header is: X-Consul-Default-ACL-Policy=<allow|deny>

This is of particular utility when fetching matching intentions, as the
fallthrough for a request that doesn't match any intentions is to
enforce using the default acl policy.
2020-11-12 10:38:32 -06:00
hashicorp-ci c6dd87c79e auto-updated agent/uiserver/bindata_assetfs.go from commit 7243f1f4f 2020-11-12 15:45:53 +00:00
Daniel Nephin a7fec642fc ci: go-test-race switch to exclude list
Most packages should pass the race detector. An exclude list ensures
that new packages are automatically tested with -race.

Also fix a couple small test races to allow more packages to be tested.

Returning readyCh requires a lock because it can be set to nil, and
setting it to nil will race without the lock.

Move the TestServer.Listening calls around so that they properly guard
setting TestServer.l. Otherwise it races.

Remove t.Parallel in a small package. The entire package tests run in a
few seconds, so t.Parallel does very little.

In auto-config, wait for the AutoConfig.run goroutine to stop before
calling readPersistedAutoConfig. Without this change there was a data
race on reading ac.config.
2020-11-11 14:44:57 -05:00
Daniel Nephin 956bff398a grpc: fix grpc metrics
defaultMetrics was being set at package import time, which meant that it received an instance of
the original default. But lib/telemetry.InitTelemetry sets a new global when it is called.

This resulted in the metrics being sent nowhere.

This commit changes defaultMetrics to be a function, so it will return the global instance when
called. Since it is called after InitTelemetry it will return the correct metrics instance.
2020-11-11 14:27:25 -05:00
Matt Keeler 58f98db227
Add a CLI command for retrieving the autopilot configuration. (#9142) 2020-11-11 13:19:02 -05:00
hashicorp-ci aac3729a6c auto-updated agent/uiserver/bindata_assetfs.go from commit 6423a2c10 2020-11-11 17:03:36 +00:00
Matt Keeler 2badb01d30
Add a paramter in state store methods to indicate whether a resource insertion is from a snapshot restoration (#9156)
The Catalog, Config Entry, KV and Session resources potentially re-validate the input as its coming in. We need to prevent snapshot restoration failures due to missing namespaces or namespaces that are being deleted in enterprise.
2020-11-11 11:21:42 -05:00
hashicorp-ci 4c2ca3a516 auto-updated agent/uiserver/bindata_assetfs.go from commit e1d977138 2020-11-11 14:48:38 +00:00
Daniel Nephin e033ad896a
Merge pull request #9149 from joel0/wrap-errors
Use error wrapping to preserve error type info
2020-11-10 18:27:08 -05:00
Daniel Nephin 386eb567f9
Merge pull request #8976 from joel0/wrap-eof
Wrap rpc error object
2020-11-10 17:04:11 -05:00
Joel May 87747ecd34 Use error wrapping to preserve error type info 2020-11-10 21:50:09 +00:00
hashicorp-ci 9fdbc68c92 auto-updated agent/uiserver/bindata_assetfs.go from commit e18d8e299 2020-11-10 16:37:33 +00:00
hashicorp-ci 892fa384fe auto-updated agent/uiserver/bindata_assetfs.go from commit fb6202929 2020-11-10 14:42:02 +00:00
hashicorp-ci 5626983031 auto-updated agent/uiserver/bindata_assetfs.go from commit c8e40ee0d 2020-11-09 17:34:25 +00:00
Matt Keeler 1f40f51a58
Fix a bunch of linter warnings 2020-11-09 09:22:12 -05:00
Matt Keeler 755fb72994
Switch to using the external autopilot module 2020-11-09 09:22:11 -05:00
hashicorp-ci 901df77971 auto-updated agent/uiserver/bindata_assetfs.go from commit 5c0ec13fb 2020-11-09 09:31:52 +00:00
hashicorp-ci eaafa5c17d auto-updated agent/uiserver/bindata_assetfs.go from commit d9672bca8 2020-11-09 09:19:52 +00:00
Mike Morris 9ccb340893
chore: upgrade to gopsutil/v3 (#9118)
* deps: update golang.org/x/sys

* deps: update imports to gopsutil/v3

* chore: make update-vendor
2020-11-06 20:48:38 -05:00
Daniel Nephin e4a78c977d stream: document that Payload must be immutable
If they are sent to EventPublisher.Publish.

Also document that PayloadEvents is expected to come from a subscription and that it is
not immutable.
2020-11-06 13:00:33 -05:00
R.B. Boyer 9b37ea7dcb
Revert "Add namespace support for metrics (OSS) (#9117)" (#9124)
This reverts commit 06b3b017d326853dbb53bc0ec08ce371265c5ce9.
2020-11-06 10:24:32 -06:00
hashicorp-ci 4db32dd6c5 auto-updated agent/uiserver/bindata_assetfs.go from commit 3a68686cc 2020-11-06 15:04:29 +00:00
hashicorp-ci 6e87590a9a auto-updated agent/uiserver/bindata_assetfs.go from commit 848f72f66 2020-11-06 09:31:18 +00:00
Freddy 874efe705f
Add namespace support for metrics (OSS) (#9117) 2020-11-05 18:24:29 -07:00
Daniel Nephin 4fc073b1f4 stream: rename FilterByKey 2020-11-05 19:21:16 -05:00
Daniel Nephin d4cd2fa6a8 stream: Add HasReadPermission to Payload
Required now that filter is a method on PayloadEvents instead of Event
2020-11-05 19:17:18 -05:00
Daniel Nephin 8a26bca020 stream: move event filtering to PayloadEvents
Removes the weirdness around PayloadEvents.FilterByKey
2020-11-05 17:50:17 -05:00
Daniel Nephin dcacfd3548 stream: Remove unused method 2020-11-05 16:49:59 -05:00
R.B. Boyer be9122a8c4
agent: sanitize ui metrics proxy header values on agent/self endpoint (#9104) 2020-11-05 13:25:27 -06:00
Daniel Nephin 621f1db766
Merge pull request #9073 from hashicorp/dnephin/backport-streaming-namespaces
streaming: backport namespace changes
2020-11-05 14:19:10 -05:00
Daniel Nephin cd220e5d6c
Merge pull request #9061 from hashicorp/dnephin/event-fields
stream: support filtering by namespace
2020-11-05 14:18:35 -05:00
hashicorp-ci 70fb9d94d2 auto-updated agent/uiserver/bindata_assetfs.go from commit 6ff094976 2020-11-05 19:12:03 +00:00
hashicorp-ci 17d1fa5f42 auto-updated agent/uiserver/bindata_assetfs.go from commit 1ef18c4b6 2020-11-05 16:10:14 +00:00
R.B. Boyer c2d167d06e
agent: protect the ui metrics proxy endpoint behind ACLs (#9099)
This ensures the metrics proxy endpoint is ACL protected behind a
wildcard `service:read` and `node:read` set of rules. For Consul
Enterprise these will need to span all namespaces:

```
service_prefix "" { policy = "read" }
node_prefix ""    { policy = "read" }

namespace_prefix "" {
  service_prefix "" { policy = "read" }
  node_prefix ""    { policy = "read" }
}
```

This PR contains just the backend changes. The frontend changes to
actually pass the consul token header to the proxy through the JS plugin
will come in another PR.
2020-11-04 12:50:03 -06:00
hashicorp-ci 6645dfcbb1 auto-updated agent/uiserver/bindata_assetfs.go from commit 0f6c0a5c1 2020-11-04 09:37:51 +00:00
hashicorp-ci ebe0ffce30 auto-updated agent/uiserver/bindata_assetfs.go from commit d5d4155e1 2020-11-03 14:14:58 +00:00
hashicorp-ci 0d8a30d8e3 auto-updated agent/uiserver/bindata_assetfs.go from commit 56c2ff56e 2020-11-02 18:43:31 +00:00
hashicorp-ci 0fa484ae8f auto-updated agent/uiserver/bindata_assetfs.go from commit bf32a1799 2020-11-02 16:11:45 +00:00
hashicorp-ci 052303f0fa auto-updated agent/uiserver/bindata_assetfs.go from commit 314eeda95 2020-11-02 14:40:27 +00:00
R.B. Boyer c8c87ec317
agent: introduce path allow list for requests going through the metrics proxy (#9059)
Added a new option `ui_config.metrics_proxy.path_allowlist`. This defaults to `["/api/v1/query", "/api/v1/query_range"]` when the metrics provider is set to `prometheus`.

Requests that do not use one of the allow-listed paths (via exact match) get a 403 Forbidden response instead.
2020-10-30 16:49:54 -05:00
Daniel Nephin 8a017c4f43 structs: add a namespace test for CheckServiceNode.CanRead 2020-10-30 15:07:04 -04:00
Daniel Nephin 78260952b0 cache-type: use namespace in tests
to verify that the namespace is passed through correctly to the server.
2020-10-30 15:07:04 -04:00
Daniel Nephin f6b629852f state: test EventPayloadCheckServiceNode.FilterByKey
Also fix a bug in that function when only one of key or namespace were the empty string.
2020-10-30 14:35:57 -04:00
Daniel Nephin 60df44df4f stream: Add tests for filterByKey with namespace
And fix a bug where a request with a Namespace but no Key would not be properly filtered
2020-10-30 14:35:42 -04:00
Daniel Nephin 318dfbe6e4 stream: Move FilterByKey events to a table
In preparation for adding new tests.
2020-10-30 14:35:28 -04:00
Daniel Nephin 2d0030da39 state: use enterprise meta for creating events 2020-10-30 14:34:04 -04:00
Daniel Nephin b57c7afcbb stream: include the namespace in the snap cache key
Otherwise the wrong snapshot could be returned when the same key is used in different namespaces
2020-10-30 14:34:04 -04:00
Daniel Nephin 8da30fcb9a subscribe: set the request namespace 2020-10-30 14:34:04 -04:00
hashicorp-ci e69d2c99cf auto-updated agent/uiserver/bindata_assetfs.go from commit cf2cfbaf2 2020-10-30 15:27:01 +00:00
R.B. Boyer 67a0d0c426
state: ensure we unblock intentions queries upon the upgrade to config entries (#9062)
1. do a state store query to list intentions as the agent would do over in `agent/proxycfg` backing `agent/xds`
2. upgrade the database and do a fresh `service-intentions` config entry write
3. the blocking query inside of the agent cache in (1) doesn't notice (2)
2020-10-29 15:28:31 -05:00
R.B. Boyer 78014653b3 restore prior signature of test helper so enterprise compiles 2020-10-29 13:52:15 -05:00
hashicorp-ci e970f9699f auto-updated agent/uiserver/bindata_assetfs.go from commit 1d6961248 2020-10-29 18:33:41 +00:00
Michael Montgomery 1c0a46849a Resolves #6074. Adds new option to configure HTTP Server's MaxHeaderBytes with option `-http-max-header-bytes`
Adds tests for behavior
2020-10-29 12:38:19 -05:00
Daniel Nephin 61ce0964a4 stream: remove Event.Key
Makes Payload a type with FilterByKey so that Payloads can implement
filtering by key. With this approach we don't need to expose a Namespace
field on Event, and we don't need to invest micro formats or require a
bunch of code to be aware of exactly how the key field is encoded.
2020-10-28 16:48:04 -04:00
Daniel Nephin 8ef4c0fcc5 state: use go-cmp for comparison
The output of the previous assertions made it impossible to debug the tests without code changes.

With go-cmp comparing the entire slice we can see the full diffs making it easier to debug failures.
2020-10-28 16:33:00 -04:00
Daniel Nephin c106d94742 proto: remove Event.Key field
The field is never used, and the value is available from the payload.
2020-10-28 16:33:00 -04:00
Daniel Nephin ab43236f86 proto: remove Event.Namespace field
All events are part of a single Topic, so we don't need this field.
2020-10-28 16:33:00 -04:00
Daniel Nephin 44da869ed4 stream: Use a no-op event publisher if streaming is disabled 2020-10-28 13:54:19 -04:00
Daniel Nephin eea87e1acf store: use a ReadDB for snapshots
to remove the cyclic dependency between the snapshot handlers and the state.Store
2020-10-28 13:07:42 -04:00
Daniel Nephin cfe0ffde15
Merge pull request #9026 from hashicorp/dnephin/streaming-without-cache-query-param
streaming: rename config and remove requirement for cache=1
2020-10-28 12:33:25 -04:00
Daniel Nephin 03d2be03e7
Merge pull request #8618 from hashicorp/dnephin/remove-txn-readtxn
state: Use ReadTxn everywhere
2020-10-28 12:32:47 -04:00
Kyle Havlovitz ecf0f0b2aa
Merge pull request #9053 from hashicorp/vault-token-lookupself
connect: Use the lookup-self endpoint for Vault token
2020-10-27 14:34:03 -07:00
Daniel Nephin 40cb72fe06 agent/grpc: add connection count metrics
Gauge metrics are great for understanding the current state, but can somtimes hide problems
if there are many disconnect/reconnects.

This commit adds counter metrics for connections and streams to make it easier to see the
count of newly created connections and streams.
2020-10-27 16:49:49 -04:00
Daniel Nephin 64284ed91a agent/grpc: rename metrics
These new names should make it easier to add counter metics with similar prefixes
2020-10-27 16:49:49 -04:00
Daniel Nephin 72430b9125
Merge pull request #8961 from hashicorp/dnephin/grpc-resolve-node-id
agent/grpc: fix some test flakes and handle duplicate server IDs in the pool
2020-10-27 16:47:37 -04:00
Daniel Nephin a0e017791f
Merge pull request #8998 from hashicorp/dnephin/lib-ttlcache
lib/ttlcache: extract a new package from agent/cache
2020-10-27 16:43:10 -04:00
Daniel Nephin 87a43b0ba7
Merge pull request #8987 from hashicorp/dnephin/stream-filter
streaming: apply filter to a single item
2020-10-27 16:39:43 -04:00
Daniel Nephin 6e34759442 agent/grpc: Add an integration test for ClientPool with TLS
Also deregister the resolver.Builder in tests.
2020-10-27 16:34:18 -04:00
Daniel Nephin 87793cd090 agent/grpc: pass metrics to constructor
Instead of referencing a package var. This does not fix the flaky test, but it seems more correct.
2020-10-27 16:34:17 -04:00
Daniel Nephin 70fea7a77e agent/grpc: fix a flaky test by performing more retries
Instead of using retry.Run, which appears to have problems in some cases where it does not
emit an error message, use a for loop.

Increase the number of attempts and remove any sleep, since this operation is not that expensive to do
in a tight loop
2020-10-27 16:34:17 -04:00
Daniel Nephin 9b89fb492d agent/grpc: remove misleading warnings from test output
Handle shutdown properly in tests so that the tests don't warn about using a closed connection.
2020-10-27 16:34:16 -04:00
Daniel Nephin 64105079d9 agent/grpc: fix a flake in TestHandler_EmitsStats 2020-10-27 16:34:16 -04:00
Daniel Nephin 7e338693a8 agent/grpc: use a separate channel for closing the Accept
Closing l.conns can lead to a race and a 'panic: send on closed chan' when a
connection is in the middle of being handled when the server is shutting down.

Found using '-race -count=800'
2020-10-27 16:34:15 -04:00
Daniel Nephin e640d47319 agent/grpc/resolver: namespace the server ID with the DC name
So that if two datacenters end up with overlapping serverIDs we don't send requests to the wrong server
2020-10-27 16:34:15 -04:00
Kyle Havlovitz 1c0608eeeb connect: Use the lookup-self endpoint for Vault token 2020-10-27 13:03:45 -07:00
hashicorp-ci e11b76ce80 auto-updated agent/uiserver/bindata_assetfs.go from commit f4208b5fb 2020-10-27 14:56:48 +00:00
hashicorp-ci e5b54d9be3 auto-updated agent/uiserver/bindata_assetfs.go from commit 30da884d5 2020-10-27 14:31:16 +00:00
hashicorp-ci 707ef19716 auto-updated agent/uiserver/bindata_assetfs.go from commit ed6a2c150 2020-10-27 14:09:17 +00:00
hashicorp-ci 338002054f auto-updated agent/uiserver/bindata_assetfs.go from commit 827e53694 2020-10-27 13:36:44 +00:00
hashicorp-ci 5ae5a4b800 auto-updated agent/uiserver/bindata_assetfs.go from commit 52d7283cd 2020-10-26 19:56:11 +00:00
Paul Banks df1eec292f
UI metrics provider dc (#9001)
* Plumb Datacenter and Namespace to metrics provider in preparation for them being usable.

* Move metrics loader/status to a new component and show reason for being disabled.

* Remove stray console.log

* Rebuild AssetFS to resolve conflicts

* Yarn upgrade

* mend
2020-10-26 19:48:23 +00:00
hashicorp-ci fea7f184ea auto-updated agent/uiserver/bindata_assetfs.go from commit c4f027fa0 2020-10-26 16:55:52 +00:00
Daniel Nephin 68a0fd3f8c streaming: disable streaming when requesting connect events
Until the correct events are created for terminating gateways.
2020-10-26 11:55:49 -04:00
Daniel Nephin abd8cfcfe9 state: disable streaming connect topic 2020-10-26 11:49:47 -04:00
hashicorp-ci 03212abfca auto-updated agent/uiserver/bindata_assetfs.go from commit 948917c6b 2020-10-26 09:34:58 +00:00
Daniel Nephin 1dcbfd17c3 health: change the name of UseStreamingBackend config
Remove it from the cache section, and update the docs.
2020-10-23 17:47:01 -04:00
Daniel Nephin 6650146a74 health: use streaming, even when cache=1 is not set 2020-10-23 17:39:55 -04:00
R.B. Boyer 0a80e82f21
server: config entry replication now correctly uses namespaces in comparisons (#9024)
Previously config entries sharing a kind & name but in different
namespaces could occasionally cause "stuck states" in replication
because the namespace fields were ignored during the differential
comparison phase.

Example:

Two config entries written to the primary:

    kind=A,name=web,namespace=bar
    kind=A,name=web,namespace=foo

Under the covers these both get saved to memdb, so they are sorted by
all 3 components (kind,name,namespace) during natural iteration. This
means that before the replication code does it's own incomplete sort,
the underlying data IS sorted by namespace ascending (bar comes before
foo).

After one pass of replication the primary and secondary datacenters have
the same set of config entries present. If
"kind=A,name=web,namespace=bar" were to be deleted, then things get
weird. Before replication the two sides look like:

primary: [
    kind=A,name=web,namespace=foo
]
secondary: [
    kind=A,name=web,namespace=bar
    kind=A,name=web,namespace=foo
]

The differential comparison phase walks these two lists in sorted order
and first compares "kind=A,name=web,namespace=foo" vs
"kind=A,name=web,namespace=bar" and falsely determines they are the SAME
and are thus cause an update of "kind=A,name=web,namespace=foo". Then it
compares "<nothing>" with "kind=A,name=web,namespace=foo" and falsely
determines that the latter should be DELETED.

During reconciliation the deletes are processed before updates, and so
for a brief moment in the secondary "kind=A,name=web,namespace=foo" is
erroneously deleted and then immediately restored.

Unfortunately after this replication phase the final state is identical
to the initial state, so when it loops around again (rate limited) it
repeats the same set of operations indefinitely.
2020-10-23 13:41:54 -05:00
Daniel Nephin f9b2834171 state: convert the remaining functions to ReadTxn
Required also converting some of the transaction functions to WriteTxn
because TxnRO() called the same helper as TxnRW.

This change allows us to return a memdb.Txn for read-only txn instead of
wrapping them with state.txn.
2020-10-23 14:29:22 -04:00
Daniel Nephin 26387cdc0e
Merge pull request #8975 from hashicorp/dnephin/stream-close-on-unsub
stream: close the subscription on Unsubscribe
2020-10-23 12:58:12 -04:00
Freddy d23038f94f
Add HasExact to topology endpoint (#9010) 2020-10-23 10:45:41 -06:00
hashicorp-ci 8fa2b814ad auto-updated agent/uiserver/bindata_assetfs.go from commit eedee07e7 2020-10-23 16:32:06 +00:00
hashicorp-ci d1a6880699 auto-updated agent/uiserver/bindata_assetfs.go from commit 7559f64d0 2020-10-23 14:45:43 +00:00
hashicorp-ci e174a0c274 auto-updated agent/uiserver/bindata_assetfs.go from commit 2a8a80638 2020-10-23 13:50:57 +00:00
hashicorp-ci 0b7ee99f23 auto-updated agent/uiserver/bindata_assetfs.go from commit 45554e4e9 2020-10-23 08:32:18 +00:00
R.B. Boyer 2183842f0e
connect: add support for envoy 1.16.0, drop support for 1.12.x, and bump point releases as well (#8944)
Supported versions will be: "1.16.0", "1.15.2", "1.14.5", "1.13.6"
2020-10-22 13:46:19 -05:00
Daniel Nephin 56050a1dd1
Merge pull request #8973 from hashicorp/dnephin/resolver-balance
agent/router: refactor calculation of delay between rebalances.
2020-10-22 13:52:10 -04:00
Daniel Nephin fb8b68a6ec stream: close the subscription on Unsubscribe 2020-10-22 13:39:27 -04:00
Daniel Nephin 1d03a7a8a4
Merge pull request #8924 from ShimmerGlass/fix-sidecar-deregister-after-restart
Fix: service LocallyRegisteredAsSidecar property is not persisted
2020-10-22 13:26:55 -04:00
Daniel Nephin 193c89429d
Merge pull request #9004 from hashicorp/dnephin/dns-small-cleanup-1
dns: remove goto INVALID and a naked return
2020-10-21 18:41:11 -04:00
Daniel Nephin 19cffcb1f3
Merge pull request #8970 from hashicorp/dnephin/resolver-balance-test
agent/router: improve  the test for refreshServerRebalanceTimer
2020-10-21 18:40:32 -04:00
Daniel Nephin 1e1f963064
Merge pull request #8771 from amenzhinsky/fix-grpc-use-tls-mapping
Fix GRPCUseTLS flag HTTP API mapping
2020-10-21 18:37:11 -04:00
Daniel Nephin aa06ae91ce dns: remove goto INVALID and a naked return
This commit is one small step toward modernizing and making the DNSServer readable.
2020-10-21 15:16:03 -04:00
Daniel Nephin 09d62f1df0 lib/ttlcache: unexport key and additional godoc 2020-10-20 19:16:03 -04:00
Daniel Nephin 2601998766 lib/ttlcache: add a constant for NotIndexed 2020-10-20 19:10:20 -04:00
Daniel Nephin 0beaced90f cache: fix a bug with Prepopulate
Prepopulate was setting entry.Expiry.HeapIndex to 0. Previously this would result in a call to heap.Fix(0)
which wasn't correct, but was also not really a problem because at worse it would re-notify.

With the recent change to extract cachettl it was changed to call Update(idx), which would have updated
the wrong entry.

A previous commit removed the setting of entry.Expiry so that the HeapIndex would be reported
as -1, and this commit adds a test and handles the -1 heap index.
2020-10-20 19:10:20 -04:00
Daniel Nephin 9d5b738cdb lib/ttlcache: extract package from agent/cache 2020-10-20 19:10:20 -04:00
Daniel Nephin 909b8e674e cache: export ExpiryHeap
and hide internal methods on an unexported type, so that when it is extrated those methods are not exported.
2020-10-20 19:10:20 -04:00
Daniel Nephin d3742a1d0e cache: Refactor heap.notify to make it more explicit.
And remove duplicate notifications.

Instead of performing the check in the heap implementation, check the
index in the higher level interface (Add,Remove,Update) and notify if one
of the relevant indexes is 0.
2020-10-20 19:10:20 -04:00
Daniel Nephin a96646c562 cache: Move more of the expiryLoop into the Heap 2020-10-20 19:10:20 -04:00
Daniel Nephin b6f24c6554 cache: extract cache eviction heap
Start creating an interface that doesn't require using heap and hides more of the
entry internals.
2020-10-20 19:10:19 -04:00
Daniel Nephin 312a3bb9b1 streaming: apply filter to a single item
Instead of the whole map. This should save a lot of time performing reflecting on a large map.
The filter does not change, so there is no reason to re-apply it to older entries.
2020-10-19 18:24:02 -04:00
hashicorp-ci 5de8c073cc auto-updated agent/uiserver/bindata_assetfs.go from commit 2773daedc 2020-10-19 16:35:48 +00:00
hashicorp-ci 054483d5cf auto-updated agent/uiserver/bindata_assetfs.go from commit b24c2227d 2020-10-19 16:19:40 +00:00
hashicorp-ci f2d511b9e5 auto-updated agent/uiserver/bindata_assetfs.go from commit 02dcd422f 2020-10-19 14:50:52 +00:00
hashicorp-ci ef43c1a09e auto-updated agent/uiserver/bindata_assetfs.go from commit bf8bb7415 2020-10-19 14:25:54 +00:00
Joel May e603d3fd98 Wrap rpc error object 2020-10-16 00:42:05 +00:00
Daniel Nephin a3f922249e agent/router: refactor calculation of delay between rebalances.
This change attempts to make the delay logic more obvious by:

* remove indirection, inline a bunch of function calls
* move all the code and constants next to each other
* replace the two constant values with a single value
* reword the comments.
2020-10-15 15:59:36 -04:00
Daniel Nephin da1e45745c agent/router: Add bounds test cases 2020-10-15 14:43:29 -04:00
Daniel Nephin 9bbfc53685 router: organize the test by number of servers
And adddd some additional cases to show where the minimum value stops being used
2020-10-15 13:53:37 -04:00
Daniel Nephin 2ac2503604 router: make refreshServerRebalanceTimer test a lot more strict 2020-10-15 12:05:07 -04:00
hashicorp-ci d15d2b5b2c auto-updated agent/uiserver/bindata_assetfs.go from commit d08e13e5b 2020-10-15 14:49:01 +00:00
hashicorp-ci e5ace85562 auto-updated agent/uiserver/bindata_assetfs.go from commit 5fd79ba40 2020-10-14 08:07:29 +00:00
Mathilde Gilles a1ec792acc Fix: service LocallyRegisteredAsSidecar property is not persisted
When a service is deregistered, we check whever matching services were
registered as sidecar along with it and deregister them as well.
To determine if a service is indeed a sidecar we check the
structs.ServiceNode.LocallyRegisteredAsSidecar property. However, to
avoid interal API leakage, it is excluded from JSON serialization,
meaning it is not persisted to disk either.
When the agent is restarted, this property lost and sidecars are no
longer deregistered along with their parent service.
To fix this, we now specifically save this property in the persisted
service file.
2020-10-13 19:38:58 +02:00
hashicorp-ci 646e1f369c auto-updated agent/uiserver/bindata_assetfs.go from commit 7afcd7755 2020-10-13 16:43:31 +00:00
Mike Morris 528b0f4621 chore: regenerate bindata_assetfs 2020-10-12 15:44:33 -04:00
Mike Morris f2443e0745 Merge branch 'master' into release/1.9.0-beta1 2020-10-12 15:32:58 -04:00
hashicorp-ci d2f62d64cb
update bindata_assetfs.go 2020-10-12 18:30:30 +00:00
hashicorp-ci 72988bae34 auto-updated agent/uiserver/bindata_assetfs.go from commit 868cd47a6 2020-10-12 16:28:44 +00:00
Pierre Souchay 54f9f247f8
Consul Service meta wrongly computes and exposes non_voter meta (#8731)
* Consul Service meta wrongly computes and exposes non_voter meta

In Serf Tags, entreprise members being non-voters use the tag
`nonvoter=1`, not `non_voter = false`, so non-voters in members
were wrongly displayed as voter.

Demonstration:

```
consul members -detailed|grep voter
consul20-hk5 10.200.100.110:8301   alive   acls=1,build=1.8.4+ent,dc=hk5,expect=3,ft_fs=1,ft_ns=1,id=xxxxxxxx-5629-08f2-3a79-10a1ab3849d5,nonvoter=1,port=8300,raft_vsn=3,role=consul,segment=<all>,use_tls=1,vsn=2,vsn_max=3,vsn_min=2,wan_join_port=8302
```

* Added changelog

* Added changelog entry
2020-10-09 17:18:24 -04:00
hashicorp-ci f58806fbab auto-updated agent/uiserver/bindata_assetfs.go from commit 851705e93 2020-10-09 21:01:02 +00:00
hashicorp-ci 89977d1467 auto-updated agent/uiserver/bindata_assetfs.go from commit 27048a061 2020-10-09 20:35:59 +00:00
Paul Banks 40695d5919
Add metrics rendering to the new topology view. (#8858)
* Remove unused StatsCard component

* Create Card and Stats contextual components with styling

* Send endpoint, item, and protocol to Stats as props

* WIP basic plumbing for metrics in Ember

* WIP metrics data source now works for different protocols and produces reasonable mock responses

* WIP sparkline component

* Mostly working metrics and graphs in topology

* Fix date in tooltip to actually be correct

* Clean up console.log

* Add loading frame and create a style sheet for Stats

* Various polish fixes:

 - Loading state for graph
 - Added fake latency cookie value to test loading
 - If metrics provider has no series/stats for the service show something that doesn't look broken
 - Graph hover works right to the edge now
 - Stats boxes now wrap so they are either shown or not as will fit not cut off
 - Graph resizes when browser window size changes
 - Some tweaks to number formats and stat metrics to make them more compact/useful

* Thread Protocol through topology model correctly

* Rebuild assetfs

* Fix failing tests and remove stats-card now it's changed and become different

* Fix merge conflict

* Update api doublt

* more merge fixes

* Add data-permission and id attr to Card

* Run JS linter

* Move things around so the tests run with everything available

* Get tests passing:

1. Remove fakeLatency setTimeout (will be replaced with CONSUL_LATENCY
in mocks)
2. Make sure any event handlers are removed

* Make sure the Consul/scripts are available before the app

* Make sure interval gets set if there is no cookie value

* Upgrade mocks so we can use CONSUL_LATENCY

* Fix handling of no series values from Prometheus

* Update assetfs and fix a comment

* Rebase and rebuild assetfs; fix tcp metric series units to be bits not bytes

* Rebuild assetfs

* Hide stats when provider is not configured

Co-authored-by: kenia <keniavalladarez@gmail.com>
Co-authored-by: John Cowen <jcowen@hashicorp.com>
2020-10-09 21:31:15 +01:00
s-christoff a62705101f
Enhance the output of consul snapshot inspect (#8787) 2020-10-09 14:57:29 -05:00
Kyle Havlovitz 707f4a8d26 Stop intermediate renew routine on leader stop 2020-10-09 12:30:57 -07:00
Kyle Havlovitz 926a393a5c
Merge pull request #8784 from hashicorp/renew-intermediate-primary
connect: Enable renewing the intermediate cert in the primary DC
2020-10-09 12:18:59 -07:00
hashicorp-ci 2facfdf9a1 auto-updated agent/uiserver/bindata_assetfs.go from commit 766b28b8e 2020-10-09 18:44:15 +00:00
Daniel Nephin dd0e8d42c4
Merge pull request #8825 from hashicorp/streaming/add-config
streaming: add config and docs
2020-10-09 14:33:58 -04:00
Daniel Nephin 195511140f config: add field for enabling streaming in the client
agent: register the new streaming cache-type
2020-10-09 14:11:34 -04:00
Daniel Nephin c1b832c2e6
Merge pull request #8893 from hashicorp/dnephin/add-steps-to-subscribe-tests
subscribe: add steps to long test cases, and add new cases for converting Events
2020-10-09 13:54:59 -04:00
Daniel Nephin 1d41d78338
Merge pull request #8896 from hashicorp/dnephin/go-test-race-more-pkgs
ci: go test -race more packages
2020-10-09 13:48:56 -04:00
Chris Piraino 4f77f87065
Emit service usage metrics with correct labeling strategy (#8856)
Previously, we would emit service usage metrics both with and without a
namespace label attached. This is problematic in the case when you want
to aggregate metrics together, i.e. "sum(consul.state.services)". This
would cause services to be counted twice in that aggregate, once via the
metric emitted with a namespace label, and once in the metric emited
without any namespace label.
2020-10-09 11:01:45 -05:00
Kyle Havlovitz 50543d678e Fix intermediate refresh test comments 2020-10-09 08:53:33 -07:00
Matt Keeler 44017f819e
Remove oss/ent divergence of runtime_test.go (#8903) 2020-10-09 10:45:11 -04:00
Matt Keeler 891d05fada
Add capability for the v1/connect/ca/roots endpoint to return a PEM encoded certificate chain (#8774)
Co-authored-by: R.B. Boyer <rb@hashicorp.com>
2020-10-09 10:43:33 -04:00
R.B. Boyer d2f09ca306
upstream some differences from enterprise (#8902) 2020-10-09 09:42:53 -05:00
hashicorp-ci b55c4d90b2 auto-updated agent/uiserver/bindata_assetfs.go from commit b6f686fec 2020-10-09 13:37:35 +00:00
R.B. Boyer b4bf092db3
uiserver: upstream refactors done elsewhere (#8891) 2020-10-09 08:32:39 -05:00
Kyle Havlovitz 968fd8660d Update CI for leader renew CA test using Vault 2020-10-09 05:48:15 -07:00
Kyle Havlovitz 62270c3f9a
Merge branch 'master' into renew-intermediate-primary 2020-10-09 04:40:34 -07:00
Kyle Havlovitz b78f618beb connect: Check for expired root cert when cross-signing 2020-10-09 04:35:56 -07:00
Paul Banks fb87e218ab
Actually proxy the query string too 2020-10-09 12:27:24 +01:00
Daniel Nephin 6a8eac77af cache-types: skip tests with races 2020-10-08 20:15:13 -04:00
Daniel Nephin 022744699f grpc: fix data rate in stats handler test 2020-10-08 19:43:49 -04:00
Freddy 89d52f41c4
Add protocol to the topology endpoint response (#8868) 2020-10-08 17:31:54 -06:00
Daniel Nephin f06fd96d3a subscribe: add test cases for newEventFromStreamEvent 2020-10-08 18:48:17 -04:00
Kit Patella 40b9769b1f
Merge pull request #8877 from hashicorp/mkcp/telemetry/consul.api.http
Add flag for disabling 1.9 metrics backwards compatibility and warnings when set to default
2020-10-08 13:22:37 -07:00
R.B. Boyer 69af49441a
agent: allow the /v1/connect/intentions/match endpoint to use the agent cache (#8875)
This is the recommended proxy integration API for listing intentions
which should not require an active connection to the servers to resolve
after the initial cache filling.
2020-10-08 14:51:53 -05:00
Daniel Nephin ea95908f63 subscribe: Add steps to rpc/subscribe tests
To make them easier to follow
2020-10-08 15:38:01 -04:00
hashicorp-ci 143bfb7462 auto-updated agent/uiserver/bindata_assetfs.go from commit 13dfde75a 2020-10-08 19:36:48 +00:00
Matt Keeler 141eb60f06
Add per-agent reconnect timeouts (#8781)
This allows for client agent to be run in a more stateless manner where they may be abruptly terminated and not expected to come back. If advertising a per-agent reconnect timeout using the advertise_reconnect_timeout configuration when that agent leaves, other agents will wait only that amount of time for the agent to come back before reaping it.

This has the advantageous side effect of causing servers to deregister the node/services/checks for that agent sooner than if the global reconnect_timeout was used.
2020-10-08 15:02:19 -04:00
Paul Banks d9818ed1e0
Add /v1/internal/ui/metrics-proxy API endpoint that proxies to a configured metrics provider backend. 2020-10-08 17:32:29 +01:00
Paul Banks 42d5e75dc5
Fix merge conflicts 2020-10-08 17:31:36 +01:00
Daniel Nephin 05df7b18a9 config: add field for enabling streaming RPC endpoint 2020-10-08 12:11:20 -04:00
Daniel Nephin 3483e2fb89 streaming: Use a shorter LastGetTTL for the cache 2020-10-08 12:11:20 -04:00
hashicorp-ci fdfa9c610c auto-updated agent/uiserver/bindata_assetfs.go from commit b373456c7 2020-10-08 15:56:05 +00:00
Freddy de4af766f3
Support ingress gateways in mesh viz endpoint (#8864)
Co-authored-by: R.B. Boyer <rb@hashicorp.com>
2020-10-08 09:47:09 -06:00
hashicorp-ci 75847b0f11 auto-updated agent/uiserver/bindata_assetfs.go from commit ef7b1f8a4 2020-10-08 15:41:40 +00:00
hashicorp-ci 448febe0be auto-updated agent/uiserver/bindata_assetfs.go from commit d849f025c 2020-10-08 15:08:16 +00:00
Daniel Nephin 0b3f438703
Merge pull request #8841 from hashicorp/streaming/materialize-view-fix-service-unique-id
streaming: Use an ID that includes namespace to store services in the materialized view
2020-10-07 21:28:53 -04:00
Daniel Nephin a94fe054f0
Merge pull request #8809 from hashicorp/streaming/materialize-view
Add StreamingHealthServices cache-type
2020-10-07 21:26:38 -04:00
Daniel Nephin e0236b5a9f
Merge pull request #8818 from hashicorp/streaming/add-subscribe-service-batch-events
stream: handle batch events as a special case of Event
2020-10-07 21:25:32 -04:00
Daniel Nephin 783627aeef
Merge pull request #8768 from hashicorp/streaming/add-subscribe-service
subscribe: add subscribe service for streaming change events
2020-10-07 21:24:03 -04:00
Freddy 7d1f50d2e6
Return intention info in svc topology endpoint (#8853) 2020-10-07 18:35:34 -06:00
Kit Patella 328036dd37 add config flag to disable 1.9 metrics backwards compatibility. Add warnings on start and reload on default value 2020-10-07 17:12:52 -07:00
Daniel Nephin eb6f2a8d72 structs: add CheckServiceNode.CanRead
And use it from the subscribe endpoint.
2020-10-07 18:15:13 -04:00
hashicorp-ci 7af06157eb auto-updated agent/uiserver/bindata_assetfs.go from commit 0d0de4309 2020-10-07 14:13:29 +00:00
hashicorp-ci 50efe14f3e auto-updated agent/uiserver/bindata_assetfs.go from commit a61b6c53d 2020-10-07 13:50:39 +00:00
hashicorp-ci a7a30d8f5d auto-updated agent/uiserver/bindata_assetfs.go from commit 222555c4c 2020-10-07 08:09:54 +00:00
R.B. Boyer 35c4efd220
connect: support defining intentions using layer 7 criteria (#8839)
Extend Consul’s intentions model to allow for request-based access control enforcement for HTTP-like protocols in addition to the existing connection-based enforcement for unspecified protocols (e.g. tcp).
2020-10-06 17:09:13 -05:00
Daniel Nephin dbfa6530f1 streaming: store services with a unique ID that includes namespace 2020-10-06 16:54:56 -04:00
R.B. Boyer d6dce2332a
connect: intentions are now managed as a new config entry kind "service-intentions" (#8834)
- Upgrade the ConfigEntry.ListAll RPC to be kind-aware so that older
copies of consul will not see new config entries it doesn't understand
replicate down.

- Add shim conversion code so that the old API/CLI method of interacting
with intentions will continue to work so long as none of these are
edited via config entry endpoints. Almost all of the read-only APIs will
continue to function indefinitely.

- Add new APIs that operate on individual intentions without IDs so that
the UI doesn't need to implement CAS operations.

- Add a new serf feature flag indicating support for
intentions-as-config-entries.

- The old line-item intentions way of interacting with the state store
will transparently flip between the legacy memdb table and the config
entry representations so that readers will never see a hiccup during
migration where the results are incomplete. It uses a piece of system
metadata to control the flip.

- The primary datacenter will begin migrating intentions into config
entries on startup once all servers in the datacenter are on a version
of Consul with the intentions-as-config-entries feature flag. When it is
complete the old state store representations will be cleared. We also
record a piece of system metadata indicating this has occurred. We use
this metadata to skip ALL of this code the next time the leader starts
up.

- The secondary datacenters continue to run the old intentions
replicator until all servers in the secondary DC and primary DC support
intentions-as-config-entries (via serf flag). Once this condition it met
the old intentions replicator ceases.

- The secondary datacenters replicate the new config entries as they are
migrated in the primary. When they detect that the primary has zeroed
it's old state store table it waits until all config entries up to that
point are replicated and then zeroes its own copy of the old state store
table. We also record a piece of system metadata indicating this has
occurred. We use this metadata to skip ALL of this code the next time
the leader starts up.
2020-10-06 13:24:05 -05:00
Daniel Nephin 83401194ab streaming: improve godoc for cache-type
And fix a bug where any error that implemented the temporary interface was considered
a temporary error, even when the method would return false.
2020-10-06 13:52:02 -04:00
Daniel Nephin bda19cb71e submatview: remove notifyUpdateLocked from reset
This call appears to only be necessary because reset() was called from
NewMaterializer.

This commit has the constructor set a default value for updateCh, and
removes both the call to reset() from New(), and the call to
notifyUpdateLocked() from reset().

This should ensure that we do not notify the Fetch() call before we have new
values to report.
2020-10-06 13:22:02 -04:00
Daniel Nephin e5813bd9d6 submatview: remove method receiver from handlers 2020-10-06 13:22:02 -04:00
Daniel Nephin f857aef4a8 submatview: add a test for handling of NewSnapshotToFollow
Also add some godoc
Rename some vars and functions
Fix a data race in the new cache test for entry closing.
2020-10-06 13:22:02 -04:00
Daniel Nephin 58cf09247b submatview: refactor Materializer
Refactor of Materializer.Run
Use handlers to manage state in Materializer
Rename Materializer receiver
rename m.l to m.lock, and flip some conditionals to remove the negative.
Improve godoc, rename Deps, move resetErr, and pass err into notifyUpdate
Update for NewSnapshotToFollow events
Refactor to move context cancel out of Materializer
2020-10-06 13:22:02 -04:00
Daniel Nephin e8c7881196 submatview: Move the 'use materialize from result.State' logic
No need to do all this other work if we have one already.

This logic moved closer to this call site 3 times during the process
of refactoring.
2020-10-06 13:22:02 -04:00
Daniel Nephin 3bb252888b submatview: Move Materializer to submatview package 2020-10-06 13:22:02 -04:00
Daniel Nephin d24e243f70 submatview: Refactor MaterializeView
Replace InitFilter with Reset.
Removes the need to store a fatalErr and the cache-type, and removes the need to recreate the filter
each time.
Pass dependencies into MaterializedView.
Remove context from MaterializedView.
Rename state to view.
Rename MaterialziedView to Materialzier.
Rename to NewMaterializer
Pass in retry.Waiter
2020-10-06 13:22:02 -04:00
Daniel Nephin 50846a96ff cache-types: Update Streaming health cache-type
To use latest protobuf types
2020-10-06 13:22:02 -04:00
Daniel Nephin e5d37bdf23 agent/cache: Add cache-type and materialized view for streaming health
Extracted from d97412ce4c399a35b41bbdae2716f0e32dce80bf

Co-authored-by: Paul Banks <banks@banksco.de>
2020-10-06 13:21:57 -04:00
Daniel Nephin ad29cf4f94 stream: Return a single event from a subscription.Next
Handle batch events as a single event
2020-10-06 13:18:20 -04:00
Daniel Nephin 3183b9ebb3 subscribe: update to use NewSnapshotToFollow event 2020-10-06 12:49:35 -04:00
Daniel Nephin fa115c6249 Move agent/subscribe -> agent/rpc/subscribe 2020-10-06 12:49:35 -04:00
Daniel Nephin 011109a6f6 subscirbe: extract streamID and logging from Subscribe
By extracting all of the tracing logic the core logic of the Subscribe
endpoint is much easier to read.
2020-10-06 12:49:35 -04:00
Daniel Nephin 4c4441997a subscribe: add integration test for acl token updates 2020-10-06 12:49:35 -04:00
Daniel Nephin 3deccdde56 subscribe: add integration test for filtering events by acl 2020-10-06 12:49:35 -04:00
Daniel Nephin cb6ce4ab0c subscribe: Add an integration test for forward to DC 2020-10-06 12:49:35 -04:00
Daniel Nephin f5f4a3d803 subscribe: add first integration test for Server 2020-10-06 12:49:35 -04:00
Daniel Nephin 371ec2d70a subscribe: add a stateless subscribe service for the gRPC server
With a Backend that provides access to the necessary dependencies.
2020-10-06 12:49:35 -04:00
Daniel Nephin 2f6c98ee0e subscribe: add commented out test cases
Co-authored-by: Paul Banks <banks@banksco.de>
2020-10-06 12:49:35 -04:00
Daniel Nephin f73432be66 subscribe: add initial impl from streaming-rpc-final branch
Co-authored-by: Paul Banks <banks@banksco.de>
2020-10-06 12:49:35 -04:00
Daniel Nephin ae433947a4
Merge pull request #8799 from hashicorp/streaming/rename-framing-events
stream: remove EndOfEmptySnapshot, add NewSnapshotToFollow
2020-10-06 12:42:58 -04:00
Daniel Nephin d5ac6e8f09
Merge pull request #8806 from hashicorp/dnephin/service-health-interface
rpcclient: Add health.Client and use it in http and dns
2020-10-06 12:13:09 -04:00
R.B. Boyer a77b518542
server: create new memdb table for storing system metadata (#8703)
This adds a new very tiny memdb table and corresponding raft operation
for updating a very small effective map[string]string collection of
"system metadata". This can persistently record a fact about the Consul
state machine itself.

The first use of this feature will come in a later PR.
2020-10-06 10:08:37 -05:00
hashicorp-ci c945bbc6f5 auto-updated agent/uiserver/bindata_assetfs.go from commit a7038b48c 2020-10-06 13:43:27 +00:00
hashicorp-ci 6c78cc6eea auto-updated agent/uiserver/bindata_assetfs.go from commit 5a39be47d 2020-10-06 13:32:27 +00:00
hashicorp-ci 7d4d5cb521 auto-updated agent/uiserver/bindata_assetfs.go from commit 76d3909f3 2020-10-06 08:42:39 +00:00
hashicorp-ci 6d50d8a859 auto-updated agent/uiserver/bindata_assetfs.go from commit e4a0dcf10 2020-10-06 08:36:18 +00:00
Freddy f3d991fb8f
Add default meta to test assertion (#8824) 2020-10-05 15:00:25 -06:00
Daniel Nephin 2706cf9b2a
Merge pull request #8802 from hashicorp/dnephin/extract-lib-retry
lib/retry - extract a new package from lib/retry.go
2020-10-05 14:22:37 -04:00
hashicorp-ci 9eb769d242 auto-updated agent/uiserver/bindata_assetfs.go from commit f26201a7a 2020-10-05 17:12:56 +00:00
freddygv 82a17ccee6 Do not evaluate discovery chain for topology upstreams 2020-10-05 10:24:50 -06:00
freddygv 63c50e15bc Single DB txn for ServiceTopology and other PR comments 2020-10-05 10:24:50 -06:00
freddygv 263bd9dd92 Add topology HTTP endpoint 2020-10-05 10:24:50 -06:00
freddygv 7c11580e93 Add topology RPC endpoint 2020-10-05 10:24:50 -06:00
freddygv 21c4708fe9 Add topology ACL filter 2020-10-05 10:24:50 -06:00
freddygv ac54bf99b3 Add func to combine up+downstream queries 2020-10-05 10:24:50 -06:00
freddygv 160a6539d1 factor in discovery chain when querying up/downstreams 2020-10-05 10:24:50 -06:00
freddygv 214b25919f support querying upstreams/downstreams from registrations 2020-10-05 10:24:50 -06:00
freddygv 3653045cb0 Add method for downstreams from disco chain 2020-10-05 10:24:50 -06:00
hashicorp-ci 9a546f3170 auto-updated agent/uiserver/bindata_assetfs.go from commit 3a89c7d23 2020-10-05 14:07:13 +00:00
hashicorp-ci c652d2551d auto-updated agent/uiserver/bindata_assetfs.go from commit dfe57406d 2020-10-05 13:53:36 +00:00
hashicorp-ci 63ceb4f15e auto-updated agent/uiserver/bindata_assetfs.go from commit b871837ee 2020-10-05 13:14:40 +00:00
Daniel Nephin 04c5b9adf5 rpcclient: Add health.Client and use it in http and dns
This new package provides a client agent implementation of an interface
for fetching the health of services.

This approach has a number of benefits:

1. It provides a much more explicit interface. Instead of everything
   dependency on `RPC()` and `Cache.Get()` for many unrelated things
   they can depend on a type that are named according to the behaviour
   it provides.

2. It gives us a single place to vary the behaviour and migrate to
   a new form of RPC (gRPC). The current implementation has two options
   (cache, or direct RPC), and in the future we will have more.
   It is also a great opporunity to start adding `context.Context` args
   to these operations, which in the future will allow us to cancel
   the operations.

3. As a concequence of the first, in the Server agent where we make
   these calls we can replace the current in-memory RPC calls with
   a thin adapter for the real method. This removes the `net/rpc`
   machinery from the call in places where it is not needed.

This new package is quite small right now, but I think we can expect it
to grow to a more reasonable size as other RPC calls are replaced.

This change also happens to replace two very similar implementations with
a single implementation.
2020-10-04 18:55:02 -04:00
Daniel Nephin 40aac46cf4 lib/retry: Refactor to reduce the interface surface
Reduce Jitter to one function

Rename NewRetryWaiter

Fix a bug in calculateWait where maxWait was applied before jitter, which would make it
possible to wait longer than maxWait.
2020-10-04 18:12:42 -04:00
Daniel Nephin 0c7f9c72d7 lib/retry: extract a new package from lib 2020-10-04 17:43:01 -04:00
Kit Patella 9851a16d05 remove consul.api.http from filtered metric prefixes 2020-10-02 14:16:02 -07:00
Kit Patella 5524a43f10
Merge pull request #8271 from coignetp/http-metrics-label
Use method and path as labels for http metrics
2020-10-02 13:41:48 -07:00
hashicorp-ci 8e7643ef8e auto-updated agent/uiserver/bindata_assetfs.go from commit 8b409529a 2020-10-02 19:28:38 +00:00
Daniel Nephin 9c5181c897 stream: full test coverage for EventPublisher.Subscribe 2020-10-02 13:46:24 -04:00
Daniel Nephin 0769f54fe1 stream: refactor to support change in framing events
Removing EndOfEmptySnapshot, add NewSnapshotToFollow
2020-10-02 13:41:31 -04:00
Daniel Nephin 5ef630f664
Merge pull request #8769 from hashicorp/streaming/prep-for-subscribe-service
state: use protobuf Topic and and export payload type
2020-10-02 13:30:06 -04:00
Paul Banks 35d249e38f
Merge pull request #8694 from hashicorp/ui-config-metrics
Add config changes for UI metrics
2020-10-01 17:38:03 +01:00
Paul Banks 4c4c7266e6
Update all the references in CI and makefile to the bindata file location 2020-10-01 16:19:10 +01:00
R.B. Boyer e84d52ba3a
ensure these tests work fine with namespaces in enterprise (#8794) 2020-10-01 09:54:46 -05:00
R.B. Boyer 4b525e5d16
agent: enable enable_central_service_config by default (#8746) 2020-10-01 09:19:14 -05:00
Paul Banks 72fcc8ec02
Fix ui dir where there is no index tests and lint issue. 2020-10-01 12:26:19 +01:00
Paul Banks a6c748ec1b
Refactor uiserver to separate package, cleaner Reloading 2020-10-01 11:32:25 +01:00
R.B. Boyer ccd0200bd9
server: ensure that we also shutdown network segment serf instances on server shutdown (#8786)
This really only matters for unit tests, since typically if an agent shuts down its server, it follows that up by exiting the process, which would also clean up all of the networking anyway.
2020-09-30 16:23:43 -05:00
Kyle Havlovitz 2956313f2d connect: Enable renewing the intermediate cert in the primary DC 2020-09-30 12:31:21 -07:00
Paul Banks 89e539a00d
Fix reload test; address other PR feedback 2020-09-30 18:00:07 +01:00
Paul Banks 85d425801c
Fix JSON encoding of metrics options which broke the index but didn't break tests.
Also add tests that do catch that error.
2020-09-30 17:59:19 +01:00
Paul Banks 497a5d4e36
Add config changes for UI metrics 2020-09-30 17:59:16 +01:00
hashicorp-ci 575eae410b auto-updated agent/bindata_assetfs.go from commit 1a6f3d524 2020-09-30 15:28:06 +00:00
hashicorp-ci 0fd251d008 auto-updated agent/bindata_assetfs.go from commit 8e174cae6 2020-09-30 15:23:27 +00:00
hashicorp-ci 644dd068f5 auto-updated agent/bindata_assetfs.go from commit 823d6dadb 2020-09-30 15:17:41 +00:00
Aliaksandr Mianzhynski 191d9f5484 Fix GRPCUseTLS flag HTTP API mapping 2020-09-29 18:29:56 +03:00
freddygv ec6e8021c0 Resolve conflicts 2020-09-29 08:59:18 -06:00
Daniel Nephin d192b0a080 stream: move goroutine out of New
This change will make it easier to manage goroutine lifecycle from the caller.

Also expose EventPublisher from state.Store
2020-09-28 18:40:10 -04:00
Daniel Nephin e345c8d8a6 state: use pbsubscribe.Topic for topic values 2020-09-28 18:40:10 -04:00
Daniel Nephin 6e592ec485 state: rename and export EventPayload
The subscribe endpoint needs to be able to inspect the payload to filter
events, and convert them into the protobuf types.

Use the protobuf CatalogOp type for the operation field, for now. In the
future if we end up with multiple interfaces we should be able to remove
the protobuf dependency by changing this to an int32 and adding a test
for the mapping between the values.

Make the value of the payload a concrete type instead of interface{}. We
can create other payloads for other event types.
2020-09-28 18:34:30 -04:00
Daniel Nephin e1855afbe1
Merge pull request #8726 from amenzhinsky/grpc-hc-error
Return grpc serving status in health check errors
2020-09-25 13:24:32 -04:00
Hans Hasselberg e5419bc04c
fix ent error (#8750) 2020-09-25 10:31:42 -05:00
R.B. Boyer 657995cc93
agent: when enable_central_service_config is enabled ensure agent reload doesn't revert check state to critical (#8747)
Likely introduced when #7345 landed.
2020-09-24 16:24:04 -05:00
R.B. Boyer 45609fccdf
server: make sure that the various replication loggers use consistent logging (#8745) 2020-09-24 15:49:38 -05:00
R.B. Boyer 4af8c78f1f
agent: make the json/hcl decoding of ConnectProxyConfig fully work with CamelCase and snake_case (#8741)
Fixes #7418
2020-09-24 13:58:52 -05:00
Daniel Nephin c89eb4ff20 agent/grpc: always close the conn when dialing fails. 2020-09-24 12:53:14 -04:00
Daniel Nephin 87f0eb6790 agent/grpc: seed the rand for shuffling servers 2020-09-24 12:53:14 -04:00
Daniel Nephin ff3610850e agent/grpc: use router.Manager to handle the rebalance
The router.Manager is already rebalancing servers for other connection pools, so it can call into our resolver to do the same.
This change allows us to remove the serf dependency from resolverBuilder, and remove Datacenter from the config.

Also revert the change to refreshServerRebalanceTimer
2020-09-24 12:53:14 -04:00
Daniel Nephin 5e7b9bffdb grpc: restore integration tests for grpc client conn pool
Add a fake rpc Listener
2020-09-24 12:53:14 -04:00
Daniel Nephin e160475fba router: remove grpcServerTracker from managers
It only needs to be refereced from the Router, because there is only 1 instance, and the
Router can call AddServer/RemoveServer like it does on the Manager.
2020-09-24 12:53:14 -04:00
Daniel Nephin 4b041a018d grpc: redeuce dependencies, unexport, and add godoc
Rename GRPCClient to ClientConnPool. This type appears to be more of a
conn pool than a client. The clients receive the connections from this
pool.

Reduce some dependencies by adjusting the interface baoundaries.

Remove the need to create a second slice of Servers, just to pick one and throw the rest away.

Unexport serverResolver, it is not used outside the package.

Use a RWMutex for ServerResolverBuilder, some locking is read-only.

Add more godoc.
2020-09-24 12:53:10 -04:00
Daniel Nephin 4b24470887 grpc: move client conn pool to grpc package 2020-09-24 12:48:12 -04:00
Daniel Nephin fad15171ec grpc: client conn pool and resolver
Extracted from 936522a13c07e8b732b6fde61bba23d05f7b9a70

Co-authored-by: Paul Banks <banks@banksco.de>
2020-09-24 12:46:22 -04:00
Daniel Nephin e0119a6e92
Merge pull request #8680 from hashicorp/dnephin/replace-consul-opts-with-base-deps
agent: Repalce ConsulOptions with a new struct from agent.BaseDeps
2020-09-24 12:45:54 -04:00
Paul Banks 0594667c3a
Fix bad int -> string conversions caught by go vet changes in 1.15 (#8739) 2020-09-24 11:14:07 +01:00
Alexander Mykolaichuk e4f9c19772
added permission denied error message (#8044) 2020-09-22 20:36:07 +02:00
Hans Hasselberg d48d2bf550
use service datacenter for dns name (#8704)
* Use args.Datacenter instead of configured datacenter
2020-09-22 20:34:09 +02:00
Aliaksandr Mianzhynski c79180980c Return grpc serving status in health check errors 2020-09-22 21:16:58 +03:00
Daniel Nephin b3ec7df80f api: rename HTTPServer to HTTPHandlers
Resolves a TODO about naming. This type is a set of handlers for an http.Server, it is not
itself a Server. It provides http.Handler functions.
2020-09-18 17:38:23 -04:00
Hans Hasselberg c6fa758d6f
fix TestLeader_SecondaryCA_IntermediateRenew (#8702)
* fix lessThanHalfTime
* get lock for CAProvider()
* make a var to relate both vars
* rename to getCAProviderWithLock
* move CertificateTimeDriftBuffer to agent/connect/ca
2020-09-18 10:13:29 +02:00
Daniel Nephin 2d3540d6b5
Merge pull request #8620 from hashicorp/dnephin/better-impl-of-TestAgent.HTTPAddr
http: fix tests incorrectly using HTTPAddr to get the address of the https server
2020-09-17 11:48:57 -04:00
Mike Morris fe984b3ee3
test: update tags for database service registrations and queries (#8693) 2020-09-16 14:05:01 -04:00
Kyle Havlovitz 8f83f7ac13
Merge pull request #8560 from hashicorp/vault-ca-renew-token
Automatically renew the token used by the Vault CA provider
2020-09-16 07:30:30 -07:00
Daniel Nephin 9f83eb3dc9
Merge pull request #8685 from pierresouchay/do_not_flood_logs_with_Non-server_in_server-only_area
[BUGFIX] Avoid GetDatacenter* methods to flood Consul servers logs
2020-09-15 17:57:05 -04:00
Kyle Havlovitz c8fd61abc7 Merge branch 'master' into vault-ca-renew-token 2020-09-15 14:39:04 -07:00
Daniel Nephin c621b4a420 agent/consul: pass dependencies directly from agent
In an upcoming change we will need to pass a grpc.ClientConnPool from
BaseDeps into Server. While looking at that change I noticed all of the
existing consulOption fields are already on BaseDeps.

Instead of duplicating the fields, we can create a struct used by
agent/consul, and use that struct in BaseDeps. This allows us to pass
along dependencies without translating them into different
representations.

I also looked at moving all of BaseDeps in agent/consul, however that
created some circular imports. Resolving those cycles wouldn't be too
bad (it was only an error in agent/consul being imported from
cache-types), however this change seems a little better by starting to
introduce some structure to BaseDeps.

This change is also a small step in reducing the scope of Agent.

Also remove some constants that were only used by tests, and move the
relevant comment to where the live configuration is set.

Removed some validation from NewServer and NewClient, as these are not
really runtime errors. They would be code errors, which will cause a
panic anyway, so no reason to handle them specially here.
2020-09-15 17:29:32 -04:00
Daniel Nephin 0536b2047e agent/consul: make router required 2020-09-15 17:26:26 -04:00
Daniel Nephin 49086dd5ae
Merge pull request #8679 from hashicorp/streaming/fix-TestHandler_EmitsStats
streaming: Fix TestHandler_EmitsStats
2020-09-15 17:04:55 -04:00
Kyle Havlovitz 316600a685 Update vault CA for latest api client 2020-09-15 13:33:55 -07:00
Paul Banks 0062106c46
Update UI Config passing to not use an inline script (#8645)
* Update UI Config passing to not use an inline script

* Update agent/http.go

* Fix incorrect placeholder name
2020-09-15 20:57:37 +01:00
Kyle Havlovitz 63d3a5fc1f Clean up CA shutdown logic and error 2020-09-15 12:28:58 -07:00
Kyle Havlovitz 6f1dd25139
Merge pull request #8646 from hashicorp/common-intermediate-ttl
Move IntermediateCertTTL to common CA config
2020-09-15 12:03:29 -07:00
Pierre Souchay 617e0d2364 [BUGFIX] Avoid GetDatacenter* methods to flood Consul servers logs
When calling `GetDatacentersByDistance()` or `GetDatacentersMap()`, an
incorrect condition was used to diplay log message, thus flooding
Consul's logs.

Example of message:

```
  [WARN] agent.router: Non-server in server-only area: non_server=myClientNode area=lan
```

This message is only valid for WAN areas, filter to avoid creating
hundreds of logs/s on our clusters, each time someone is calling this
method.

Our logs were flooded by such messages when migrating our Consul servers
from 1.7.7 to 1.8.4.

This will issue fix #8663
2020-09-15 11:54:59 +02:00
Daniel Nephin 1e40f00567 agent/grpc: make TestHandler_EmitsStats predictable
Occasionally this test would flake. The flakes were fixed by:

1. Stopping the service and retrying to check on metrics. This way we
   also include the active_streams going to 0 in the metric calls.

2. Using a reference to the global Metrics. This way when other tests
   have background goroutines that are still shutting down, they won't
   emit metrics to the metric instance with the fake Sink. The stats
   test can patch the local reference to the global, so the existing
   statHandlers will continue to emit to the global, but the stats
   test will send all metrics to the replacement.
2020-09-14 19:05:22 -04:00
Daniel Nephin c827e7f1e9 grpc: add Datacenter field to testing service response 2020-09-14 19:02:09 -04:00
freddygv e0db834148 Fix text type assertion 2020-09-14 16:28:40 -06:00
freddygv 43efb4809c Merge master 2020-09-14 16:17:43 -06:00
freddygv 66e5c5989a Fix type assertion 2020-09-14 16:12:21 -06:00
Daniel Nephin 75515f3431
Merge pull request #8587 from hashicorp/streaming/add-grpc-server
streaming: add gRPC server for handling connections
2020-09-14 15:24:54 -04:00
freddygv 33af8dab9a Resolve conflicts against master 2020-09-11 18:41:58 -06:00
freddygv 60cb306524 Add session flag to cookie config 2020-09-11 18:34:03 -06:00
freddygv ae8c609f10 PR comments 2020-09-11 10:49:26 -06:00
Kyle Havlovitz 1595add842 Clean up Vault renew tests and shutdown 2020-09-11 08:41:05 -07:00
freddygv 5871b667a5 Revert EnvoyConfig nesting 2020-09-11 09:21:43 -06:00
Kyle Havlovitz dc393336d1 Use mapstructure for decoding vault data 2020-09-10 06:31:04 -07:00
Kyle Havlovitz 7588e22739 Add a stop function to make sure the renewer is shut down on leader change 2020-09-10 06:12:48 -07:00
Kyle Havlovitz c52dfeb633 Move IntermediateCertTTL to common CA config 2020-09-10 00:23:22 -07:00
Kyle Havlovitz 1c57b72a9f Add a test for token renewal 2020-09-09 16:36:37 -07:00
Daniel Nephin 1a930fcf75 grpc: Add a simple test service for testing the gRPC server 2020-09-08 12:10:43 -04:00
Daniel Nephin 863a9df951 server: add gRPC server for streaming events
Includes a stats handler and stream interceptor for grpc metrics.

Co-authored-by: Paul Banks <banks@banksco.de>
2020-09-08 12:10:41 -04:00
Daniel Nephin 4eb514a59f http: fix tests incorrectly using HTTPAddr to get the address of the
https server.

In #8234 I changed a few tests to use TestAgent.HTTPAddr() to find the
addr used in the test. Due to the way HTTPAddr() was implemented these
tests were passing, but I think the pass was incidental. HTTPAddr() was
not matching any servers, and was instead returning the last server,
which happened to be the one these tests wanted.

This commit fixes the implementation of HTTPAddr to panic if no match
was found. The tests which require an HTTPS server are changed to use
a new firstAddr() to look up the correct address.
2020-09-04 15:29:17 -04:00
freddygv 1ee039ed95 Set tgw filter router config name to cluster name 2020-09-04 12:45:05 -06:00
Hans Hasselberg 51f079dcdd
secondaryIntermediateCertRenewalWatch abort on success (#8588)
secondaryIntermediateCertRenewalWatch was using `retryLoopBackoff` to
renew the intermediate certificate. Once it entered the inner loop and
started `retryLoopBackoff` it would never leave that.
`retryLoopBackoffAbortOnSuccess` will return when renewing is
successful, like it was intended originally.
2020-09-04 11:47:16 +02:00
freddygv 3e4bc36941 Add server receiver to routes and log tgw err 2020-09-03 16:19:58 -06:00
Daniel Nephin 670b7cbd99
Merge pull request #8357 from hashicorp/streaming/add-service-health-events
streaming: add ServiceHealth events
2020-09-03 17:53:56 -04:00
Daniel Nephin ec5d20b0de
Merge pull request #8554 from hashicorp/dnephin/agent-setup-persisted-tokens
agent: move token persistence from agent into token.Store
2020-09-03 17:29:21 -04:00
Daniel Nephin c17a5b0628 state: handle terminating gateways in service health events 2020-09-03 16:58:05 -04:00
Daniel Nephin b241debee7 state: improve comments in catalog_events.go
Co-authored-by: Paul Banks <banks@banksco.de>
2020-09-03 16:58:05 -04:00
Daniel Nephin 870823e8ed state: use changeType in serviceChanges
To be a little more explicit, instead of nil implying an indirect change
2020-09-03 16:58:05 -04:00
Daniel Nephin 68682e7e83 don't over allocate slice 2020-09-03 16:58:04 -04:00
Daniel Nephin 5f52220f53 state: fix a bug in building service health events
The nodeCheck slice was being used as the first arg in append, which in some cases will modify the array backing the slice. This would lead to service checks for other services in the wrong event.

Also refactor some things to reduce the arguments to functions.
2020-09-03 16:58:04 -04:00
Daniel Nephin c61313b78a state: Remove unused args and return values
Also rename some functions to identify them as constructors for events
2020-09-03 16:58:04 -04:00
Daniel Nephin 668b98bcce state: use an enum for tracking node changes 2020-09-03 16:58:04 -04:00
Daniel Nephin 7c3c627028 state: serviceHealthSnapshot
refactored to remove unused return value and remove duplication
2020-09-03 16:58:04 -04:00
Daniel Nephin fdfe176deb state: Add Change processor and snapshotter for service health
Co-authored-by: Paul Banks <banks@banksco.de>
2020-09-03 16:58:04 -04:00
Daniel Nephin 6a1a43721d state: fix bug in changeTrackerDB.publish
Creating a new readTxn does not work because it will not see the newly created objects that are about to be committed. Instead use the active write Txn.
2020-09-03 16:58:01 -04:00
Daniel Nephin 81cc3daf69 stream: have SnapshotFunc accept a non-pointer SubscribeRequest
The value is not expected to be modified. Passing a value makes that explicit.
2020-09-03 16:54:02 -04:00
freddygv 56fdae9ace Update resolver defaulting 2020-09-03 13:08:44 -06:00
freddygv b149185794 Update golden files after default route fix for tgw 2020-09-03 12:35:11 -06:00
Daniel Nephin 330b73725f agent: add apiServers type for managing HTTP servers
Remove Server field from HTTPServer. The field is no longer used.
2020-09-03 13:40:12 -04:00
freddygv 23147c1d5b Fix http assertion in route creation 2020-09-03 10:21:20 -06:00
freddygv 0c50b8e769 Add explicit protocol overrides in tgw xds test cases 2020-09-03 08:57:48 -06:00
freddygv 02d6acd8fc Ensure resolver node with LB isn't considered default 2020-09-03 08:55:57 -06:00
freddygv c4bce2154b Move valid policies to pkg level 2020-09-02 15:49:03 -06:00
freddygv daad3b9210 Remove LB infix and move injection to xds 2020-09-02 15:13:50 -06:00
R.B. Boyer b0bde51e70
connect: all config entries pick up a meta field (#8596)
Fixes #8595
2020-09-02 14:10:25 -05:00
Chris Piraino df1381f77f
Merge pull request #8603 from hashicorp/feature/usage-metrics
Track node and service counts in the state store and emit them periodically as metrics
2020-09-02 13:23:39 -05:00
R.B. Boyer 4197bed23b
connect: fix bug in preventing some namespaced config entry modifications (#8601)
Whenever an upsert/deletion of a config entry happens, within the open
state store transaction we speculatively test compile all discovery
chains that may be affected by the pending modification to verify that
the write would not create an erroneous scenario (such as splitting
traffic to a subset that did not exist).

If a single discovery chain evaluation references two config entries
with the same kind and name in different namespaces then sometimes the
upsert/deletion would be falsely rejected. It does not appear as though
this bug would've let invalid writes through to the state store so the
correction does not require a cleanup phase.
2020-09-02 10:47:19 -05:00
Chris Piraino b245d60200 Set metrics reporting interval to 9 seconds
This is below the 10 second interval that lib/telemetry.go implements as
its aggregation interval, ensuring that we always report these metrics.
2020-09-02 10:24:23 -05:00
Chris Piraino e9b397005c Update godoc string for memdb wrapper functions/structs 2020-09-02 10:24:22 -05:00
Chris Piraino 80f923a47a Refactor state store usage to track unique service names
This commit refactors the state store usage code to track unique service
name changes on transaction commit. This means we only need to lookup
usage entries when reading the information, as opposed to iterating over
a large number of service indices.

- Take into account a service instance's name being changed
- Do not iterate through entire list of service instances, we only care
about whether there is 0, 1, or more than 1.
2020-09-02 10:24:21 -05:00
Chris Piraino 79e6534345 Use ReadTxn interface in state store helper functions 2020-09-02 10:24:20 -05:00
Chris Piraino d90d95421d Add WriteTxn interface and convert more functions to ReadTxn
We add a WriteTxn interface for use in updating the usage memdb table,
with the forward-looking prospect of incrementally converting other
functions to accept interfaces.

As well, we use the ReadTxn in new usage code, and as a side effect
convert a couple of existing functions to use that interface as well.
2020-09-02 10:24:19 -05:00
Chris Piraino 45a4057f60 Report node/service usage metrics from every server
Using the newly provided state store methods, we periodically emit usage
metrics from the servers.

We decided to emit these metrics from all servers, not just the leader,
because that means we do not have to care about leader election flapping
causing metrics turbulence, and it seems reasonable for each server to
emit its own view of the state, even if they should always converge
rapidly.
2020-09-02 10:24:17 -05:00
Chris Piraino 3af96930eb Add new usage memdb table that tracks usage counts of various elements
We update the usage table on Commit() by using the TrackedChanges() API
of memdb.

Track memdb changes on restore so that usage data can be compiled
2020-09-02 10:24:16 -05:00
freddygv d7bda050e0 Restructure structs and other PR comments 2020-09-02 09:10:50 -06:00
Daniel Nephin 9535a1b57d token: OSS support for enterprise tokens 2020-08-31 15:10:15 -04:00
Daniel Nephin 8e477feb22 config: use token.Config for ACLToken config
Using the target Config struct reduces the amount of copying and
translating of configuration structs.
2020-08-31 15:10:15 -04:00
Daniel Nephin b64ce07ef7 agent/token: Move token persistence out of agent
And into token.Store. This change isolates any awareness of token
persistence in a single place.

It is a small step in allowing Agent.New to accept its dependencies.
2020-08-31 15:00:34 -04:00
Daniel Nephin fbae521775 fix TestStore_RegularTokens
This test was only passing because t.Parallel was causing every subtest to run with the last value in the iteration,
which sets a value for all tokens. The test started to fail once t.Parallel was removed, but the same failure could
have been produced by adding 'tt := tt' to the t.Run() func.

These tests run in under 10ms, so there is no reason to use t.Parallel.
2020-08-31 14:59:14 -04:00
Matt Keeler 335c604ced
Merge of auto-config and auto-encrypt code (#8523)
auto-encrypt is now handled as a special case of auto-config.

This also is moving all the cert-monitor code into the auto-config package.
2020-08-31 13:12:17 -04:00
freddygv 58a018c20b Add documentation for resolver LB cfg 2020-08-28 14:46:13 -06:00
freddygv 194d34b09d Pass LB config to Envoy via xDS 2020-08-28 14:27:40 -06:00
freddygv 8f470b30d7 Log error as error 2020-08-28 13:11:55 -06:00
freddygv afb14b6705 Compile down LB policy to disco chain nodes 2020-08-28 13:11:04 -06:00
Daniel Nephin 845661c8af
Merge pull request #8548 from edevil/fix_flake
Fix flaky TestACLResolver_Client/Concurrent-Token-Resolve
2020-08-28 15:10:55 -04:00
Daniel Nephin 39b06a0c0b
Merge pull request #8552 from pierresouchay/reload_cache_throttling_config
Ensure that Cache options are reloaded when `consul reload` is performed
2020-08-28 15:04:42 -04:00
Pierre Souchay ee50b55163 Added Unit test for cache reloading 2020-08-28 13:03:58 +02:00
freddygv 391d569a45 Add LB policy to service-resolver 2020-08-27 19:44:02 -06:00
Jack 145bcdc2bb
Add http2 and grpc support to ingress gateways (#8458) 2020-08-27 15:34:08 -06:00
R.B. Boyer f2b8bf109c
xds: use envoy's rbac filter to handle intentions entirely within envoy (#8569) 2020-08-27 12:20:58 -05:00
R.B. Boyer a7a8b8d6d9
agent: ensure that we normalize bootstrapped config entries (#8547) 2020-08-27 11:37:25 -05:00
Pierre Souchay f92ae5e6ca Also test reload of EntryFetchMaxBurst 2020-08-27 18:14:05 +02:00
Matt Keeler 106e1d50bd
Move RPC router from Client/Server and into BaseDeps (#8559)
This will allow it to be a shared component which is needed for AutoConfig
2020-08-27 11:23:52 -04:00
Pierre Souchay 4983e093a0 Tests that changes in rate limit are taken into account by agent 2020-08-27 16:41:20 +02:00
Pierre Souchay 084d0e8015 Added `options.Equals()` and minor fixes indentation fixes 2020-08-27 13:44:45 +02:00
R.B. Boyer 6fad634512
agent: expose the list of supported envoy versions on /v1/agent/self (#8545) 2020-08-26 10:04:11 -05:00
Kyle Havlovitz 6f7152841f Automatically renew the token used by the Vault CA provider 2020-08-25 10:34:49 -07:00
Pierre Souchay dd385f05e6 Ensure that Cache options are reloaded when `consul reload` is performed.
This will apply cache throttling parameters are properly applied:
 * cache.EntryFetchMaxBurst
 * cache.EntryFetchRate

When values are updated, a log is displayed in info.
2020-08-24 23:33:10 +02:00
André Cruz 673bd69f36
Decrease test flakiness
Fix flaky TestACLResolver_Client/Concurrent-Token-Resolve and TestCacheNotifyPolling
2020-08-24 20:30:02 +01:00
André Cruz a64686fab6
testing: Fix govet errors 2020-08-21 18:01:55 +01:00
Daniel Nephin 1c3a638d69
Merge pull request #8537 from hashicorp/dnephin/fix-panic-on-connect-nil
Fix panic when decoding 'Connect: null'
2020-08-20 18:00:25 -04:00
Daniel Nephin 4155cae1cb Fix panic when decoding 'Connect: null'
Surprisingly the json Unmarshal updates the aux pointer to a nil.
2020-08-20 17:52:14 -04:00
Daniel Nephin a97adadd2b config: use logging.Config in RuntimeConfig
To add structure to RuntimeConfig, and remove the need to translate into a third type.
2020-08-19 13:21:00 -04:00
Daniel Nephin e4578aace8 logging: move init of grpclog
This line initializes global state. Moving it out of the constructor and closer to where logging
is setup helps keep related things together.
2020-08-19 13:21:00 -04:00
Daniel Nephin 7349018ff3 logging: Setup accept io.Writer instead of []io.Writer
Also accept a non-pointer Config, since the config is not modified
2020-08-19 13:20:41 -04:00
Daniel Nephin a520cf3ea7 testing: disable global metrics sink in tests
This might be better handled by allowing configuration for the InMemSink interval and retail, and disabling
the global. For now this is a smaller change to remove the goroutine leak caused by tests because go-metrics
does not provide any way of shutting down the global goroutine.
2020-08-18 19:04:57 -04:00
Daniel Nephin 84642486b9 agent: extract dependency creation from New
With this change, Agent.New() accepts many of the dependencies instead
of creating them in New. Accepting fully constructed dependencies from
a constructor makes the type easier to test, and easier to change.

There are still a number of dependencies created in Start() which can
be addressed in a follow up.
2020-08-18 19:04:55 -04:00
Daniel Nephin b204e342c5
Merge pull request #8514 from hashicorp/dnephin/testing-improvements-1
testing: small improvements to TestSessionCreate and testutil.retry
2020-08-18 18:26:05 -04:00
Daniel Nephin ab0d206eac
Merge pull request #8528 from hashicorp/dnephin/move-node-name-validation
config: Move some config validation from Agent.Start to config.Builder.Validate
2020-08-18 18:25:41 -04:00
Hans Hasselberg 02de4c8b76
add primary keys to list keyring (#8522)
During gossip encryption key rotation it would be nice to be able to see if all nodes are using the same key. This PR adds another field to the json response from `GET v1/operator/keyring` which lists the primary keys in use per dc. That way an operator can tell when a key was successfully setup as primary key.

Based on https://github.com/hashicorp/serf/pull/611 to add primary key to list keyring output:

```json
[
  {
    "WAN": true,
    "Datacenter": "dc2",
    "Segment": "",
    "Keys": {
      "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 6,
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 6
    },
    "PrimaryKeys": {
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 6
    },
    "NumNodes": 6
  },
  {
    "WAN": false,
    "Datacenter": "dc2",
    "Segment": "",
    "Keys": {
      "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 8,
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "PrimaryKeys": {
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "NumNodes": 8
  },
  {
    "WAN": false,
    "Datacenter": "dc1",
    "Segment": "",
    "Keys": {
      "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 3,
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "PrimaryKeys": {
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "NumNodes": 8
  }
]
```

I intentionally did not change the CLI output because I didn't find a good way of displaying this information. There are a couple of options that we could implement later:
* add a flag to show the primary keys
* add a flag to show json output

Fixes #3393.
2020-08-18 09:50:24 +02:00
Daniel Nephin 7078ca07fa config: Move remote-script-checks warning to config
Previously it was done in Agent.Start, but it can be done much earlier
2020-08-17 17:39:49 -04:00
Daniel Nephin a0dc4222b6 config: move NodeName validation to config validation
Previsouly it was done in Agent.Start, which is much later then it needs to be.

The new 'dns' package was required, because otherwise there would be an
import cycle. In the future we should move more of the dns server into
the dns package.
2020-08-17 17:25:02 -04:00
Daniel Nephin 647236bb17
Merge pull request #8515 from hashicorp/dnephin/unexport-testing-shims
config: unexport fields and resolve TODOs in config.Builder
2020-08-17 16:03:07 -04:00
Daniel Nephin 3e0d63a6b7 testing: use t.Cleanup in testutil.TempFile
So that it has the same behaviour as TempDir.

Also remove the now unnecessary 'defer os.Remove'
2020-08-14 20:06:01 -04:00
Daniel Nephin 8d35e37b3c testing: Remove all the defer os.Removeall
Now that testutil uses t.Cleanup to remove the directory the caller no longer has to manage
the removal
2020-08-14 19:58:53 -04:00
Daniel Nephin fe8790da9e config: unexport and resolve TODOs in config.Builder
- unexport testing shims, and document their purpose
- resolve a TODO by moving validation to NewBuilder and storing the one
  field that is used instead of all of Options
- create a slice with the correct size to avoid extra allocations
2020-08-14 19:23:32 -04:00
Daniel Nephin 85655098be testing: Improve session_endpoint_test
While working on another change I caused a bunch of these tests to fail.
Unfortunately the failure messages were not super helpful at first.

One problem was that the request and response were created outside of
the retry. This meant that when the second attempt happened, the request
body was empty (because the buffer had been consumed), and so the
request was not actually being retried. This was fixed by moving more of
the request creation into the retry block.

Another problem was that these functions can return errors in two ways, and
are not consistent about which way they use. Some errors are returned to
the response writer, but the tests were not checking those errors, which
was causing a panic later on. This was fixed by adding a check for the
response code.

Also adds some missing t.Helper(), and has assertIndex use checkIndex so
that it is clear these are the same implementation.
2020-08-14 18:55:52 -04:00
Daniel Nephin 2725513eea testutil: Add t.Cleanup to TempDir
TempDir registers a Cleanup so that the directory is always removed. To disable to cleanup, set the TEST_NOCLEANUP
env var.
2020-08-14 13:19:10 -04:00
Daniel Nephin 6b0ac22c1b testing: fix flaky test TestDNS_NonExistentDC_RPC
I saw this test flake locally, and it was easy to reproduce with -count=10.

The failure was: 'TestAgent.dns: rpc error: error=No known Consul servers'.

Waiting for the agent seems to fix it.
2020-08-13 18:03:04 -04:00
Daniel Nephin 512a523a3e testing: wait until monitor has started before shutdown
This commit fixes a test that I saw flake locally while running tests. The test output from the monitor
started immediately after the line the test was looking for.

To fix the problem a channel is closed when the goroutine starts. Shutdown is not called until this channel
is closed, which seems to greatly reduce the chance of a flake.
2020-08-13 17:53:29 -04:00
Daniel Nephin b6d91d59f3 testing: Remove TestAgent.Key and change TestAgent.DataDir
TestAgent.Key was only used by 3 tests. Extracting it from the common helper that is used in hundreds of
tests helps keep the shared part small and more focused.

This required a second change (which I was planning on making anyway), which was to change the behaviour of
DataDir. Now in all cases the TestAgent will use the DataDir, and clean it up once the test is complete.
2020-08-13 17:53:24 -04:00
Daniel Nephin 3abe4e43d3 testing: use t.Cleanup in TestAgent for returnPorts 2020-08-13 17:09:37 -04:00
Daniel Nephin edf6e74a14 testing: remove unused fields from TestACLAgent 2020-08-13 17:03:55 -04:00
Daniel Nephin 24d8db906f agent: rename vars in newConsulConfig
'base' is a bit misleading, since it is the return value. Renamed to cfg.
2020-08-13 11:58:21 -04:00
Daniel Nephin 88ddd8e0e7 agent: Move setupKeyring functions to keyring.go
There are a couple reasons for this change:

1. agent.go is way too big. Smaller files makes code eaasier to read
   because tools that show usage also include filename which can give
   a lot more context to someone trying to understand which functions
   call other functions.
2. these two functions call into a large number of functions already in
   keyring.go.
2020-08-13 11:58:21 -04:00
Daniel Nephin 119df79d7c agent: unmethod consulConfig
To allow us to move newConsulConfig out of Agent.
2020-08-13 11:58:21 -04:00
Daniel Nephin 055e7e8ca3 Fix conflict in merged PRs
One PR renamed the var from config->cfg, and another used the old name config, which caused the
build to fail on master.
2020-08-13 11:28:26 -04:00
Daniel Nephin 629c34085d state: remove unused Store method receiver
And use ReadTxn interface where appropriate.
2020-08-13 11:25:22 -04:00
Daniel Nephin 20f94bf9ab
Merge pull request #8463 from hashicorp/dnephin/unmethod-make-node-id
agent: convert NodeID methods to functions
2020-08-13 11:18:11 -04:00
Daniel Nephin fc797a279a
Merge pull request #8461 from hashicorp/dnephin/remove-notify-shutdown
agent/consul: Remove NotifyShutdown
2020-08-13 11:16:48 -04:00
Daniel Nephin d8ffcd5686
Merge pull request #8365 from hashicorp/dnephin/fix-service-by-node-meta-flake
state: speed up tests that use watchLimit
2020-08-13 11:16:12 -04:00
Daniel Nephin 45dae87ee7 auto-config: reduce awareness of config
This is a small step to allowing Agent to accept its dependencies
instead of creating them in New.

There were two fields in autoconfig.Config that were used exclusively
to load config. These were replaced with a single function, allowing us
to move LoadConfig back to the config package.

Also removed the WithX functions for building a Config. Since these were
simple assignment, it appeared we were not getting much value from them.
2020-08-12 13:23:23 -04:00
Daniel Nephin 62e402c4f9 Remove check that hostID is a uuid.
Immediately afterward we hash the ID, so it does not need to be a uuid anymore.
2020-08-12 13:05:10 -04:00
Daniel Nephin 55b074f0eb agent: convert NodeID methods to functions
Making these functions allows us to cleanup how an agent is initialized. They only make use of a config and a logger, so they do not need to be agent methods.

Also cleanup the testing to use t.Run and require.
2020-08-12 13:05:10 -04:00
Daniel Nephin 6be568119b Extract nodeID functions to a different file
In preparation for turning them into functions.
To reduce the scope of Agent, and refactor how Agent is created and started.
2020-08-12 13:05:10 -04:00
R.B. Boyer 63422ca9c5
connect: use stronger validation that ingress gateways have compatible protocols defined for their upstreams (#8470)
Fixes #8466

Since Consul 1.8.0 there was a bug in how ingress gateway protocol
compatibility was enforced. At the point in time that an ingress-gateway
config entry was modified the discovery chain for each upstream was
checked to ensure the ingress gateway protocol matched. Unfortunately
future modifications of other config entries were not validated against
existing ingress-gateway definitions, such as:

1. create tcp ingress-gateway pointing to 'api' (ok)
2. create service-defaults for 'api' setting protocol=http (worked, but not ok)
3. create service-splitter or service-router for 'api' (worked, but caused an agent panic)

If you were to do these in a different order, it would fail without a
crash:

1. create service-defaults for 'api' setting protocol=http (ok)
2. create service-splitter or service-router for 'api' (ok)
3. create tcp ingress-gateway pointing to 'api' (fail with message about
   protocol mismatch)

This PR introduces the missing validation. The two new behaviors are:

1. create tcp ingress-gateway pointing to 'api' (ok)
2. (NEW) create service-defaults for 'api' setting protocol=http ("ok" for back compat)
3. (NEW) create service-splitter or service-router for 'api' (fail with
   message about protocol mismatch)

In consideration for any existing users that may be inadvertently be
falling into item (2) above, that is now officiall a valid configuration
to be in. For anyone falling into item (3) above while you cannot use
the API to manufacture that scenario anymore, anyone that has old (now
bad) data will still be able to have the agent use them just enough to
generate a new agent/proxycfg error message rather than a panic.
Unfortunately we just don't have enough information to properly fix the
config entries.
2020-08-12 11:19:20 -05:00
Freddy 77de9bbe22
Notify alias checks when aliased service is [de]registered (#8456) 2020-08-12 09:47:41 -06:00
Daniel Nephin b936f84c07
Merge pull request #8469 from hashicorp/dnephin/config-source
config: make Source an interface to avoid the marshal/unmarshal cycle in auto-config
2020-08-12 11:17:15 -04:00
Hans Hasselberg 7a6d916ddc
Merge pull request #8471 from hashicorp/local_only
thread local-only through the layers
2020-08-12 08:54:51 +02:00
Freddy 50fee12d62
Internal endpoint to query intentions associated with a gateway (#8400) 2020-08-11 17:20:41 -06:00
Kyle Havlovitz 8118e3db40 Fix a state store comment about version 2020-08-11 13:46:12 -07:00
Kyle Havlovitz 2601585017 fsm: Fix snapshot bug with restoring node/service/check indexes 2020-08-11 11:49:52 -07:00
Hans Hasselberg e0297b6e99 Refactor keyring ops:
* changes some functions to return data instead of modifying pointer
  arguments
* renames globalRPC() to keyringRPCs() to make its purpose more clear
* restructures KeyringOperation() to make it more understandable
2020-08-11 13:42:03 +02:00
Hans Hasselberg 08b1fea379 thread local-only through the layers
$ consul keyring -list -local-only
==> Gathering installed encryption keys...

dc1 (LAN):
  aUlAW4ST3+vwseI61so24CoORkyjZofcmHk+j7QPSYQ= [1/1]
2020-08-11 13:41:53 +02:00
Daniel Nephin 3a4242c121 auto-config: Avoid the marshal/unmarshal cycle in auto-config
Use a LiteralConfig and return a config.Config from translate.
2020-08-10 20:07:52 -04:00
freddygv 6dcfa11c21 Update error handling 2020-08-10 17:48:22 -06:00
Daniel Nephin cbdceeb044 config: Make Source an interface
This will allow us to accept config from auto-config without needing to
go through a serialziation cycle.
2020-08-10 12:46:28 -04:00
Mike Morris d9ef146d82
changelog: Update for 1.8.2, 1.7.6, 1.7.5 and 1.6.7 (#8462)
* update bindata_assetfs.go

* Release v1.8.2

* Putting source back into Dev Mode

* changelog: add entries for 1.7.6, 1.7.5 and 1.6.7

Co-authored-by: hashicorp-ci <hashicorp-ci@users.noreply.github.com>
2020-08-07 18:58:09 -04:00
Daniel Nephin bef9348ca8 testing: remove unnecessary defers in tests
The data directory is now removed by the test helper that created it.
2020-08-07 17:28:16 -04:00
Daniel Nephin f3b63514d5 testing: Remove NotifyShutdown
NotifyShutdown was only used for testing. Now that t.Cleanup exists, we
can use that instead of attaching cleanup to the Server shutdown.

The Autopilot test which used NotifyShutdown doesn't need this
notification because Shutdown is synchronous. Waiting for the function
to return is equivalent.
2020-08-07 17:14:44 -04:00
Matt Keeler 05ebf9b8c5
Require token replication to be enabled in secondary dcs when ACLs are enabled with AutoConfig (#8451)
AutoConfig will generate local tokens for clients and the ability to use local tokens is gated off of token replication being enabled and being configured with a replication token. Therefore we already have a hard requirement on having token replication enabled, this commit just makes sure to surface that to the operator instead of having to discern what the issue is from RPC errors.
2020-08-07 10:20:27 -04:00
Hans Hasselberg fdceb24323
auto_config implies connect (#8433) 2020-08-07 12:02:02 +02:00
freddygv 4ac644f401 Fix test build 2020-08-06 11:31:56 -06:00
Hans Hasselberg 417d4adfb7
Mark its own cluster as healthy when rebalancing. (#8406)
This code started as an optimization to avoid doing an RPC Ping to
itself. But in a single server cluster the rebalancing was led to
believe that there were no healthy servers because foundHealthyServer
was not set. Now this is being set properly.

Fixes #8401 and #8403.
2020-08-06 10:42:09 +02:00
freddygv 83f4e32376 PR comments and addtl tests 2020-08-05 16:07:11 -06:00
Daniel Nephin 62641b820a
Merge pull request #8404 from hashicorp/dnephin/remove-log-output-field
Use Logger consistently, instead of LogOutput
2020-08-05 14:31:43 -04:00
Daniel Nephin 061ae94c63 Rename NewClient/NewServer
Now that duplicate constructors have been removed we can use the shorter names for the single constructor.
2020-08-05 14:00:55 -04:00
Daniel Nephin 67c505cd90 Remove LogOutput from Agent
Now that it is no longer used, we can remove this unnecessary field. This is a pre-step in cleanup up RuntimeConfig->Consul.Config, which is a pre-step to adding a gRPCHandler component to Server for streaming.

Removing this field also allows us to remove one of the return values from logging.Setup.
2020-08-05 14:00:44 -04:00
Daniel Nephin e6c94c1411 Remove LogOutput from Server 2020-08-05 14:00:44 -04:00
Daniel Nephin fdf966896f Remove LogOutput from Client 2020-08-05 14:00:42 -04:00
Daniel Nephin 73493ca01b Pass a logger to ConnPool and yamux, instead of an io.Writer
Allowing us to remove the LogOutput field from config.
2020-08-05 13:25:08 -04:00
Daniel Nephin 7c4566f116 api: Use a Logger instead of an io.Writer in api.Watch
So that we can pass around only a Logger, not a LogOutput
2020-08-05 13:25:08 -04:00
Daniel Nephin c7c941811d config: Remove unused field 2020-08-05 13:25:08 -04:00
Daniel Nephin 21fa99a83b Return nil value on error.
The main bug was fixed in cb050b280ceb4186de765118611a7a92d8158c3f, but the return value of 'result' is still misleading.
Change the return value to nil to make the code more clear.
2020-08-05 13:10:17 -04:00
R.B. Boyer 8ea4c482b3
xds: add support for envoy 1.15.0 and drop support for 1.11.x (#8424)
Related changes:

- hard-fail the xDS connection attempt if the envoy version is known to be too old to be supported
- remove the RouterMatchSafeRegex proxy feature since all supported envoy versions have it
- stop using --max-obj-name-len (due to: envoyproxy/envoy#11740)
2020-07-31 15:52:49 -05:00
freddygv c87af29506 collect GatewayServices from iter in a function 2020-07-31 13:30:40 -06:00
Freddy 7c2c8815d7
Avoid panics during shutdown routine (#8412) 2020-07-30 11:11:10 -06:00
freddygv 94d1f0a310 end to end changes to pass gatewayservices to /ui/services/ 2020-07-30 10:21:11 -06:00
Matt Keeler a0ebab5bac
Ensure certificates retrieved through the cache get persisted with auto-config (#8409) 2020-07-30 11:37:18 -04:00
freddygv 521ec63339 Support ConnectedWithProxy 2020-07-30 09:32:12 -06:00
Matt Keeler 76add4f24c
Allow setting verify_incoming* when using auto_encrypt or auto_config (#8394)
Ensure that enabling AutoConfig sets the tls configurator properly

This also refactors the TLS configurator a bit so the naming doesn’t imply only AutoEncrypt as the source of the automatically setup TLS cert info.
2020-07-30 10:15:12 -04:00
Hans Hasselberg ffdc3057fe
agent/cache test for cache throttling. (#8396) 2020-07-30 14:41:13 +02:00
Matt Keeler dad0f189a2
Agent Auto Config: Implement Certificate Generation (#8360)
Most of the groundwork was laid in previous PRs between adding the cert-monitor package to extracting the logic of signing certificates out of the connect_ca_endpoint.go code and into a method on the server.

This also refactors the auto-config package a bit to split things out into multiple files.
2020-07-28 15:31:48 -04:00
Matt Keeler 2ec4e46eb2
Default Cache rate limiting options in New
Also get rid of the TestCache helper which was where these defaults were happening previously.
2020-07-28 12:34:35 -04:00
Matt Keeler 8df112526d
Fix some broken code in master
There were several PRs that while all passed CI independently, when they all got merged into the same branch caused compilation errors in test code.

The main changes that caused issues where changing agent/cache.Cache.New to require a concrete options struct instead of a pointer. This broke the cert monitor tests and the catalog_list_services_test.go. Another change was made to unembed the http.Server from the agent.HTTPServer struct. That coupled with another change to add a test to ensure cache rate limiting coming from HTTP requests was working as expected caused compilation failures.
2020-07-28 09:50:10 -04:00
Pierre Souchay 947d8eb039
Added ratelimit to handle throtling cache (#8226)
This implements a solution for #7863

It does:

    Add a new config cache.entry_fetch_rate to limit the number of calls/s for a given cache entry, default value = rate.Inf
    Add cache.entry_fetch_max_burst size of rate limit (default value = 2)

The new configuration now supports the following syntax for instance to allow 1 query every 3s:

    command line HCL: -hcl 'cache = { entry_fetch_rate = 0.333}'
    in JSON

{
  "cache": {
    "entry_fetch_rate": 0.333
  }
}
2020-07-27 23:11:11 +02:00
Matt Keeler 3a1058a06b
Move connect root retrieval and cert signing logic out of the RPC endpoints (#8364)
The code now lives on the Server type itself. This was done so that all of this could be shared with auto config certificate signing.
2020-07-24 10:00:51 -04:00
Matt Keeler e7d8a02ae8
Move generation of the CA Configuration from the agent code into a method on the RuntimeConfig (#8363)
This allows this to be reused elsewhere.
2020-07-23 16:05:28 -04:00
Daniel Nephin 597dcf2bfb
Merge pull request #8323 from hashicorp/dnephin/add-event-publisher-2
stream: close subscriptions on shutdown
2020-07-23 13:12:50 -04:00
Matt Keeler c3e7d689b7
Refactor the agentpb package (#8362)
First move the whole thing to the top-level proto package name.

Secondly change some things around internally to have sub-packages.
2020-07-23 11:24:20 -04:00
Paul Coignet 9faa0a2a02
Fix tests 2020-07-23 11:04:10 +02:00
Daniel Nephin decba06b7d stream: close all subs when EventProcessor is shutdown. 2020-07-22 19:04:10 -04:00
Daniel Nephin e802689bbe stream: fix overallocation in filter
And add tests
2020-07-22 19:04:10 -04:00
Daniel Nephin f64725f7aa state: speed up TestStateStore_ServicesByNodeMeta
Make watchLimit a var so that we can patch it in tests and reduce the time spent creating state.
2020-07-22 16:57:06 -04:00
Daniel Nephin a44ddea9ba state: Use subtests in TestStateStore_ServicesByNodeMeta
These subtests make it much easier to identify the slow part of the test, but they also help enumerate all the different cases which are being tested.
2020-07-22 16:39:09 -04:00
Daniel Nephin 6d3b042872
Merge pull request #7948 from hashicorp/dnephin/buffer-test-logs
testutil: NewLogBuffer - buffer logs until a test fails
2020-07-21 15:21:52 -04:00
Matt Keeler 8ea8a939f0
Merge pull request #8311 from hashicorp/bugfix/auto-encrypt-token-update 2020-07-21 13:15:27 -04:00
Daniel Nephin dbd7e9da39
Merge pull request #8344 from hashicorp/dnephin/fix-flakes-in-stream
stream: handle empty event in TestEventSnapshot
2020-07-21 13:14:35 -04:00
Daniel Nephin 80ff174880 testutil: NewLogBuffer - buffer logs until a test fails
Replaces #7559

Running tests in parallel, with background goroutines, results in test output not being associated with the correct test. `go test` does not make any guarantees about output from goroutines being attributed to the correct test case.

Attaching log output from background goroutines also cause data races.  If the goroutine outlives the test, it will race with the test being marked done. Previously this was noticed as a panic when logging, but with the race detector enabled it is shown as a data race.

The previous solution did not address the problem of correct test attribution because test output could still be hidden when it was associated with a test that did not fail. You would have to look at all of the log output to find the relevant lines. It also made debugging test failures more difficult because each log line was very long.

This commit attempts a new approach. Instead of printing all the logs, only print when a test fails. This should work well when there are a small number of failures, but may not work well when there are many test failures at the same time. In those cases the failures are unlikely a result of a specific test, and the log output is likely less useful.

All of the logs are printed from the test goroutine, so they should be associated with the correct test.

Also removes some test helpers that were not used, or only had a single caller. Packages which expose many functions with similar names can be difficult to use correctly.

Related:
https://github.com/golang/go/issues/38458 (may be fixed in go1.15)
https://github.com/golang/go/issues/38382#issuecomment-612940030
2020-07-21 12:50:40 -04:00
Matt Keeler 6d94900cd7
Disable background cache refresh for Connect Leaf Certs
The rationale behind removing them is that all of our own code (xDS, builtin connect proxy) use the cache notification mechanism. This ensures that the blocking fetch behind the scenes is always executing. Therefore the only way you might go to get a certificate and have to wait is when 1) the request has never been made for that cert before or 2) you are using the v1/agent/connect/ca/leaf API for retrieving the cert yourself.

In the first case, the refresh change doesn’t alter the behavior. In the second case, it can be mitigated by using blocking queries with that API which just like normal cache notification mechanism will cause the blocking fetch to be initiated and to get leaf certs as soon as needed.

If you are not using blocking queries, or Envoy/xDS, or the builtin connect proxy but are retrieving the certs yourself then the HTTP endpoint might take a little longer to respond.

This also renames the RefreshTimeout field on the register options to QueryTimeout to more accurately reflect that it is used for any type that supports blocking queries.
2020-07-21 12:19:25 -04:00
Matt Keeler 133a6d99f2
Fix issue with changing the agent token causing failure to renew the auto-encrypt certificate
The fallback method would still work but it would get into a state where it would let the certificate expire for 10s before getting a new one. And the new one used the less secure RPC endpoint.

This is also a pretty large refactoring of the auto encrypt code. I was going to write some tests around the certificate monitoring but it was going to be impossible to get a TestAgent configured in such a way that I could write a test that ran in less than an hour or two to exercise the functionality.

Moving the certificate monitoring into its own package will allow for dependency injection and in particular mocking the cache types to control how it hands back certificates and how long those certificates should live. This will allow for exercising the main loop more than would be possible with it coupled so tightly with the Agent.
2020-07-21 12:19:25 -04:00
Daniel Nephin f913bed325 checks: wait for goroutine to complete
CheckAlias already had a waitGroup, but the Add() call was happening too late, which was causing a race in tests. The add must happen before the goroutine is started.

CheckHTTP did not have a waitGroup, so I added it to match CheckAlias.

It looks like a lot of the implementation could be shared, and may not need all of channel, waitgroup and bool, but I will leave that refactor for another time.
2020-07-20 18:55:39 -04:00
Daniel Nephin 7599e280de stream: handle empty event in TestEventSnapshot
When the race detector is enabled we see this test fail occasionally. The reordering of execution seems to make it possible for the snapshot splice to happen before any events are published to the topicBuffers.

We can handle this case in the test the same way it is handled by a subscription, by proceeding to the next event.
2020-07-20 18:20:02 -04:00
Daniel Nephin 9f5f8abcbf
Merge pull request #8245 from hashicorp/dnephin/use-not-modified-in-cache
agent/cache: Use AllowNotModified in CatalogListServices
2020-07-20 15:30:52 -04:00
Daniel Nephin 235845af21
Merge pull request #8290 from hashicorp/dnephin/watch-decode
watch: fix script watches with single arg
2020-07-20 14:41:17 -04:00
Paul Coignet 6f80d33a22
Add default prefix_filter 2020-07-20 10:39:58 +02:00
Daniel Nephin 75f10fb191 state: update calls that are no longer state methods
In a previous commit these methods were changed to functions, so remove the Store paramter.
2020-07-16 15:46:10 -04:00
Daniel Nephin 3fcb2e16f4 state: un-method funcs that don't use their receiver
This change was mostly automated with the following

First generate a list of functions with:

  git grep -o 'Store) \([^(]\+\)(tx \*txn' ./agent/consul/state | awk '{print $2}' | grep -o '^[^(]\+'

Then the list was curated a bit with trial/error to remove and add funcs
as necessary.

Finally the replacement was done with:

  dir=agent/consul/state
  file=${1-funcnames}

  while read fn; do
    echo "$fn"
    sed -i -e "s/(s \*Store) $fn(/$fn(/" $dir/*.go
    sed -i -e "s/s\.$fn(/$fn(/" $dir/*.go
    sed -i -e "s/s\.store\.$fn(/$fn(/" $dir/*.go
  done < $file
2020-07-16 15:30:39 -04:00
Daniel Nephin edb0a4f1f8 store: convert methods that don't use their receiver to functions
Making these functions allows them to be used without introducing
an artificial dependency on the struct. Many of these will be called
from streaming Event processors, which do not have a store.

This change is being made ahead of the streaming work to get to reduce
the size of the streaming diff.
2020-07-16 15:30:10 -04:00
André e93084be6c
minor: fix docstring of DNSOnlyPassing (#8318)
In runtime.go it had "duration" but it is actually a boolean.
2020-07-16 09:47:33 -04:00
Daniel Nephin 797abe1f00 agent/cache: Use AllowNotModifiedResponse in CatalogListServices
Co-authored-by: Pierre Souchay <pierresouchay@users.noreply.github.com>
2020-07-14 18:58:20 -04:00
Daniel Nephin 8aa3335b22 agent/cache: Update some docstrings 2020-07-14 18:58:20 -04:00
Daniel Nephin a2f8605c66 stream: Add forceClose and refactor subscription filtering
Move the subscription context to Next. context.Context should generally
never be stored in a struct because it makes that struct only valid
while the context is valid. This is rarely obvious from the caller.
Adds a forceClosed channel in place of the old context, and uses the new
context as a way for the caller to stop the Subscription blocking.

Remove some recursion out of bufferImte.Next. The caller is already looping so we can continue
in that loop instead of recursing. This ensures currentItem is updated immediately (which probably
does not matter in practice), and also removes the chance that we overflow the stack.

NextNoBlock and FollowAfter do not need to handle bufferItem.Err, the caller already
handles it.

Moves filter to a method to simplify Next, and more explicitly separate filtering from looping.

Also improve some godoc

Only unwrap itemBuffer.Err when necessary
2020-07-14 15:57:47 -04:00
Daniel Nephin 2595436f62 stream: Improve docstrings
Also rename ResumeStrema to EndOfEmptySnapshot to be more consistent with other framing events

Co-authored-by: Paul Banks <banks@banksco.de>
2020-07-14 15:57:47 -04:00
Daniel Nephin 16a2b3fafc stream: change Topic to an interface
Consumers of the package can decide on which type to use for the Topic. In the future we may
use a gRPC type for the topic.
2020-07-14 15:57:47 -04:00
Daniel Nephin aa571bd0ce state: Move change processing out of EventPublisher
EventPublisher was receiving TopicHandlers, which had a couple of
problems:

- ChangeProcessors were being grouped by Topic, but they completely
  ignored the topic and were performed on every change
- ChangeProcessors required EventPublisher to be aware of database
  changes

By moving ChangeProcesors out of EventPublisher, and having Publish
accept events instead of changes, EventPublisher no longer needs to
be aware of these things.

Handlers is now only SnapshotHandlers, which are still mapped by Topic.

Also allows us to remove the small 'db' package that had only two types.
They can now be unexported types in state.
2020-07-14 15:57:47 -04:00
Daniel Nephin 23a940daad server: Abandom state store to shutdown EventPublisher
So that we don't leak goroutines
2020-07-14 15:57:47 -04:00
Daniel Nephin e1305fe80c stream: unexport identifiers
Now that EventPublisher is part of stream a lot of the internals can be hidden
2020-07-14 15:57:47 -04:00
Daniel Nephin 9e37894778 stream: Move EventPublisher to stream package
The EventPublisher is the central hub of the PubSub system. It is toughly coupled with much of
stream. Some stream internals were exported exclusively for EventPublisher.

The two Subscribe cases (with or without index) were also awkwardly split between two packages. By
moving EventPublisher into stream they are now both in the same package (although still in different files).
2020-07-14 15:57:47 -04:00
Daniel Nephin 6e87e83d77 state: Make handleACLUpdate async once again
So that we keep as much as possible out of the FSM commit hot path.
2020-07-14 15:57:47 -04:00
Daniel Nephin a92dab724d state: Use interface for Txn
Also store the index in Changes instead of the Txn.

This change is in preparation for movinng EventPublisher to the stream package, and
making handleACLUpdates async once again.
2020-07-14 15:57:46 -04:00
Daniel Nephin c778d61b6a stream.Subscription unexport fields and additiona docstrings 2020-07-14 15:57:46 -04:00
Daniel Nephin 37a38629d7 Add a context for stopping EventPublisher goroutine 2020-07-14 15:57:46 -04:00
Daniel Nephin 02bc5a26e4 EventPublisher: Make Unsubscribe a function on Subscription
It is critical that Unsubscribe be called with the same pointer to a
SubscriptionRequest that was used to create the Subscription. The
docstring made that clear, but it sill allowed a caler to get it wrong by
creating a new SubscriptionRequest.

By hiding this detail from the caller, and only exposing an Unsubscribe
method, it should be impossible to fail to Unsubscribe.

Also update some godoc strings.
2020-07-14 15:57:46 -04:00
Daniel Nephin 86976cf23c EventPublisher: handleACL changes synchronously
Use a separate lock for subscriptions.ByToken to allow it to happen synchronously
in the commit flow.
This removes the need to create a new txn for the goroutine, and removes
the need for EventPublisher to contain a reference to DB.
2020-07-14 15:57:46 -04:00
Daniel Nephin 606121fae6 stream.EventSnapshot: reduce the fields on the struct
Many of the fields are only needed in one place, and by using a closure
they can be removed from the struct. This reduces the scope of the variables
making it esier to see how they are used.
2020-07-14 15:57:45 -04:00
Daniel Nephin 7196917051 stream.EventBuffer: Seed the fuzz test with time.Now()
Otherwise the test will run with exactly the same values each time.

By printing the seed we can attempt to reproduce the test by adding an env var to override the seed
2020-07-14 15:57:45 -04:00
Daniel Nephin 525b275a52 state: memdb_wrapper.go -> memdb.go
Renaming in a separate commit so that git can merge changes to the file.
2020-07-14 15:57:45 -04:00
Daniel Nephin b5d2bea770 state: publish changes from Commit
Make topicRegistry use functions instead of unbound methods
Use a regular memDB in EventPublisher to remove a reference cycle
Removes the need for EventPublisher to use a store
2020-07-14 15:57:45 -04:00
Daniel Nephin f626c3d6c5 EventPublisher: docstrings and getTopicBuffer
also rename commitCh -> publishCh
2020-07-14 15:57:45 -04:00
Daniel Nephin 2020e9c7c7 ProcessChanges: use stream.Event
Also remove secretHash, which was used to hash tokens. We don't expose
these tokens anywhere, so we can use the string itself instead of a
Hash.

Fix acl_events_test.go for storing a structs type.
2020-07-14 15:57:45 -04:00
Daniel Nephin 2e45bbbb3e stream: Use local types for Event Topic SubscriptionRequest 2020-07-14 15:57:45 -04:00
Daniel Nephin 3d62013062 Rename stream_publisher.go -> event_publisher.go 2020-07-14 15:57:44 -04:00
Daniel Nephin 526fb53f85 Add streaming package with Subscription and Snapshot components.
The remaining files from 7965767de0bd62ab07669b85d6879bd5f815d157

Co-authored-by: Paul Banks <banks@banksco.de>
2020-07-14 15:57:44 -04:00
Matt Keeler adf53f5113
Add ability for notifications when one of the agent tokens is updated (#8301)
Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
2020-07-14 09:53:55 -04:00
Hans Hasselberg 0c39b2c820
add support for envoy 1.14.4, 1.13.4, 1.12.6 (#8216) 2020-07-13 15:44:44 -05:00
Chris Piraino b80cbb499f
Set enterprise metadata after resolving the token (#8302)
The token can encode enterprise metadata information, and we must make
sure we set that on the reply so that we can correct filter ACLs.
2020-07-13 13:39:57 -05:00
Daniel Nephin 979a3d2844 watch: Allow args from different types
Fixes a bug where specifying a slice of args with a single item was being converted to
a string when config was loaded, causing an error.
2020-07-10 17:18:32 -04:00
Freddy a30c10a18c
Add api mod support for /catalog/gateway-services (#8278) 2020-07-10 13:01:45 -06:00
Daniel Nephin 90acc6a920 watch: extract makeWatchPlan to facilitate testing
There is a bug in here now that slices in opaque config are unsliced. But to test that bug fix we
need a function that can be easily tested.
2020-07-10 13:33:45 -04:00
Paul Coignet 8f373e7d6e
Keep both metrics 2020-07-10 11:27:22 +02:00
R.B. Boyer 6e3d07c995
xds: version sniff envoy and switch regular expressions from 'regex' to 'safe_regex' on newer envoy versions (#8222)
- cut down on extra node metadata transmission
- split the golden file generation to compare all envoy version
2020-07-09 17:04:51 -05:00
Daniel Nephin 43cd6e91d1
Merge pull request #8231 from hashicorp/dnephin/unembed-HTTPServer-Server
agent/http: un-embed the http.Server
2020-07-09 17:42:33 -04:00
Daniel Nephin b80a0e0def agent/http: Update TestSetupHTTPServer_HTTP2
To remove the need to store the http.Server. This will allow us to
remove the http.Server field from the HTTPServer struct.
2020-07-09 16:42:19 -04:00
Daniel Nephin 13e0d258b5
Merge pull request #8237 from hashicorp/dnephin/remove-acls-enabled-from-delegate
Remove ACLsEnabled from delegate interface
2020-07-09 16:35:43 -04:00
Paul Coignet a1e698dcff
Use method and path as labels 2020-07-09 10:31:27 +02:00
Matt Keeler 39d9babab3
Pass the Config and TLS Configurator into the AutoConfig constructor
This is instead of having the AutoConfigBackend interface provide functions for retrieving them.

NOTE: the config is not reloadable. For now this is fine as we don’t look at any reloadable fields. If that changes then we should provide a way to make it reloadable.
2020-07-08 12:36:11 -04:00
Matt Keeler a77ed471c8
Rename (*Server).forward to (*Server).ForwardRPC
Also get rid of the preexisting shim in server.go that existed before to have this name just call the unexported one.
2020-07-08 11:05:44 -04:00
Matt Keeler 386ec3a2a2
Refactor AutoConfig RPC to not have a direct dependency on the Server type
Instead it has an interface which can be mocked for better unit testing that is deterministic and not prone to flakiness.
2020-07-08 11:05:44 -04:00
Chris Piraino 9d92c42c90
Append port number to ingress host domain (#8190)
A port can be sent in the Host header as defined in the HTTP RFC, so we
take any hosts that we want to match traffic to and also add another
host with the listener port added.

Also fix an issue with envoy integration tests not running the
case-ingress-gateway-tls test.
2020-07-07 10:43:04 -05:00
Daniel Nephin 8b6036c077 Remove ACLsEnabled from delegate interface
In all cases (oss/ent, client/server) this method was returning a value from config. Since the
value is consistent, it doesn't need to be part of the delegate interface.
2020-07-03 17:00:20 -04:00
Daniel Nephin 24c6bcfbe8
Merge pull request #8215 from hashicorp/dnephin/support-not-modified-response-server
agent/consul: Add support for NotModified to two endpoints
2020-07-03 16:15:31 -04:00
Pierre Souchay f77182aa51
Upgrade go-connlimit to v0.3.0 / return http 429 on too many connections (#8221)
Fixes #7527

I want to highlight this and explain what I think the implications are and make sure we are aware:

* `HTTPConnStateFunc` closes the connection when it is beyond the limit. `Close` does not block.
* `HTTPConnStateFuncWithDefault429Handler(10 * time.Millisecond)` blocks until the following is done (worst case):
  1) `conn.SetDeadline(10*time.Millisecond)` so that
  2) `conn.Write(429error)` is guaranteed to timeout after 10ms, so that the http 429 can be written and 
  3) `conn.Close` can happen

The implication of this change is that accepting any new connection is worst case delayed by 10ms. But only after a client reached the limit already.
2020-07-03 09:25:07 +02:00
Daniel Nephin 8a8644edd5 agent/http: un-embed the HTTPServer
The embedded HTTPServer struct is not used by the large HTTPServer
struct. It is used by tests and the agent. This change is a small first
step in the process of removing that field.

The eventual goal is to reduce the scope of HTTPServer making it easier
to test, and split into separate packages.
2020-07-02 17:21:12 -04:00
Daniel Nephin dfa8856e5f agent/consul: Add support for NotModified to two endpoints
A query made with AllowNotModifiedResponse and a MinIndex, where the
result has the same Index as MinIndex, will return an empty response
with QueryMeta.NotModified set to true.

Co-authored-by: Pierre Souchay <pierresouchay@users.noreply.github.com>
2020-07-02 17:05:46 -04:00
Matt Keeler 87764e5bfb
Merge pull request #8211 from hashicorp/bugfix/auto-encrypt-various 2020-07-02 09:49:49 -04:00
Yury Evtikhov dbf3c05fa5 DNS: add IsErrQueryNotFound function for easier error evaluation 2020-07-01 03:41:44 +01:00
Yury Evtikhov c594dfa1e6 DNS: fix agent returning SERVFAIL where NXDOMAIN should be returned 2020-07-01 01:51:21 +01:00
Yury Evtikhov 4e5ce6d4a7 DNS: add test to verify NXDOMAIN is returned when a non-existent domain is queried over RPC 2020-07-01 01:51:16 +01:00
Matt Keeler fa6a2b38d9
Add an AutoEncrypt “integration” test
Also fix a bug where Consul could segfault if TLS was enabled but no client certificate was provided. How no one has reported this as a problem I am not sure.
2020-06-30 15:23:29 -04:00
Matt Keeler a97f9ff386
Overwrite agent leaf cert trust domain on the servers 2020-06-30 09:59:08 -04:00
Matt Keeler 5600069d69
Store the Connect CA rate limiter on the server
This fixes a bug where auto_encrypt was operating without utilizing a common rate limiter.
2020-06-30 09:59:07 -04:00
Matt Keeler e9e88e4527
Initialize the agent leaf cert cache result with a state to prevent unnecessary second certificate signing 2020-06-30 09:59:07 -04:00
Matt Keeler fa42d9b34f
Fix auto_encrypt IP/DNS SANs
The initial auto encrypt CSR wasn’t containing the user supplied IP and DNS SANs. This fixes that. Also We were configuring a default :: IP SAN. This should be ::1 instead and was fixed.
2020-06-30 09:59:07 -04:00
Matt Keeler 3d9ca75e34
Merge pull request #8193 from hashicorp/feature/auto-config/suppress-config-warnings 2020-06-27 10:06:52 -04:00
R.B. Boyer 72a515f5ec
connect: various changes to make namespaces for intentions work more like for other subsystems (#8194)
Highlights:

- add new endpoint to query for intentions by exact match

- using this endpoint from the CLI instead of the dump+filter approach

- enforcing that OSS can only read/write intentions with a SourceNS or
  DestinationNS field of "default".

- preexisting OSS intentions with now-invalid namespace fields will
  delete those intentions on initial election or for wildcard namespaces
  an attempt will be made to downgrade them to "default" unless one
  exists.

- also allow the '-namespace' CLI arg on all of the intention subcommands

- update lots of docs
2020-06-26 16:59:15 -05:00
Matt Keeler a1275c6e7d
Use the DNS and IP SANs from the auto config stanza when set 2020-06-26 16:01:30 -04:00
Matt Keeler 1fcb9ef03f
Overhaul the auto-config translation
This fixes some issues around spurious warnings about using enterprise configuration in OSS.
2020-06-26 15:25:21 -04:00
Freddy 6ef032fefe
Split up unused key validation for oss/ent (#8189)
Split up unused key validation in config entry decode for oss/ent.

This is needed so that we can return an informative error in OSS if namespaces are provided.
2020-06-25 13:58:29 -06:00
Daniel Nephin 7d5f1ba6bd
Merge pull request #8176 from hashicorp/dnephin/add-linter-unparam-1
lint: add unparam linter and fix some of the issues
2020-06-25 15:34:48 -04:00
Matt Keeler a448e3e636
Merge pull request #8184 from hashicorp/bugfix/goroutine-leaks 2020-06-25 09:22:19 -04:00
Chris Piraino 8db22c166a
Merge pull request #7932 from hashicorp/ingress/internal-ui-endpoint-multiple-ports
Update gateway-services-nodes API endpoint to allow multiple addresses
2020-06-24 17:11:01 -05:00
Chris Piraino 58eb3710fc remove obsolete comments about test parallelization 2020-06-24 16:36:13 -05:00
Chris Piraino abe2ca94c5 Update gateway-services-nodes API endpoint to allow multiple addresses
Previously, we were only returning a single ListenerPort for a single
service. However, we actually allow a single service to be serviced over
multiple ports, as well as allow users to define what hostnames they
expect their services to be contacted over. When no hosts are defined,
we return the default ingress domain for any configured DNS domain.

To show this in the UI, we modify the gateway-services-nodes API to
return a GatewayConfig.Addresses field, which is a list of addresses
over which the specific service can be contacted.
2020-06-24 16:35:23 -05:00
Matt Keeler 2ab8af4093
Add a test for go routine leaks
This is in its own separate package so that it will be a separate test binary that runs thus isolating the go runtime from other tests and allowing accurate go routine leak checking.

This test would ideally use goleak.VerifyTestMain but that will fail 100% of the time due to some architectural things (blocking queries and net/rpc uncancellability).

This test is not comprehensive. We should enable/exercise more features and more cluster configurations. However its a start.
2020-06-24 17:09:50 -04:00
Matt Keeler d471977f62
Fix go routine leak in auto encrypt ca roots tracking 2020-06-24 17:09:50 -04:00
Matt Keeler 90e741c6d2
Allow cancelling blocking queries in response to shutting down. 2020-06-24 17:09:50 -04:00
Daniel Nephin 0285956fac Update TestAgent_GetCoordinate
The old test case was a very specific regresion test for a case that is no longer possible.
Replaced with a new test that checks the default coordinate is returned.
2020-06-24 13:00:15 -04:00
Daniel Nephin 56ecfce5fa Remove unused return values 2020-06-24 13:00:15 -04:00
Daniel Nephin 07c1081d39 Fix a bunch of unparam lint issues 2020-06-24 13:00:14 -04:00
Matt Keeler 341aedbce9
Ensure that retryLoopBackoff can be cancelled
We needed to pass a cancellable context into the limiter.Wait instead of context.Background. So I made the func take a context instead of a chan as most places were just passing through a Done chan from a context anyways.

Fix go routine leak in the gateway locator
2020-06-24 12:41:08 -04:00
Matt Keeler 934f92f83b
Don’t leak metrics go routines in tests (#8182) 2020-06-24 10:15:25 -04:00
gitforbit aebb8e09a7
agent-http: cleanup: return nil instead of err (#8043)
Since err is already checked, it should return `nil`
2020-06-24 14:29:21 +02:00
R.B. Boyer ba83b52b32
connect: upgrade github.com/envoyproxy/go-control-plane to v0.9.5 (#8165) 2020-06-23 15:19:56 -05:00
freddygv 224f486aef Update namespaces subject-verb agreement 2020-06-23 10:57:30 -06:00
freddygv c631e7496a Remove break 2020-06-22 19:59:04 -06:00
freddygv e81b6d25ad Let users know namespaces are ent only in config entry decode 2020-06-22 19:59:04 -06:00
Pierre Souchay 667b36cdba
Returns DNS Error NSDOMAIN when DC does not exists (#8103)
This will allow to increase cache value when DC is not valid (aka
return SOA to avoid too many consecutive requests) and will
distinguish DC being temporarily not available from DC not existing.

Implements https://github.com/hashicorp/consul/issues/8102
2020-06-22 09:01:48 -04:00
Matt Keeler 7b49fc1529
Require enabling TLS to enable Auto Config (#8159)
On the servers they must have a certificate.

On the clients they just have to set verify_outgoing to true to attempt TLS connections for RPCs.

Eventually we may relax these restrictions but right now all of the settings we push down (acl tokens, acl related settings, certificates, gossip key) are sensitive and shouldn’t be transmitted over an unencrypted connection. Our guides and docs should recoommend verify_server_hostname on the clients as well.

Another reason to do this is weird things happen when making an insecure RPC when TLS is not enabled. Basically it tries TLS anyways. We should probably fix that to make it clearer what is going on.
2020-06-19 16:38:14 -04:00
Freddy 7e7c783c8f
Always return a gateway cluster (#8158) 2020-06-19 13:31:39 -06:00
Matt Keeler 9dc9f7df15
Allow cancelling startup when performing auto-config (#8157)
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2020-06-19 15:16:00 -04:00
Daniel Nephin 896b057a9f
Merge pull request #8147 from hashicorp/dnephin/remove-private-ip-2
Remove some dead code from agent/consul/util.go
2020-06-18 15:51:09 -04:00
Matt Keeler 7086a50353 Change auto config authorizer to allow for future extension
The envisioned changes would allow extra settings to enable dynamically defined auth methods to be used instead of  or in addition to the statically defined one in the configuration.
2020-06-18 15:22:24 -04:00
Daniel Nephin b5ef9b7ea9 Remove bytesToUint64 from agent/consul 2020-06-18 12:45:43 -04:00
Daniel Nephin 81bc082b63 Remove unused private IP code from agent/consul 2020-06-18 12:40:38 -04:00
Matt Keeler 2c7844d220
Implement Client Agent Auto Config
There are a couple of things in here.

First, just like auto encrypt, any Cluster.AutoConfig RPC will implicitly use the less secure RPC mechanism.

This drastically modifies how the Consul Agent starts up and moves most of the responsibilities (other than signal handling) from the cli command and into the Agent.
2020-06-17 16:49:46 -04:00
Matt Keeler f5d57ccd48
Allow the Agent its its child Client/Server to share a connection pool
This is needed so that we can make an AutoConfig RPC at the Agent level prior to creating the Client/Server.
2020-06-17 16:19:33 -04:00
Matt Keeler 8c601ad8db
Merge pull request #8035 from hashicorp/feature/auto-config/server-rpc 2020-06-17 16:07:25 -04:00
Chris Piraino 79d003d395
Remove ACLEnforceVersion8 from tests (#8138)
The field had been deprecated for a while and was recently removed,
however a PR which added these tests prior to removal was merged.
2020-06-17 14:58:01 -05:00
Daniel Nephin 3d03d72727
Merge pull request #7762 from hashicorp/dnephin/warn-on-unknown-service-file
config: warn if a config file is being skipped because of its file extension
2020-06-17 15:14:40 -04:00
Daniel Nephin cb736b6947 config: warn when a config file is skipped
All commands which read config (agent, services, and validate) will now
print warnings when one of the config files is skipped because it did
not match an expected format.

Also ensures that config validate prints all warnings.
2020-06-17 13:08:54 -04:00
Daniel Nephin 1ef8279ac9
Merge pull request #8034 from hashicorp/dnephin/add-linter-staticcheck-4
ci: enable SA4006 staticcheck check and add ineffassign
2020-06-17 12:16:02 -04:00
Matt Keeler eda8cb39fd
Implement the insecure version of the Cluster.AutoConfig RPC endpoint
Right now this is only hooked into the insecure RPC server and requires JWT authorization. If no JWT authorizer is setup in the configuration then we inject a disabled “authorizer” to always report that JWT authorization is disabled.
2020-06-17 11:25:29 -04:00
Pierre Souchay f7a1189dba
gossip: Ensure that metadata of Consul Service is updated (#7903)
While upgrading servers to a new version, I saw that metadata of
existing servers are not upgraded, so the version and raft meta
is not up to date in catalog.

The only way to do it was to:
 * update Consul server
 * make it leave the cluster, then metadata is accurate

That's because the optimization to avoid updating catalog does
not take into account metadata, so no update on catalog is performed.
2020-06-17 12:16:13 +02:00
Daniel Nephin 8753d1f1ba ci: Add ineffsign linter
And fix an additional ineffective assignment that was not caught by staticcheck
2020-06-16 17:32:50 -04:00
Daniel Nephin 97342de262
Merge pull request #8070 from hashicorp/dnephin/add-gofmt-simplify
ci: Enable gofmt simplify
2020-06-16 17:18:38 -04:00
Matt Keeler d994dc7b35
Agent Auto Configuration: Configuration Syntax Updates (#8003) 2020-06-16 15:03:22 -04:00
Daniel Nephin 89d95561df Enable gofmt simplify
Code changes done automatically with 'gofmt -s -w'
2020-06-16 13:21:11 -04:00
Daniel Nephin 5f24171f13 ci: enable SA4006 staticcheck check
And fix the 'value not used' issues.

Many of these are not bugs, but a few are tests not checking errors, and
one appears to be a missed error in non-test code.
2020-06-16 13:10:11 -04:00
Daniel Nephin 71e6534061 Rename txnWrapper to txn 2020-06-16 13:06:02 -04:00
Daniel Nephin 537ae1fd46 Rename db 2020-06-16 13:04:31 -04:00
Daniel Nephin 78c76f0773 Handle return value from txn.Commit 2020-06-16 13:04:31 -04:00
Daniel Nephin 50db8f409a state: Update docstrings for changeTrackerDB and txn
And un-embed memdb.DB to prevent accidental access to underlying
methods.
2020-06-16 13:04:31 -04:00
Paul Banks f9a6386c4a state: track changes so that they may be used to produce change events 2020-06-16 13:04:29 -04:00
Matt Keeler cdc4b20afa
ACL Node Identities (#7970)
A Node Identity is very similar to a service identity. Its main targeted use is to allow creating tokens for use by Consul agents that will grant the necessary permissions for all the typical agent operations (node registration, coordinate updates, anti-entropy).

Half of this commit is for golden file based tests of the acl token and role cli output. Another big updates was to refactor many of the tests in agent/consul/acl_endpoint_test.go to use the same style of tests and the same helpers. Besides being less boiler plate in the tests it also uses a common way of starting a test server with ACLs that should operate without any warnings regarding deprecated non-uuid master tokens etc.
2020-06-16 12:54:27 -04:00
Daniel Nephin 2c2da41b3d config: refactor to consolidate all File->Source loading
Previously the logic for reading ConfigFiles and produces Sources was split
between NewBuilder and Build. This commit moves all of the logic into NewBuilder
so that Build() can operate entirely on Sources.

This change is in preparation for logging warnings when files have an
unsupported extension.

It also reduces the scope of BuilderOpts, and gets us very close to removing
Builder.options.
2020-06-16 12:52:23 -04:00
Daniel Nephin c820a8de88 config: Make ConfigFormat not a pointer
The nil value was never used. We can avoid a bunch of complications by
making the field a string value instead of a pointer.

This change is in preparation for fixing a silent config failure.
2020-06-16 12:52:22 -04:00
Daniel Nephin b7b652e8c9 config: rename Flags to BuilderOpts
Flags is an overloaded term in this context. It generally is used to
refer to command line flags. This struct, however, is a data object
used as input to the construction.

It happens to be partially populated by command line flags, but
otherwise has very little to do with them.

Renaming this struct should make the actual responsibility of this struct
more obvious, and remove the possibility that it is confused with
command line flags.

This change is in preparation for adding additional fields to
BuilderOpts.
2020-06-16 12:51:19 -04:00
Daniel Nephin 5ac012dddf config: remove Args field from Flags
This field was populated for one reason, to test that it was empty.
Of all the callers, only a single one used this functionality. The rest
constructed a `Flags{}` struct which did not set Args.

I think this shows that the logic was in the wrong place. Only the agent
command needs to care about validating the args.

This commit removes the field, and moves the logic to the one caller
that cares.

Also fix some comments.
2020-06-16 12:49:53 -04:00
Daniel Nephin bd866d694c agent/service_manager: remove 'updateCh' field from serviceConfigWatch
Passing the channel to the function which uses it significantly
reduces the scope of the variable, and makes its usage more explicit. It
also moves the initialization of the channel closer to where it is used.

Also includes a couple very small cleanups to remove a local var and
read the error from `ctx.Err()` directly instead of creating a channel
to check for an error.
2020-06-16 12:15:57 -04:00
Daniel Nephin 1c7fce73a8 agent/service_manager: remove 'defaults' field from serviceConfigWatch
This field was always read by the same function that populated the field,
so it does not need to be a field. Passing the value as an argument to
functions makes it more obvious where the value comes from, and also reduces
the scope of the variable significantly.
2020-06-16 12:15:52 -04:00
Daniel Nephin 93d7debbeb agent/service_manager: Pass ctx around
[The documentation for context](https://golang.org/pkg/context/)
recommends not storing context in a struct field:

> Do not store Contexts inside a struct type; instead, pass a Context
> explicitly to each function that needs it. The Context should be the
> first parameter, typically named ctx...

Sometimes there are good reasons to not follow this recommendation, but
in this case it seems easy enough to follow.

Also moved the ctx argument to be the first in one of the function calls
to follow the same recommendation.
2020-06-16 12:14:00 -04:00
Daniel Nephin 13f564bdd4
Merge pull request #8074 from hashicorp/dnephin/remove-references-to-PatchSliceOfMaps
Update comments that reference PatchSliceOfMaps
2020-06-15 14:33:10 -04:00
Matt Keeler 976f922abf
Make the Agent Cache more Context aware (#8092)
Blocking queries issues will still be uncancellable (that cannot be helped until we get rid of net/rpc). However this makes it so that if calling getWithIndex (like during a cache Notify go routine) we can cancell the outer routine. Previously it would keep issuing more blocking queries until the result state actually changed.
2020-06-15 11:01:25 -04:00
freddygv f4179b400d Update telemetry for gateway-services endpoint 2020-06-12 14:44:36 -06:00
freddygv 51f8f76b3b Remove unused method and fixup docs ref 2020-06-12 13:47:43 -06:00
freddygv cc4ff3ae02 Fixup stray sid references 2020-06-12 13:47:43 -06:00
freddygv 1e7e716742 Move compound service names to use ServiceName type 2020-06-12 13:47:43 -06:00
freddygv 1cab73e609 Create HTTP endpoint 2020-06-12 13:46:47 -06:00
freddygv 806b1fb608 Move GatewayServices out of Internal 2020-06-12 13:46:47 -06:00
Freddy 66e2def461
Only pass one hostname via EDS and prefer healthy ones (#8084)
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

Currently when passing hostname clusters to Envoy, we set each service instance registered with Consul as an LbEndpoint for the cluster.

However, Envoy can only handle one per cluster:
[2020-06-04 18:32:34.094][1][warning][config] [source/common/config/grpc_subscription_impl.cc:87] gRPC config for type.googleapis.com/envoy.api.v2.Cluster rejected: Error adding/updating cluster(s) dc2.internal.ddd90499-9b47-91c5-4616-c0cbf0fc358a.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint, server.dc2.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint

Envoy is currently handling this gracefully by only picking one of the endpoints. However, we should avoid passing multiple to avoid these warning logs.

This PR:

* Ensures we only pass one endpoint, which is tied to one service instance.
* We prefer sending an endpoint which is marked as Healthy by Consul.
* If no endpoints are healthy we emit a warning and skip the cluster.
* If multiple unique hostnames are spread across service instances we emit a warning and let the user know which will be resolved.
2020-06-12 13:46:17 -06:00
Chris Piraino cd93e0cd99
Allow users to set hosts to the wildcard specifier when TLS is disabled (#8083)
This allows easier demoing/testing of ingress gateways, while still
preserving the validation we have for DNSSANs
2020-06-11 10:03:06 -05:00
Chris Piraino cba863af84
Move ingress param to a new endpoint (#8081)
In discussion with team, it was pointed out that query parameters tend
to be filter mechanism, and that semantically the "/v1/health/connect"
endpoint should return "all healthy connect-enabled endpoints (e.g.
could be side car proxies or native instances) for this service so I can
connect with mTLS".

That does not fit an ingress gateway, so we remove the query parameter
and add a new endpoint "/v1/health/ingress" that semantically means
"all the healthy ingress gateway instances that I can connect to
to access this connect-enabled service without mTLS"
2020-06-10 13:07:15 -05:00
Daniel Nephin af063a5692 Update comments that reference PatchSliceOfMaps
To reference decode.HookWeakDecodeFromSlice instead.

Also removes a step from the adding config fields checklist which is
no longer necessary.
2020-06-09 17:43:05 -04:00
Chris Piraino b8a43e164a
Merge pull request #8064 from hashicorp/ingress/health-query-param
Add API query parameter ?ingress to allow users to find ingress gateways associated to a service
2020-06-09 16:08:28 -05:00
Chris Piraino 4fe1f5fdb9 Remove TODO note about ingress API, it is done! 2020-06-09 14:58:30 -05:00
Chris Piraino 0e72da76db Set connect or ingress boolean after checking for query param 2020-06-09 14:45:21 -05:00
Daniel Nephin 6719f1a6fa
Merge pull request #7900 from hashicorp/dnephin/add-linter-staticcheck-2
intentions: fix a bug in Intention.SetHash
2020-06-09 15:40:20 -04:00
Daniel Nephin 5f14eb124c
Merge pull request #8037 from hashicorp/dnephin/add-linter-staticcheck-5
ci: Enabled SA2002 staticcheck check
2020-06-09 15:31:24 -04:00
Hans Hasselberg 7404712854
acl: do not resolve local tokens from remote dcs (#8068) 2020-06-09 21:13:09 +02:00
Kyle Havlovitz b7cf5139dd
Merge pull request #8040 from hashicorp/ingress/expose-cli
Ingress expose CLI command
2020-06-09 12:11:23 -07:00
Chris Piraino 5f28de5654 Add ?ingress query parameter on /v1/health/connect
Refactor boolean query parameter logic from ?passing value to re-use
with ingress
2020-06-09 11:44:31 -05:00
Daniel Nephin c1feec176f
Merge pull request #7964 from hashicorp/dnephin/remove-patch-slice-of-maps-forward-compat
config: Use HookWeakDecodeFromSlice in place of PatchSliceOfMaps
2020-06-08 19:53:04 -04:00
Daniel Nephin 7b99d9a25d config: add HookWeakDecodeFromSlice
Currently opaque config blocks (config entries, and CA provider config) are
modified by PatchSliceOfMaps, making it impossible for these opaque
config sections to contain slices of maps.

In order to fix this problem, any lazy-decoding of these blocks needs to support
weak decoding of []map[string]interface{} to a struct type before
PatchSliceOfMaps is replaces. This is necessary because these config
blobs are persisted, and during an upgrade an older version of Consul
could read one of the new configuration values, which would cause an error.

To support the upgrade path, this commit first introduces the new hooks
for weak decoding of []map[string]interface{} and uses them only in the
lazy-decode paths. That way, in a future release, new style
configuration will be supported by the older version of Consul.

This decode hook has a number of advantages:

1. It no longer panics. It allows mapstructure to report the error
2. It no longer requires the user to declare which fields are slices of
   structs. It can deduce that information from the 'to' value.
3. It will make it possible to preserve opaque configuration, allowing
   for structured opaque config.
2020-06-08 17:05:09 -04:00
Hans Hasselberg bec21c849d
Tokens converted from legacy ACLs get their Hash computed (#8047)
* Fixes #5606: Tokens converted from legacy ACLs get their Hash computed

This allows new style token replication to work for legacy tokens as well when they change.

* tests: fix timestamp comparison

Co-authored-by: Matt Keeler <mjkeeler7@gmail.com>
2020-06-08 21:44:06 +02:00
Chris Piraino 5d0cb00ec3
Always require Host header values for http services (#7990)
Previously, we did not require the 'service-name.*' host header value
when on a single http service was exposed. However, this allows a user
to get into a situation where, if they add another service to the
listener, suddenly the previous service's traffic might not be routed
correctly. Thus, we always require the Host header, even if there is
only 1 service.

Also, we add the make the default domain matching more restrictive by
matching "service-name.ingress.*" by default. This lines up better with
the namespace case and more accurately matches the Consul DNS value we
expect people to use in this case.
2020-06-08 13:16:24 -05:00
Hans Hasselberg 4790f97bb6
http: use default minsize for gzip handler. (#7354)
Fixes #6306
2020-06-08 10:10:08 +02:00
Hans Hasselberg 532e14fdc4
agent: add option to disable agent cache for HTTP endpoints (#8023)
This allows the operator to disable agent caching for the http endpoint.
It is on by default for backwards compatibility and if disabled will
ignore the url parameter `cached`.
2020-06-08 10:08:12 +02:00
Kyle Havlovitz 3f7c072b75 Add connect expose CLI command 2020-06-05 14:54:29 -07:00
Daniel Nephin 1cdfc4f290 ci: Enabled SA2002 staticcheck check
And handle errors in the main test goroutine
2020-06-05 17:50:11 -04:00
Hans Hasselberg 681124eaff
Setup intermediate_pki_path on secondary when using vault (#8001)
Make sure to mount vault backend for intermediate_pki_path on secondary
dc.
2020-06-05 21:36:22 +02:00
Daniel Nephin b9e4544ec3 intentions: fix a bug in Intention.SetHash
Found using staticcheck.

binary.Write does not accept int types without a size. The error from binary.Write was ignored, so we never saw this error. Casting the data to uint64 produces a correct hash.

Also deprecate the Default{Addr,Port} fields, and prevent them from being encoded. These fields will always be empty and are not used.
Removing these would break backwards compatibility, so they are left in place for now.

Co-authored-by: Hans Hasselberg <me@hans.io>
2020-06-05 14:51:43 -04:00
R.B. Boyer 2867730e8a
tests: ensure that the ServiceExists helper function normalizes entmeta (#8025)
This fixes a unit test failure over in enterprise due to https://github.com/hashicorp/consul/pull/7384
2020-06-05 10:41:39 +02:00
R.B. Boyer 3ad570ba99
server: don't activate federation state replication or anti-entropy until all servers are running 1.8.0+ (#8014) 2020-06-04 16:05:27 -05:00
Hans Hasselberg 7f14d3ac8a
tests: use constructor instead init (#8024) 2020-06-04 22:59:06 +02:00
Pierre Souchay 7cd5477c3c
checks: when a service does not exists in an alias, consider it failing (#7384)
In current implementation of Consul, check alias cannot determine
if a service exists or not. Because a service without any check
is semantically considered as passing, so when no healthchecks
are found for an agent, the check was considered as passing.

But this make little sense as the current implementation does not
make any difference between:
 * a non-existing service (passing)
 * a service without any check (passing as well)

In order to make it work, we have to ensure that when a check did
not find any healthcheck, the service does indeed exists. If it
does not, lets consider the check as failing.
2020-06-04 14:50:52 +02:00
Hans Hasselberg dd8cd9bc24
Merge pull request #7966 from hashicorp/pool_improvements
Agent connection pool cleanup
2020-06-04 08:56:26 +02:00
Freddy f759a48726
Enable gateways to resolve hostnames to IPv4 addresses (#7999)
The DNS resolution will be handled by Envoy and defaults to LOGICAL_DNS. This discovery type can be overridden on a per-gateway basis with the envoy_dns_discovery_type Gateway Option.

If a service contains an instance with a hostname as an address we set the Envoy cluster to use DNS as the discovery type rather than EDS. Since both mesh gateways and terminating gateways route to clusters using SNI, whenever there is a mix of hostnames and IP addresses associated with a service we use the hostname + CDS rather than the IPs + EDS.

Note that we detect hostnames by attempting to parse the service instance's address as an IP. If it is not a valid IP we assume it is a hostname.
2020-06-03 15:28:45 -06:00
Matt Keeler 2c615807af
Fix legacy management tokens in unupgraded secondary dcs (#7908)
The ACL.GetPolicy RPC endpoint was supposed to return the “parent” policy and not always the default policy. In the case of legacy management tokens the parent policy was supposed to be “manage”. The result of us not sending this properly was that operations that required specifically a management token such as saving a snapshot would not work in secondary DCs until they were upgraded.
2020-06-03 11:22:22 -04:00
Matt Keeler 9fa9ec4ba0
Fix segfault due to race condition for checking server versions (#7957)
The ACL monitoring routine uses c.routers to check for server version updates. Therefore it needs to be started after initializing the routers.
2020-06-03 10:36:32 -04:00
Daniel Nephin e8a883e829
Replace goe/verify.Values with testify/require.Equal (#7993)
* testing: replace most goe/verify.Values with require.Equal

One difference between these two comparisons is that go/verify considers
nil slices/maps to be equal to empty slices/maps, where as testify/require
does not, and does not appear to provide any way to enable that behaviour.

Because of this difference some expected values were changed from empty
slices to nil slices, and some calls to verify.Values were left.

* Remove github.com/pascaldekloe/goe/verify

Reduce the number of assertion packages we use from 2 to 1
2020-06-02 12:41:25 -04:00
Alvin Huang 1cb06ce7f0
Merge pull request #7956 from hashicorp/update-master-to-1.8.0-beta2
Update master to 1.8.0 beta2
2020-06-01 16:52:19 -04:00
R.B. Boyer 7bd7895047
acl: allow auth methods created in the primary datacenter to optionally create global tokens (#7899) 2020-06-01 11:44:47 -05:00
R.B. Boyer 16db20b1f3
acl: remove the deprecated `acl_enforce_version_8` option (#7991)
Fixes #7292
2020-05-29 16:16:03 -05:00
Jono Sosulska 7a13c96a2a
Replace whitelist/blacklist terminology with allowlist/denylist (#7971)
* Replace whitelist/blacklist terminology with allowlist/denylist
2020-05-29 14:19:16 -04:00
Hans Hasselberg 1ed91cbdf6 pool: remove timeout parameter
Timeout was never used in a meaningful way by callers, which is why it
is now entirely internal to the pool.
2020-05-29 08:21:28 +02:00
Hans Hasselberg 5cda505495 pool: remove useTLS and ForceTLS
In the past TLS usage was enforced with these variables, but these days
this decision is made by TLSConfigurator and there is no reason to keep
using the variables.
2020-05-29 08:21:24 +02:00
Hans Hasselberg 9ef44ec3da pool: remove version
The version field has been used to decide which multiplexing to use. It
was introduced in 2457293dceec95ecd12ef4f01442e13710ea131a. But this is
6y ago and there is no need for this differentiation anymore.
2020-05-28 23:06:01 +02:00
hashicorp-ci 6b8bedd004 update bindata_assetfs.go 2020-05-28 14:39:37 -04:00
hashicorp-ci 634a0e630a update bindata_assetfs.go 2020-05-28 14:39:28 -04:00
Daniel Nephin ea6c2b2adc ci: Add staticcheck and fix most errors
Three of the checks are temporarily disabled to limit the size of the
diff, and allow us to enable all the other checks in CI.

In a follow up we can fix the issues reported by the other checks one
at a time, and enable them.
2020-05-28 11:59:58 -04:00
Daniel Nephin e359b10f77
Merge pull request #7963 from hashicorp/dnephin/replace-lib-translate-keys
Replace lib.TranslateKeys with a mapstructure decode hook
2020-05-27 16:51:26 -04:00
Daniel Nephin 8f939da431 config: use the new HookTranslateKeys instead of lib.TranslateKeys
With the exception of CA provider config, which will be migrated at some
later time.
2020-05-27 16:24:47 -04:00
Daniel Nephin 644eb3b33a Add alias struct tags for new decode hook 2020-05-27 16:24:47 -04:00
R.B. Boyer 54c7f825d6
create lib/stringslice package (#7934) 2020-05-27 11:47:32 -05:00
R.B. Boyer 813d69622e
agent: handle re-bootstrapping in a secondary datacenter when WAN federation via mesh gateways is configured (#7931)
The main fix here is to always union the `primary-gateways` list with
the list of mesh gateways in the primary returned from the replicated
federation states list. This will allow any replicated (incorrect) state
to be supplemented with user-configured (correct) state in the config
file. Eventually the game of random selection whack-a-mole will pick a
winning entry and re-replicate the latest federation states from the
primary. If the user-configured state is actually the incorrect one,
then the same eventual correct selection process will work in that case,
too.

The secondary fix is actually to finish making wanfed-via-mgws actually
work as originally designed. Once a secondary datacenter has replicated
federation states for the primary AND managed to stand up its own local
mesh gateways then all of the RPCs from a secondary to the primary
SHOULD go through two sets of mesh gateways to arrive in the consul
servers in the primary (one hop for the secondary datacenter's mesh
gateway, and one hop through the primary datacenter's mesh gateway).
This was neglected in the initial implementation. While everything
works, ideally we should treat communications that go around the mesh
gateways as just provided for bootstrapping purposes.

Now we heuristically use the success/failure history of the federation
state replicator goroutine loop to determine if our current mesh gateway
route is working as intended. If it is, we try using the local gateways,
and if those don't work we fall back on trying the primary via the union
of the replicated state and the go-discover configuration flags.

This can be improved slightly in the future by possibly initializing the
gateway choice to local on startup if we already have replicated state.
This PR does not address that improvement.

Fixes #7339
2020-05-27 11:31:10 -05:00
Raphaël Rondeau b799471e29
connect: fix endpoints clusterName when using cluster escape hatch (#7319)
```changelog
* fix(connect): fix endpoints clusterName when using cluster escape hatch
```
2020-05-26 10:57:22 +02:00
Pierre Souchay fa43d427cb
Stop all watches before shuting down anything dring shutdown. (#7526)
This will prevent watches from being triggered.

```changelog
* fix(agent):  stop all watches before shuting down
```
2020-05-26 10:01:49 +02:00
R.B. Boyer 7e42819a71
connect: ensure proxy-defaults protocol is used for upstreams (#7938) 2020-05-21 16:08:39 -05:00
Kyle Havlovitz 5aefdea1a8
Standardize support for Tagged and BindAddresses in Ingress Gateways (#7924)
* Standardize support for Tagged and BindAddresses in Ingress Gateways

This updates the TaggedAddresses and BindAddresses behavior for Ingress
to match Mesh/Terminating gateways. The `consul connect envoy` command
now also allows passing an address without a port for tagged/bind
addresses.

* Update command/connect/envoy/envoy.go

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

* PR comments

* Check to see if address is an actual IP address

* Update agent/xds/listeners.go

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

* fix whitespace

Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2020-05-21 09:08:12 -05:00
Daniel Nephin 6604e72c6d
Merge pull request #7933 from hashicorp/dnephin/state-txn-missing-errors
state: fix unhandled error
2020-05-20 17:00:20 -04:00
Daniel Nephin f9a89db86e
Update agent/consul/state/catalog.go
Co-authored-by: Hans Hasselberg <me@hans.io>
2020-05-20 16:34:14 -04:00
Seth Hoenig 7ea75263ea
grpc: use default resolver scheme for grpc dialing (#7617)
Currently checks of type gRPC will emit log messages such as,

    2020/02/12 13:48:22 [INFO] parsed scheme: ""
    2020/02/12 13:48:22 [INFO] scheme "" not registered, fallback to default scheme

Without adding full support for using custom gRPC schemes (maybe that's
right long-term path) we can just supply the default scheme as provided
by the grpc library.

Fixes https://github.com/hashicorp/consul/issues/7274
and https://github.com/hashicorp/nomad/issues/7415
2020-05-20 22:26:26 +02:00
Daniel Nephin e1e1c13b35 state: use an error to indicate compare failed
Errors are values. We can use the error value to identify the 'comparison failed' case which makes the function easier to use and should make it harder to miss handle the error case
2020-05-20 12:43:33 -04:00
Pierre Souchay eda40d8824
tests: added unit test to ensure watches are not re-triggered on consul reload (#7449)
This ensures no regression about https://github.com/hashicorp/consul/issues/7318
And ensure that https://github.com/hashicorp/consul/issues/7446 cannot happen anymore
2020-05-20 12:38:29 +02:00
Pierre Souchay 3b548f0d77
Allow to restrict servers that can join a given Serf Consul cluster. (#7628)
Based on work done in https://github.com/hashicorp/memberlist/pull/196
this allows to restrict the IP ranges that can join a given Serf cluster
and be a member of the cluster.

Restrictions on IPs can be done separatly using 2 new differents flags
and config options to restrict IPs for LAN and WAN Serf.
2020-05-20 11:31:19 +02:00
Daniel Nephin 6e3a7b0aa8 consul/state: refactor tnxService to avoid missed cases
Handling errors at the end of a log switch/case block is somewhat
brittle. This block included a couple cases where errors were ignored,
but it was not obvious the way it was written.

This change moves all error handling into each case block. There is
still potentially one case where err is ignored, which will be handled
in a follow up.
2020-05-19 16:50:14 -04:00
Daniel Nephin 3e76ebbd49 Remove unused var
The usage was removed in 8e22d80e3550592a32144f2d1c84ebf6becb957b,
however it seems there may be a bug here because the cluster name
is not updated when the target changes.
2020-05-19 16:50:14 -04:00
Daniel Nephin ff99f6466f Handle error from template.Execute
Refactored the function to make the problem more obvious, by using a
guard.
2020-05-19 16:50:14 -04:00
Daniel Nephin 545bd766e7 Fix a number of problems found by staticcheck
Some of these problems are minor (unused vars), but others are real bugs (ignored errors).

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
2020-05-19 16:50:14 -04:00
Daniel Nephin aa8009ee45 Remove unused var
The usage of this var was removed in b92f895c233bf8a99ae35361117a90416fea29b5.

Found by using staticcheck
2020-05-19 16:50:14 -04:00
Chris Piraino 0995906b24 Add service id context to the proxycfg logger
This is especially useful when multiple proxies are all querying the
same Consul agent.
2020-05-18 09:08:05 -05:00
Chris Piraino ce099c9aca Do not return an error if requested service is not a gateway
This commit converts the previous error into just a Warn-level log
message. By returning an error when the requested service was not a
gateway, we did not appropriately update envoy because the cache Fetch
returned an error and thus did not propagate the update through proxycfg
and xds packages.
2020-05-18 09:08:04 -05:00
Aleksandr Zagaevskiy 75f0607d3b
Preserve ModifyIndex for unchanged entry in KVS TXN (#7832) 2020-05-14 13:25:04 -06:00
Pierre Souchay 9611a266a2
tests: fix unstable test `TestAgentAntiEntropy_Checks`. (#7594)
Example of failure: https://circleci.com/gh/hashicorp/consul/153932#tests/containers/2
2020-05-14 09:54:49 +02:00
Kit Patella a33791c66a http: migrate from instrumentation in s.wrap() to an s.enterpriseHandler() 2020-05-13 15:47:05 -07:00
Matt Keeler 849eedd142
Fix identity resolution on clients and in secondary dcs (#7862)
Previously this happened to be using the method on the Server/Client that was meant to allow the ACLResolver to locally resolve tokens. On Servers that had tokens (primary or secondary dc + token replication) this function would lookup the token from raft and return the ACLIdentity. On clients this was always a noop. We inadvertently used this function instead of creating a new one when we added logging accessor ids for permission denied RPC requests. 

With this commit, a new method is used for resolving the identity properly via the ACLResolver which may still resolve locally in the case of being on a server with tokens but also supports remote token resolution.
2020-05-13 13:00:08 -04:00
Chris Piraino 3eb4e4012a
Make new gateway tests compatible with enterprise (#7856) 2020-05-12 13:48:20 -05:00
Daniel Nephin 2e0f750f1a Add unconvert linter
To find unnecessary type convertions
2020-05-12 13:47:25 -04:00
Drew Bailey 7e1734b1c6 Value is already an int, remove type cast 2020-05-12 13:13:09 -04:00
Daniel Nephin 0985eff717
Merge pull request #7689 from hashicorp/dnephin/remove-deadcode-1
Remove some dead code
2020-05-12 12:33:59 -04:00
Daniel Nephin ad2979ca45
Merge pull request #7819 from hashicorp/dnephin/remove-t.Parallel-1
test: Remove t.Parallel() from agent/structs tests
2020-05-12 12:11:57 -04:00
R.B. Boyer 940e5ad160
acl: add auth method for JWTs (#7846) 2020-05-11 20:59:29 -05:00
Kit Patella 24175e2925
Merge pull request #7843 from hashicorp/oss-sync/auditing-config
agent/config: Fix tests & include Audit struct as a pointer on Config
2020-05-11 14:23:44 -07:00
Kit Patella a4ee78440b agent/config: include Audit struct as a pointer on Config, fix tests 2020-05-11 14:13:05 -07:00
Kit Patella a45b67b40f
Merge pull request #7841 from hashicorp/oss-sync/auditing-config
OSS sync - Auditing config
2020-05-11 13:44:38 -07:00
Kit Patella f82a97f8c8 agent/config: add auditing config to OSS and add to enterpriseConfigMap exclusions 2020-05-11 13:27:35 -07:00
Chris Piraino 1173b31949
Return early from updateGatewayServices if nothing to update (#7838)
* Return early from updateGatewayServices if nothing to update

Previously, we returned an empty slice of gatewayServices, which caused
us to accidentally delete everything in the memdb table

* PR comment and better formatting
2020-05-11 14:46:48 -05:00
Chris Piraino 1a7c99cf31
Fix TestInternal_GatewayServiceDump_Ingress (#7840)
Protocol was added as a field on GatewayServices after
GatewayServiceDump PR branch was created.
2020-05-11 14:46:31 -05:00
R.B. Boyer c54211ad52
cli: ensure 'acl auth-method update' doesn't deep merge the Config field (#7839) 2020-05-11 14:21:17 -05:00
Chris Piraino 107c7a9ca7 PR comment and better formatting 2020-05-11 14:04:59 -05:00
Chris Piraino 9f924400e0 Return early from updateGatewayServices if nothing to update
Previously, we returned an empty slice of gatewayServices, which caused
us to accidentally delete everything in the memdb table
2020-05-11 12:38:04 -05:00
Freddy ebbb234ecb
Gateway Services Nodes UI Endpoint (#7685)
The endpoint supports queries for both Ingress Gateways and Terminating Gateways. Used to display a gateway's linked services in the UI.
2020-05-11 11:35:17 -06:00
Kyle Havlovitz 28b4819882
Merge pull request #7759 from hashicorp/ingress/tls-hosts
Add TLS option for Ingress Gateway listeners
2020-05-11 09:18:43 -07:00
Kyle Havlovitz dc1112eaf8 Disallow the blanket wildcard prefix from being used as custom host 2020-05-08 20:24:18 -07:00
Chris Piraino 3931015a90 Remove development log line 2020-05-08 20:24:18 -07:00
Chris Piraino a500262a77 Compute all valid DNSSANs for ingress gateways
For DNSSANs we take into account the following and compute the
appropriate wildcard values:
- source datacenter
- namespaces
- alt domains
2020-05-08 20:23:17 -07:00
Daniel Nephin ce3aeb85f5 Add outlier_detection check to integration test
Fix decoding of time.Duration types.
2020-05-08 14:56:57 -04:00
Daniel Nephin 987875524a xds: Add passive health check config for upstreams 2020-05-08 14:56:57 -04:00
Chris Piraino a635e23f86
Restoring config entries updates the gateway-services table (#7811)
- Adds a new validateConfigEntryEnterprise function
- Also fixes some state store tests that were failing in enterprise
2020-05-08 13:24:33 -05:00
Daniel Nephin c2499418ed test: Remove t.Parallel() from agent/structs tests
go test will only run tests in parallel within a single package. In this case the package test run time is exactly the same with or without t.Parallel() (~0.7s).

In generally we should avoid t.Parallel() as it causes a number of problems with `go test` not reporting failure messages correctly. I encountered one of these problems, which is what prompted this change.  Since `t.Parallel` is not providing any benefit in this package, this commit removes it.

The change was automated with:

    git grep -l 't.Parallel' | xargs sed -i -e '/t.Parallel/d'
2020-05-08 14:06:10 -04:00
Freddy a37d7a42c9
Fix up enterprise compatibility for gateways (#7813) 2020-05-08 09:44:34 -06:00
Jono Sosulska 44011c81f2
Fix spelling of deregister (#7804) 2020-05-08 10:03:45 -04:00
Chris Piraino 2d657c3c0f
Allow ingress gateways to send empty clusters, routes, and listeners (#7795)
This is useful when updating an config entry with no services, and the
expected behavior is that envoy closes all listeners and clusters.

We also allow empty routes because ingress gateways name route
configurations based on the port of the listener, so it is important we
remove any stale routes. Then, if a new listener with an old port is
added, we will not have to deal with stale routes hanging around routing
to the wrong place.

Endpoints are associated with clusters, and thus by deleting the
clusters we don't have to care about sending empty endpoint responses.
2020-05-07 16:19:25 -05:00
Chris Piraino 964e55e45e Cleanup proxycfg for TLS
- Use correct enterprise metadata for finding config entry
- nil out cancel functions on config snapshot copy
- Look at HostsSet when checking validity
2020-05-07 10:22:57 -05:00
Chris Piraino ad8a0544f2
Require individual services in ingress entry to match protocols (#7774)
We require any non-wildcard services to match the protocol defined in
the listener on write, so that we can maintain a consistent experience
through ingress gateways. This also helps guard against accidental
misconfiguration by a user.

- Update tests that require an updated protocol for ingress gateways
2020-05-06 16:09:24 -05:00
Freddy a749f46316
Remove timeout and call to Fatal from goroutine (#7797) 2020-05-06 14:33:17 -06:00
Chris Piraino bebf1d5df9 Add TLS field to ingress API structs
- Adds test in api and command/config/write packages
2020-05-06 15:12:02 -05:00
Chris Piraino 2a10984efb Add test for adding DNSSAN for ConnectCALeaf cache type 2020-05-06 15:12:02 -05:00
Chris Piraino 91586b9228 Validate hosts input in ingress gateway config entry
We can only allow host names that are valid domain names because we put
these hosts into a DNSSAN. In addition, we validate that the wildcard
specifier '*' is only present as the leftmost label to allow for a
wildcard DNSSAN and associated wildcard Host routing in the ingress
gateway proxy.
2020-05-06 15:12:02 -05:00
Kyle Havlovitz bd6bb3bf2d Add TLS option and DNS SAN support to ingress config
xds: Only set TLS context for ingress listener when requested
2020-05-06 15:12:02 -05:00
Chris Piraino ac115e39b2 A proxy-default config entry only exists in the default namespace 2020-05-06 15:06:14 -05:00
Chris Piraino ff501ffb40 Correctly set a namespace label in the required domain for xds routes
If an upstream is not in the default namespace, we expect DNS requests
to be served over "<service-name>.ingress.<namespace>.*"
2020-05-06 15:06:14 -05:00
Chris Piraino 9a130f2ccc Remove outdated comment 2020-05-06 15:06:14 -05:00
Chris Piraino 21e0ed8f8d Better document wildcard specifier interactions 2020-05-06 15:06:14 -05:00
Chris Piraino d8ac79041d Re-add comment on connect-proxy virtual hosts 2020-05-06 15:06:14 -05:00
Kyle Havlovitz 26533dcb09 Make sure IngressHosts isn't parsed during JSON decode 2020-05-06 15:06:14 -05:00
Chris Piraino c30a7bfdfe Comment why it is ok to expect upstreams slice to not be empty 2020-05-06 15:06:13 -05:00
Chris Piraino 9b9dfa8eeb xds: Use only the port number as the configured route name
This removes duplication of protocol from the stats_prefix
2020-05-06 15:06:13 -05:00
Kyle Havlovitz 04b6bd637a Filter wildcard gateway services to match listener protocol
This now requires some type of protocol setting in ingress gateway tests
to ensure the services are not filtered out.

- small refactor to add a max(x, y) function
- Use internal configEntryTxn function and add MaxUint64 to lib
2020-05-06 15:06:13 -05:00
Chris Piraino 210dda5682 Allow Hosts field to be set on an ingress config entry
- Validate that this cannot be set on a 'tcp' listener nor on a wildcard
service.
- Add Hosts field to api and test in consul config write CLI
- xds: Configure envoy with user-provided hosts from ingress gateways
2020-05-06 15:06:13 -05:00
Chris Piraino 837d2aa7d2 Remove service_subset field from ingress config entry
We decided that this was not a useful MVP feature, and just added
unnecessary complexity
2020-05-06 15:06:13 -05:00
Kyle Havlovitz e4268c8b7f Support multiple listeners referencing the same service in gateway definitions 2020-05-06 15:06:13 -05:00
Kyle Havlovitz b21cd112e5 Allow ingress gateways to route traffic based on Host header
This commit adds the necessary changes to allow an ingress gateway to
route traffic from a single defined port to multiple different upstream
services in the Consul mesh.

To do this, we now require all HTTP requests coming into the ingress
gateway to specify a Host header that matches "<service-name>.*" in
order to correctly route traffic to the correct service.

- Differentiate multiple listener's route names by port
- Adds a case in xds for allowing default discovery chains to create a
  route configuration when on an ingress gateway. This allows default
  services to easily use host header routing
- ingress-gateways have a single route config for each listener
  that utilizes domain matching to route to different services.
2020-05-06 15:06:13 -05:00
R.B. Boyer 1187d7288e
acl: oss plumbing to support auth method namespace rules in enterprise (#7794)
This includes website docs updates.
2020-05-06 13:48:04 -05:00
R.B. Boyer b6cc92020d
test: make the kube auth method test helper use freeport (#7788) 2020-05-05 16:55:21 -05:00
Hans Hasselberg e3e2b82a00 network_segments: stop advertising segment tags 2020-05-05 21:32:05 +02:00
Hans Hasselberg 854aac510f agent: refactor to use a single addrFn 2020-05-05 21:08:10 +02:00
Hans Hasselberg 0f2e189012 agent: rename local/global to src/dst 2020-05-05 21:07:34 +02:00
Chris Piraino 837bd6f558
Construct a default destination if one does not exist for service-router (#7783) 2020-05-05 10:49:50 -05:00
R.B. Boyer c9c557477b
acl: add MaxTokenTTL field to auth methods (#7779)
When set to a non zero value it will limit the ExpirationTime of all
tokens created via the auth method.
2020-05-04 17:02:57 -05:00
R.B. Boyer 265d2ea9e1
acl: add DisplayName field to auth methods (#7769)
Also add a few missing acl fields in the api.
2020-05-04 15:18:25 -05:00
Hans Hasselberg 1be90e0fa1
agent: don't let left nodes hold onto their node-id (#7747) 2020-05-04 18:39:08 +02:00
Matt Keeler 669d22933e
Merge pull request #7714 from hashicorp/oss-sync/msp-agent-token 2020-05-04 11:33:50 -04:00
Matt Keeler 64baf36b60
Update enterprise configurations to be in OSS
This will emit warnings about the configs not doing anything but still allow them to be parsed.

This also added the warnings for enterprise fields that we already had in OSS but didn’t change their enforcement behavior. For example, attempting to use a network segment will cause a hard error in OSS.
2020-05-04 10:21:05 -04:00
R.B. Boyer 3ac5a841ec
acl: refactor the authmethod.Validator interface (#7760)
This is a collection of refactors that make upcoming PRs easier to digest.

The main change is the introduction of the authmethod.Identity struct.
In the one and only current auth method (type=kubernetes) all of the
trusted identity attributes are both selectable and projectable, so they
were just passed around as a map[string]string.

When namespaces were added, this was slightly changed so that the
enterprise metadata can also come back from the login operation, so
login now returned two fields.

Now with some upcoming auth methods it won't be true that all identity
attributes will be both selectable and projectable, so rather than
update the login function to return 3 pieces of data it seemed worth it
to wrap those fields up and give them a proper name.
2020-05-01 17:35:28 -05:00
R.B. Boyer 4cd1d62e40
acl: change authmethod.Validator to take a logger (#7758) 2020-05-01 15:55:26 -05:00
R.B. Boyer 4a630135b8
test: move some test helpers over from enterprise (#7754) 2020-05-01 14:52:15 -05:00
R.B. Boyer 9faf8c42d1
sdk: extracting testutil.RequireErrorContains from various places it was duplicated (#7753) 2020-05-01 11:56:34 -05:00
Hans Hasselberg 6626cb69d6
rpc: oss changes for network area connection pooling (#7735) 2020-04-30 22:12:17 +02:00
Freddy c34ee5d339
Watch fallback channel for gateways that do not exist (#7715)
Also ensure that WatchSets in tests are reset between calls to watchFired. 
Any time a watch fires, subsequent calls to watchFired on the same WatchSet
will also return true even if there were no changes.
2020-04-29 16:52:27 -06:00
Matt Keeler fbb5c28c80
Updates to allow for using an enterprise specific token as the agents token
This is needed to allow for managed Consul instances to register themselves in the catalog with one of the managed service provider tokens.
2020-04-28 09:44:26 -04:00
Matt Keeler 901d6739ad
Some boilerplate to allow for ACL Bootstrap disabling configurability 2020-04-28 09:42:46 -04:00
Freddy f5c1e5268b
TLS Origination for Terminating Gateways (#7671) 2020-04-27 16:25:37 -06:00
freddygv 0ad47bdda5 Remove fallthrough 2020-04-27 12:00:14 -06:00
freddygv ac809ec635 Add authz filter when creating filterchain 2020-04-27 11:08:41 -06:00
freddygv 3a44359d7b Fix snapshot IsEmpty 2020-04-27 11:08:41 -06:00
freddygv e751b83a3f Clean up dead code, issue addressed by passing ws to serviceGatewayNodes 2020-04-27 11:08:41 -06:00
Freddy a54f2fd055 Update agent/proxycfg/state_test.go 2020-04-27 11:08:41 -06:00
freddygv e30d64289d PR comments 2020-04-27 11:08:41 -06:00
freddygv 75e737b0f2 Fix internal endpoint test 2020-04-27 11:08:41 -06:00
freddygv 8697ab71e4 Fix listener error handling 2020-04-27 11:08:41 -06:00
freddygv 6ecb3b7a42 Skip filter chain creation if no client cert 2020-04-27 11:08:41 -06:00
freddygv 7667567688 Avoid deleting mappings for services linked to other gateways on dereg 2020-04-27 11:08:41 -06:00
freddygv 28fe6920fe Re-fix bug in CheckConnectServiceNodes 2020-04-27 11:08:41 -06:00
freddygv 5e260b2913 Use golden files for gateway certs and fix listener test flakiness 2020-04-27 11:08:41 -06:00
freddygv b2b5942f4b Un-nest switch in gateway update handler 2020-04-27 11:08:40 -06:00
freddygv 3849b066a6 Allow terminating-gateway to setup listener before servicegroups are known 2020-04-27 11:08:40 -06:00
freddygv 929491c979 Add subset support 2020-04-27 11:08:40 -06:00
freddygv bab101107c Fix ConnectQueryBlocking test 2020-04-27 11:08:40 -06:00
freddygv 65e60d02f1 Fix bug in CheckConnectServiceNodes
Previously, if a blocking query called CheckConnectServiceNodes
before the gateway-services memdb table had any entries,
a nil watchCh would be returned when calling serviceTerminatingGatewayNodes.
This means that the blocking query would not fire if a gateway config entry
was added after the watch started.

In cases where the blocking query started on proxy registration,
the proxy could potentially never become aware of an upstream endpoint
if that upstream was going to be represented by a gateway.
2020-04-27 11:08:40 -06:00
freddygv 2e35a9bb18 Add xds cluster/listener/endpoint management 2020-04-27 11:08:40 -06:00
freddygv c80f89b92f Add proxycfg state management for terminating-gateways 2020-04-27 11:07:06 -06:00
freddygv 243c015c52 Require service:read to read terminating-gateway config 2020-04-27 11:07:06 -06:00
Matt Keeler 4b1b42cef5
A couple testing helper updates (#7694) 2020-04-27 12:17:38 -04:00
Kit Patella 82b0fbd975
Merge pull request #7699 from pierresouchay/fix_comment_misplaced
Fixed comment on wrong line
2020-04-24 10:09:58 -07:00
Chris Piraino b8a5fbf1bf Allow ingress gateways to route through mesh gateways
- Adds integration test for mesh gateways local + remote modes with ingress
- ingress golden files updated for mesh gateway endpoints
2020-04-24 09:31:32 -05:00
Chris Piraino af5cc8fd92 Add all the xds ingress tests
This commit copies many of the connect-proxy xds testcases and reuses
for ingress gateways. This allows us to more easily see changes to the
envoy configuration when make updates to ingress gateways.
2020-04-24 09:31:32 -05:00
Chris Piraino b131b857da Pull out setupTestVariationConfigEntriesAndSnapshot in proxycfg
This allows us to reuse the same variations for ingress gateway testing
2020-04-24 09:31:32 -05:00
Kyle Havlovitz d433679bd8 Add http routing support and integration test to ingress gateways 2020-04-24 09:31:32 -05:00
Hans Hasselberg ef88089924
auto_encrypt: add validations for auto_encrypt.{tls,allow_tls} (#7704)
Fixes https://github.com/hashicorp/consul/issues/7407.
2020-04-24 15:51:38 +02:00
Pierre Souchay 36a0a976d5 Fixed comment on wrong line.
While investigating and fixing an issue on our 1.5.1 branch,
I saw you also/already fixed the bug I found (tags not updated
for existing servers), but comment is misplaced.
2020-04-24 01:15:15 +02:00
Freddy d46ef80751
Fix check deletion in anti-entropy sync (#7690)
* Incorporate entMeta into service equality check
2020-04-23 10:16:50 -06:00
Daniel Nephin 7bc3d09857 Remove deadcode
This UnmarshalJSON was never called. The decode function is passed a map[string]interface
so it has no way of knowing that this function exists.

Tested by adding a panic to this function and watching the tests pass.

I attempted to use this Unmarshal function by passing in the type, however the tests
showed that it does not work. The test was failing to parse the request.

If the performance of this endpoint is indeed critical we can solve the problem by adding
all the fields to the request struct and handling the normalziation without a custom Unmarshal.
2020-04-22 16:48:28 -04:00
Daniel Nephin abed523b60 agent: remove deadcode that called lib.TranslateKeys
Move the last remaining function from agent/config.go to the one place
it was called.
2020-04-22 13:41:43 -04:00
Chris Piraino c4123791a9
Expect default enterprise metadata in gateway tests (#7664)
This makes it so that both OSS and enterprise tests pass correctly

In the api tests, explicitly set namespace to empty string so that tests
can be shared.
2020-04-20 09:02:35 -05:00
Kit Patella 0c4492ae80 http: rename paresTokenResolveProxy to parseTokenWithDefault 2020-04-17 13:35:24 -07:00
Kit Patella 2b95bd7ca9
Merge pull request #7656 from hashicorp/feature/audit/oss-merge
agent: stub out auditing functionality in OSS
2020-04-17 13:33:06 -07:00
Kit Patella 313c662f8c agent,config: port enterprise only fields to embedded enterprise structs 2020-04-17 13:27:39 -07:00
Daniel Nephin 57f700d6da
Merge pull request #7641 from hashicorp/dnephin/agent-cache-request-info
agent/cache: reduce function arguments by removing duplicates
2020-04-17 14:10:49 -04:00
Chris Piraino c5ab43ebbc
Fix bug where non-typical services are associated with gateways (#7662)
On every service registration, we check to see if a service should be
assassociated to a wildcard gateway-service. This fixes an issue where
we did not correctly check to see if the service being registered was a
"typical" service or not.
2020-04-17 11:24:34 -05:00
Daniel Nephin 3114943f8d agent/cache: remove error return from fetch
A previous change removed the only error, so the return value can be
removed now.
2020-04-17 11:55:01 -04:00
Daniel Nephin d015d3c563 agent/cache: reduce function arguments by removing duplicates
A few of the unexported functions in agent/cache took a large number of
arguments. These arguments were effectively overrides for values that
were provided in RequestInfo.

By using a struct we can not only reduce the number of arguments, but
also simplify the logic by removing the need for overrides.
2020-04-17 11:35:07 -04:00
Kit Patella 7aaedd8ab5 config/runtime: fix an extra field in config sanitize 2020-04-16 16:37:25 -07:00
Daniel Nephin 1251c01b73 agent/cache: Make all cache options RegisterOptions
Previously the SupportsBlocking option was specified by a method on the
type, and all the other options were specified from RegisterOptions.

This change moves RegisterOptions to a method on the type, and moves
SupportsBlocking into the options struct.

Currently there are only 2 cache-types. So all cache-types can implement
this method by embedding a struct with those predefined values. In the
future if a cache type needs to be registered more than once with different
options it can remove the embedded type and implement the method in a way
that allows for paramaterization.
2020-04-16 18:56:34 -04:00
Kit Patella c3d24d7c3e agent: stub out auditing functionality in OSS 2020-04-16 15:07:52 -07:00
Kyle Havlovitz 6a5eba63ab
Ingress Gateways for TCP services (#7509)
* Implements a simple, tcp ingress gateway workflow

This adds a new type of gateway for allowing Ingress traffic into Connect from external services.

Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
2020-04-16 14:00:48 -07:00
Daniel Nephin 6ade136abf agent/structs: Remove ServiceID.Init and CheckID.Init
The Init method provided the same functionality as the New constructor.
The constructor is both more widely used, and more idiomatic, so remove
the Init method.

This change is in preparation for fixing printing of these IDs.
2020-04-15 12:09:56 -04:00
sasha 8afa406177
add DNSSAN and IPSAN to cache key (#7597) 2020-04-15 10:11:11 -05:00
Matt Keeler 1e70ffee76
Update the Client code to use the common version checking infra… (#7558)
Also reduce the log level of some version checking messages on the server as they can be pretty noisy during upgrades and really are more for debugging purposes.
2020-04-14 11:54:27 -04:00
Matt Keeler 1332628b67
Allow the bootstrap endpoint to be disabled in enterprise. (#7614) 2020-04-14 11:45:39 -04:00
Daniel Nephin fb31212de7 Remove TTL from cacheEntryExpiry
This should very slightly reduce the amount of memory required to store each item in
the cache.

It will also enable setting different TTLs based on the type of result. For example
we may want to use a shorter TTL when the result indicates the resource does not exist,
as storing these types of records could easily lead to a DOS caused by
OOM.
2020-04-13 13:10:38 -04:00
Daniel Nephin 371cf05340 agent/cache: Reduce differences between notify implementations
These two notify functions are very similar. There appear to be just
enough differences that trying to parameterize the differences may not
improve things.

For now, reduce some of the cosmetic differences so that the material
differences are more obvious.
2020-04-13 13:10:38 -04:00
Daniel Nephin 4d398d26ae agent/cache: Inline the refresh function to make recursion more obvious
fetch is already an exceptionally long function, but hiding the
recrusion in a function call likely does not help.
2020-04-13 13:10:38 -04:00
Daniel Nephin 98ef66e70a agent/cache: Make the return values of getEntryLocked more obvious
Use named returned so that the caller has a better idea of what these
bools mean.

Return early to reduce the scope, and make it more obvious what values
are returned in which cases. Also reduces the number of conditional
expressions in each case.
2020-04-13 13:10:38 -04:00
Daniel Nephin cef60d1547 agent/cache: Small formatting improvements to improve readability
Remove Cache.entryKey which called a single function.
Format multiline struct creation one field per line.
2020-04-13 12:34:11 -04:00
Daniel Nephin 85411d6b54 Remove SnapshotRPC passthrough
The caller has access to the delegate, so we do not gain anything by
wrapping the call in Agent.
2020-04-13 12:32:57 -04:00
Daniel Nephin bf2a6452f1
Merge pull request #7596 from hashicorp/dnephin/agent-cache-type-entry
agent/cache: move typeEntry lookup to the edge
2020-04-13 12:24:07 -04:00
Pierre Souchay 2e6cd9e11a
fix flaky TestReplication_FederationStates test due to race conditions (#7612)
The test had two racy bugs related to memdb references.

The first was when we initially populated data and retained the FederationState objects in a slice. Due to how the `inmemCodec` works these were actually the identical objects passed into memdb.

The second was that the `checkSame` assertion function was reading from memdb and setting the RaftIndexes to zeros to aid in equality checks. This was mutating the contents of memdb which is a no-no.

With this fix, the command:
```
i=0; while /usr/local/bin/go test -count=1 -timeout 30s github.com/hashicorp/consul/agent/consul -run '^(TestReplication_FederationStates)$'; do i=$((i + 1)); printf "$i "; done
```
That used to break on my machine in less than 20 runs is now running 150+ times without any issue.

Might also fix #7575
2020-04-09 15:42:41 -05:00
Pierre Souchay 071d4ee395
tests: change default http_max_conns_per_client to 250 to ease tests (#7625)
On recent Mac OS versions, the ulimit defaults to 256 by default, but many
systems (eg: some Linux distributions) often limit this value to 1024.

On validation of configuration, Consul now validates that the number of
allowed files descriptors is bigger than http_max_conns_per_client.

This make some unit tests failing on Mac OS.
Use a less important value in unit test, so tests runs well by default
on Mac OS without need for tuning the OS.
2020-04-09 11:11:42 +02:00
Freddy c1f79c6b3c
Terminating gateway discovery (#7571)
* Enable discovering terminating gateways

* Add TerminatingGatewayServices to state store

* Use GatewayServices RPC endpoint for ingress/terminating
2020-04-08 12:37:24 -06:00
Freddy f5b9688336
Add decode rules for Expose cfg in service-defaults (#7611) 2020-04-07 19:37:47 -06:00
Matt Keeler 42f02e80c3
Enable filtering language support for the v1/connect/intentions… (#7593)
* Enable filtering language support for the v1/connect/intentions listing API

* Update website for filtering of Intentions

* Update website/source/api/connect/intentions.html.md
2020-04-07 11:48:44 -04:00
Daniel Nephin 72e2695986
Merge pull request #7598 from pierresouchay/preallocation_of_dns_meta
Pre-allocations of DNS meta to avoid several allocations
2020-04-06 14:00:32 -04:00
Pierre Souchay a7fbf003c1
[LINT] Close resp.Body to avoid linter complaining (#7600) 2020-04-06 09:11:04 -04:00
Pierre Souchay 91b3510821 Pre-allocations of DNS meta to avoid several allocations 2020-04-05 11:12:41 +02:00
Daniel Nephin ab068325da agent/cache: move typeEntry lookup to the edge
This change moves all the typeEntry lookups to the first step in the exported methods,
and makes unexporter internals accept the typeEntry struct.

This change is primarily intended to make it easier to extract the container of caches
from the Cache type.

It may incidentally reduce locking in fetch, but that was not a goal.
2020-04-03 16:01:56 -04:00
Pierre Souchay 5f9f86a327 Fixed unstable test TestForwardSignals()
Sometimes, in the CI, it could receive a SIGURG, producing this line:

  FAIL: TestForwardSignals/signal-interrupt (0.06s)
        util_test.go:286: expected to read line "signal: interrupt" but got "signal: urgent I/O condition"

Only forward the signals we test to avoid this kind of false positive

Example of such unstable errors in CI:
https://circleci.com/gh/hashicorp/consul/153571
2020-04-03 14:23:03 +02:00
Pierre Souchay 984583d980
tests: more tolerance to latency for unstable test `TestCacheNotifyPolling()`. (#7574) 2020-04-03 10:29:38 +02:00
Matt Keeler 5d0e661203
Ensure that token clone copies the roles (#7577) 2020-04-02 12:09:35 -04:00
Chris Piraino d7a870fd32
Fix flapping of mesh gateway connect-service watches (#7575) 2020-04-02 10:12:13 -05:00
Pierre Souchay 2b8da952a8
agent: show warning when enable_script_checks is enabled without safty net (#7437)
In order to enforce a bit security on Consul agents, add a new method in agent
to highlight possible security issues.

This does not return an error for now, but might in the future.

For now, it detects issues such as:

https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations/

This would display this kind of messages:

```
2020-03-11T18:27:49.873+0100 [ERROR] agent: [SECURITY] issue: error="using enable-script-checks without ACLs and without allow_write_http_from is DANGEROUS, use enable-local-script-checks instead see https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations/"
```
2020-04-02 09:59:23 +02:00
Andy Lindeman 0d1d5d0863
agent: rewrite checks with proxy address, not local service address (#7518)
Exposing checks is supposed to allow a Consul agent bound to a different
IP address (e.g., in a different Kubernetes pod) to access healthchecks
through the proxy while the underlying service binds to localhost. This
is an important security feature that makes sure no external traffic
reaches the service except through the proxy.

However, as far as I can tell, this is subtly broken in the case where
the Consul agent cannot reach the proxy over localhost.

If a proxy is configured with: `{ LocalServiceAddress: "127.0.0.1",
Checks: true }`, as is typical with a sidecar proxy, the Consul checks
are currently rewritten to `127.0.0.1:<random port>`. A Consul agent
that does not share the loopback address cannot reach this address. Just
to make sure I was not misunderstanding, I tried configuring the proxy
with `{ LocalServiceAddress: "<pod ip>", Checks: true }`. In this case,
while the checks are rewritten as expected and the agent can reach the
dynamic port, the proxy can no longer reach its backend because the
traffic is no longer on the loopback interface.

I think rewriting the checks to use `proxy.Address`, the proxy's own
address, is more correct in this case. That is the IP where the proxy
can be reached, both by other proxies and by a Consul agent running on
a different IP. The local service address should continue to use
`127.0.0.1` in most cases.
2020-04-02 09:35:43 +02:00
Andy Lindeman 42224fe45c
proxycfg: support path exposed with non-HTTP2 protocol (#7510)
If a proxied service is a gRPC or HTTP2 service, but a path is exposed
using the HTTP1 or TCP protocol, Envoy should not be configured with
`http2ProtocolOptions` for the cluster backing the path.

A situation where this comes up is a gRPC service whose healthcheck or
metrics route (e.g. for Prometheus) is an HTTP1 service running on
a different port. Previously, if these were exposed either using
`Expose: { Checks: true }` or `Expose: { Paths: ... }`, Envoy would
still be configured to communicate with the path over HTTP2, which would
not work properly.
2020-04-02 09:35:04 +02:00
Pierre Souchay 3b5e72913e
config: validate system limits against limits.http_max_conns_per_client (#7434)
I spent some time today on my local Mac to figure out why Consul 1.6.3+
was not accepting limits.http_max_conns_per_client.

This adds an explicit check on number of file descriptors to be sure
it might work (this is no guarantee as if many clients are reaching
the agent, it might consume even more file descriptors)

Anyway, many users are fighting with RLIMIT_NOFILE, having a clear
message would allow them to figure out what to fix.

Example of message (reload or start):

```
2020-03-11T16:38:37.062+0100 [ERROR] agent: Error starting agent: error="system allows a max of 512 file descriptors, but limits.http_max_conns_per_client: 8192 needs at least 8212"
```
2020-04-02 09:22:17 +02:00
Shaker Islam d8ac493395
docs: document exported functions in agent.go (closes #7101) (#7366)
and fix one linter error
2020-04-01 22:52:23 +02:00
Pierre Souchay 96d7229bd9 [FIX BUILD] fix build due to merge of #7562
Due to merge #7562, upstream does not compile anymore.

Error is:

ERRO Running error: gofmt: analysis skipped: errors in package: [/Users/p.souchay/go/src/github.com/hashicorp/consul/agent/config_endpoint_test.go:188:33: too many arguments]
2020-04-01 18:29:45 +02:00
Daniel Nephin bc02b8fbe6
Merge pull request #7562 from hashicorp/dnephin/remove-tname-from-name
testing: Remove old default value from NewTestAgent() calls
2020-04-01 11:48:45 -04:00
Daniel Nephin 8d7c21b255
Merge pull request #7533 from hashicorp/dnephin/xds-server-1
agent/xds: small cleanup
2020-04-01 11:24:50 -04:00
Emre Savcı 7a99f29adc
agent: add len, cap while initializing arrays 2020-04-01 10:54:51 +02:00
Daniel Nephin 09c6ac8b92 Rename NewTestAgentWithFields to StartTestAgent
This function now only starts the agent.

Using:

git grep -l 'StartTestAgent(t, true,' | \
        xargs sed -i -e 's/StartTestAgent(t, true,/StartTestAgent(t,/g'
2020-03-31 17:14:55 -04:00
Daniel Nephin d623dcbd01 Convert the remaining calls to NewTestAgentWithFields
After removing the t.Name() parameter with sed, convert the last few tests which
use a custom name to call NewTestAgentWithFields instead.
2020-03-31 17:14:55 -04:00
Daniel Nephin 428dd566b9
Merge pull request #7470 from hashicorp/dnephin/dns-unused-params
dns: Remove a few unused function parameters
2020-03-31 16:56:19 -04:00
Pierre Souchay 5a6abf4d68
config: allow running `consul agent -dev -ui-dir=some_path` (#7525)
When run in with `-dev` in DevMode, it is not possible to replace
the embeded UI with another one because `-dev` implies `-ui`.

This commit allows this an slightly change the error message
about Consul 0.7.0 which is very old and does not apply to
current version anyway.
2020-03-31 22:36:20 +02:00
Daniel Nephin 8b6877febd Remove name from NewTestAgent
Using:

git grep -l 'NewTestAgent(t, t.Name(),' | \
    xargs sed -i -e 's/NewTestAgent(t, t.Name(),/NewTestAgent(t,/g'
2020-03-31 16:13:44 -04:00
Freddy 8a1e53754e
Add config entry for terminating gateways (#7545)
This config entry will be used to configure terminating gateways.

It accepts the name of the gateway and a list of services the gateway will represent.

For each service users will be able to specify: its name, namespace, and additional options for TLS origination.

Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
2020-03-31 13:27:32 -06:00
Kyle Havlovitz 01a23b8eb4
Add config entry/state for Ingress Gateways (#7483)
* Add Ingress gateway config entry and other relevant structs

* Add api package tests for ingress gateways

* Embed EnterpriseMeta into ingress service struct

* Add namespace fields to api module and test consul config write decoding

* Don't require a port for ingress gateways

* Add snakeJSON and camelJSON cases in command test

* Run Normalize on service's ent metadata

Sadly cannot think of a way to test this in OSS.

* Every protocol requires at least 1 service

* Validate ingress protocols

* Update agent/structs/config_entry_gateways.go

Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2020-03-31 11:59:10 -05:00
Daniel Nephin db1fb95f21
Merge pull request #7485 from hashicorp/dnephin/do-not-skip-tests-on-ci
ci: Make it harder to accidentally skip tests on CI, and doc why some are skipped
2020-03-31 11:15:44 -04:00
Daniel Nephin 96c4a35de7 Remove t.Name() from TestAgent.Name
And re-add the name to the logger so that log messages from different agents
in a single can be identified.
2020-03-30 16:47:24 -04:00
Daniel Nephin fe027ac766 Document Agent.LogOutput 2020-03-30 14:32:13 -04:00
Daniel Nephin 823295fe2a testing: reduce verbosity of output log
Previously the log output included the test name twice and a long date
format. The test output is already grouped by test, so adding the test
name did not add any new information. The date and time are only useful
to understand elapsed time, so using a short format should provide
succident detail.

Also fixed a bug in NewTestAgentWithFields where nil was returned
instead of the test agent.
2020-03-30 13:23:13 -04:00
Daniel Nephin 6d612abbde Remove unused token parameter 2020-03-27 17:57:16 -04:00
Daniel Nephin d29c47c420 A little less 'just' 2020-03-27 16:08:25 -04:00
Daniel Nephin 1e59c6b03e Remove unused customEDSClusterJSON 2020-03-27 15:38:16 -04:00
Matt Keeler 35c8e996c3
Ensure server requirements checks are done against ALL known se… (#7491)
Co-authored-by: Paul Banks <banks@banksco.de>
2020-03-27 12:31:43 -04:00
Matt Keeler ac78be97f4
Add information about which services are proxied to ui services… (#7417) 2020-03-27 10:57:46 -04:00
Daniel Nephin a2eb66963c
Merge pull request #7516 from hashicorp/dnephin/remove-unused-method
agent: Remove unused method Encrypted from delegate interface
2020-03-26 14:17:58 -04:00
Daniel Nephin ebb851f32d agent: Remove unused Encrypted from interface
It appears to be unused. It looks like it has been around a while,
I geuss at some point we stopped using this method.
2020-03-26 12:34:31 -04:00
Freddy cb55fa3742
Enable CLI to register terminating gateways (#7500)
* Enable CLI to register terminating gateways

* Centralize gateway proxy configuration
2020-03-26 10:20:56 -06:00
Daniel Nephin 02cacf8128
Merge pull request #7498 from hashicorp/dnephin/small-cleanup
envoy: small cleanup in cmd and server
2020-03-25 13:24:44 -04:00
Alejandro Baez 7d68d7eaa6
Add PolicyReadByName for API (#6615) 2020-03-25 10:34:24 -04:00
Chris Piraino 0c5c97205f
Fix flakey health check reload test (#7490)
This test would occasionally fail because we checked for a status of
"critical" initially. This races with the actual healthcheck being run
and declared passing.

We instead use a ttl health check so that we don't rely on timing at all.
2020-03-25 09:09:13 -05:00
Daniel Nephin f994bc9157 agent: Remove xdsServer field
The field is only referenced from a single method, it can be a local var
2020-03-24 18:05:14 -04:00
Daniel Nephin 38ec02e022 dns: Remove a few unused params 2020-03-24 15:56:41 -04:00
Daniel Nephin 8b6e07d960 ci: Run all connect/ca tests from the integration suite
To reduce the chance of some tests not being run because it does not
match the regex passed to '-run'.

Also document why some tests are allowed to be skipped on CI.
2020-03-24 15:22:01 -04:00
Daniel Nephin dc983db333 ci: Do not skip tests because of missing binaries on CI
If the CI environment is not correct for running tests the tests
should fail, so that we don't accidentally stop running some tests
because of a change to our CI environment.

Also removed a duplicate delcaration from init. I believe one was
overriding the other as they are both in the same package.
2020-03-24 14:34:13 -04:00
Kim Ngo 9e8eb7896f
agent/xds: Update mesh gateway to use service router timeout (#7444)
* website/connect/proxy/envoy: specify timeout precedence for services behind mesh gateway
2020-03-17 14:50:14 -05:00
Matt Keeler 58e2969fc1
Fix ACL mode advertisement and detection (#7451)
These changes are necessary to ensure advertisement happens correctly even when datacenters are connected via network areas in Consul enterprise.

This also changes how we check if ACLs can be upgraded within the local datacenter. Previously we would iterate through all LAN members. Now we just use the ServerLookup type to iterate through all known servers in the DC.
2020-03-16 12:54:45 -04:00
Freddy 8a7ff69b19
Update MSP token and filtering (#7431) 2020-03-11 12:08:49 -06:00
Hans Hasselberg 6a55f70fa6
tls: remove old ciphers (#7282)
Following advice from:
https://github.com/ssllabs/research/wiki/SSL-and-TLS-Deployment-Best-Practices, this PR removes old ciphers.
2020-03-10 21:44:26 +01:00
R.B. Boyer 10d3ff9a4f
server: strip local ACL tokens from RPCs during forwarding if crossing datacenters (#7419)
Fixes #7414
2020-03-10 11:15:22 -05:00
Kyle Havlovitz 520d464c85
Merge pull request #7373 from hashicorp/acl-segments-fix
Add stub methods for ACL/segment bug fix from enterprise
2020-03-09 14:25:49 -07:00
R.B. Boyer a7fb26f50f
wan federation via mesh gateways (#6884)
This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch:

There are several distinct chunks of code that are affected:

* new flags and config options for the server

* retry join WAN is slightly different

* retry join code is shared to discover primary mesh gateways from secondary datacenters

* because retry join logic runs in the *agent* and the results of that
  operation for primary mesh gateways are needed in the *server* there are
  some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur
  at multiple layers of abstraction just to pass the data down to the right
  layer.

* new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers

* the function signature for RPC dialing picked up a new required field (the
  node name of the destination)

* several new RPCs for manipulating a FederationState object:
  `FederationState:{Apply,Get,List,ListMeshGateways}`

* 3 read-only internal APIs for debugging use to invoke those RPCs from curl

* raft and fsm changes to persist these FederationStates

* replication for FederationStates as they are canonically stored in the
  Primary and replicated to the Secondaries.

* a special derivative of anti-entropy that runs in secondaries to snapshot
  their local mesh gateway `CheckServiceNodes` and sync them into their upstream
  FederationState in the primary (this works in conjunction with the
  replication to distribute addresses for all mesh gateways in all DCs to all
  other DCs)

* a "gateway locator" convenience object to make use of this data to choose
  the addresses of gateways to use for any given RPC or gossip operation to a
  remote DC. This gets data from the "retry join" logic in the agent and also
  directly calls into the FSM.

* RPC (`:8300`) on the server sniffs the first byte of a new connection to
  determine if it's actually doing native TLS. If so it checks the ALPN header
  for protocol determination (just like how the existing system uses the
  type-byte marker).

* 2 new kinds of protocols are exclusively decoded via this native TLS
  mechanism: one for ferrying "packet" operations (udp-like) from the gossip
  layer and one for "stream" operations (tcp-like). The packet operations
  re-use sockets (using length-prefixing) to cut down on TLS re-negotiation
  overhead.

* the server instances specially wrap the `memberlist.NetTransport` when running
  with gateway federation enabled (in a `wanfed.Transport`). The general gist is
  that if it tries to dial a node in the SAME datacenter (deduced by looking
  at the suffix of the node name) there is no change. If dialing a DIFFERENT
  datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh
  gateways to eventually end up in a server's :8300 port.

* a new flag when launching a mesh gateway via `consul connect envoy` to
  indicate that the servers are to be exposed. This sets a special service
  meta when registering the gateway into the catalog.

* `proxycfg/xds` notice this metadata blob to activate additional watches for
  the FederationState objects as well as the location of all of the consul
  servers in that datacenter.

* `xds:` if the extra metadata is in place additional clusters are defined in a
  DC to bulk sink all traffic to another DC's gateways. For the current
  datacenter we listen on a wildcard name (`server.<dc>.consul`) that load
  balances all servers as well as one mini-cluster per node
  (`<node>.server.<dc>.consul`)

* the `consul tls cert create` command got a new flag (`-node`) to help create
  an additional SAN in certs that can be used with this flavor of federation.
2020-03-09 15:59:02 -05:00
Matt Keeler 9753e28333
Gather instance counts of aggregated services (#7415) 2020-03-09 11:56:19 -04:00
Pierre Souchay 49dc891737
agent: configuration reload preserves check's statuses for services (#7345)
This fixes issue #7318

Between versions 1.5.2 and 1.5.3, a regression has been introduced regarding health
of services. A patch #6144 had been issued for HealthChecks of nodes, but not for healthchecks
of services.

What happened when a reload was:

1. save all healthcheck statuses
2. cleanup everything
3. add new services with healthchecks

In step 3, the state of healthchecks was taken into account locally,
so at step 3, but since we cleaned up at step 2, state was lost.

This PR introduces the snap parameter, so step 3 can use information from step 1
2020-03-09 12:59:41 +01:00
Hans Hasselberg eb8bdc372e
docs: add docs for kv_max_value_size (#7405)
Apart from the added docs, the error messages are similar now and are
pointing to the corresponding options.
Fixes #6708.
2020-03-09 11:13:40 +01:00
Kim Ngo ab8a3b8044
agent/txn_endpoint: configure max txn request length (#7388)
configure max transaction size separately from kv limit
2020-03-05 15:42:37 -06:00
Matt Keeler b684138882 Fix session backwards incompatibility with 1.6.x and earlier. 2020-03-05 15:34:55 -05:00
John Cowen 22dcee74ac
Adds http_config.response_headers to the UI headers plus tests (#7369) 2020-03-03 13:18:35 +00:00
Pierre Souchay 274b3b1520
agent: take Prometheus MIME-type header into account (#7371)
This will avoid adding format=prometheus in request and to parse
more easily metrics using Prometheus.
This commit follows https://github.com/hashicorp/consul/pull/6514 as
the PR has been closed and extends it by accepting old Prometheus
mime-type.
2020-03-03 14:18:19 +01:00
Kyle Havlovitz b05ebe2507 Add stub methods for ACL/segment bug fix from enterprise 2020-03-02 10:30:23 -08:00
Hans Hasselberg 322367d476
tls: support tls 1.3 (#7325) 2020-02-19 23:22:31 +01:00
Matt Keeler 13572cea8d
Properly detect no alt domain set (#7323) 2020-02-19 14:41:43 -05:00
Matt Keeler 154eafe140
xDS Mesh Gateway Resolver Subset Fixes (#7294)
* xDS Mesh Gateway Resolver Subset Fixes

The first fix was that clusters were being generated for every service resolver subset regardless of there being any service instances of the associated service in that dc. The previous logic didn’t care at all but now it will omit generating those clusters unless we also have service instances that should be proxied.

The second fix was to respect the DefaultSubset of a service resolver so that mesh-gateways would configure the endpoints of the unnamed subset cluster to only those endpoints matched by the default subsets filters.

* Refactor the gateway endpoint generation to be a little easier to read
2020-02-19 11:57:55 -05:00
rerorero b366a25179
fix: Destroying a session that doesn't exist returns status cod… (#6905)
fix #6840
2020-02-18 11:13:15 -05:00
Wim 8e1ac56ef7
Fix high cpu usage with IPv6 recursor address. Closes #6120 (#6128) 2020-02-18 11:09:11 -05:00
Chris Piraino 4ec01900cd
Fixes envoy config when both RetryOn* values are set (#7280) 2020-02-18 09:25:47 -06:00
Lars Lehtonen 83a3136b5a
agent/proxycfg: fix dropped error in state.initWatchesMeshGateway() (#7267) 2020-02-18 14:41:01 +01:00
Matt Keeler be0d6efac9
Allow the PolicyResolve and RoleResolve endpoints to process na… (#7296) 2020-02-13 14:55:27 -05:00
Hans Hasselberg 2bba591906
agent: sensible keyring error (#7272)
Fixes #7231. Before an agent would always emit a warning when there is
an encrypt key in the configuration and an existing keyring stored,
which is happening on restart.

Now it only emits that warning when the encrypt key from the
configuration is not part of the keyring.
2020-02-13 20:35:09 +01:00
Hans Hasselberg 0d043f045b
config: increase http_max_conns_per_client default to 200 (#7289) 2020-02-13 16:27:33 +01:00
R.B. Boyer 0e152672a1
avoid 'panic: Log in goroutine after TestCacheGet_refreshAge has completed' (#7276) 2020-02-12 10:01:51 -06:00
R.B. Boyer 919741838d
fix use of hclog logger (#7264) 2020-02-12 09:37:16 -06:00
Matt Keeler e82b3d8747
Merge branch 'master' of github.com:hashicorp/consul 2020-02-11 11:54:58 -05:00
hashicorp-ci 62abcfacc9
update bindata_assetfs.go 2020-02-11 15:19:16 +00:00
ShimmerGlass a27ccc7248
agent: add server raft.{last,applied}_index gauges (#6694)
These metrics are useful for :
* Tracking the rate of update to the db
* Allow to have a rough idea of when an index originated
2020-02-11 10:50:18 +01:00
gaoxinge 5ac14438b0
tests: convert windows style path to posix style path to avoid hcl parsing error (#6351) 2020-02-11 10:13:31 +01:00
Matt Keeler 7f610f275d
Make the config entry and leaf cert cache types ns aware (#7256) 2020-02-10 19:26:01 -05:00
Hans Hasselberg 71ce832990
connect: add validations around intermediate cert ttl (#7213) 2020-02-11 00:05:49 +01:00
R.B. Boyer c37d00791c
make the TestRPC_RPCMaxConnsPerClient test less flaky (#7255) 2020-02-10 15:13:53 -06:00
Sarah Christoff 85d2714c76
Fix flaky TestAutopilot_BootstrapExpect (#7242) 2020-02-10 14:52:58 -06:00
Kit Patella d28bc1acbe
rpc: measure blocking queries (#7224)
* agent: measure blocking queries

* agent.rpc: update docs to mention we only record blocking queries

* agent.rpc: make go fmt happy

* agent.rpc: fix non-atomic read and decrement with bitwise xor of uint64 0

* agent.rpc: clarify review question

* agent.rpc: today I learned that one must declare all variables before interacting with goto labels

* Update agent/consul/server.go

agent.rpc: more precise comment on `Server.queriesBlocking`

Co-Authored-By: Paul Banks <banks@banksco.de>

* Update website/source/docs/agent/telemetry.html.md

agent.rpc: improve queries_blocking description

Co-Authored-By: Paul Banks <banks@banksco.de>

* agent.rpc: fix some bugs found in review

* add a note about the updated counter behavior to telemetry.md

* docs: add upgrade-specific note on consul.rpc.quer{y,ies_blocking} behavior

Co-authored-by: Paul Banks <banks@banksco.de>
2020-02-10 10:01:15 -08:00
Akshay Ganeshen fd32016ce9
feat: support sending body in HTTP checks (#6602) 2020-02-10 09:27:12 -07:00
Matt Keeler 83426d7a9f
OSS Changes for agent local state namespace testing (#7250) 2020-02-10 11:25:12 -05:00
Matt Keeler 966d085066
Catalog + Namespace OSS changes. (#7219)
* Various Prepared Query + Namespace things

* Last round of OSS changes for a namespaced catalog
2020-02-10 10:40:44 -05:00
R.B. Boyer b4325dfbce
agent: ensure that we always use the same settings for msgpack (#7245)
We set RawToString=true so that []uint8 => string when decoding an interface{}.
We set the MapType so that map[interface{}]interface{} decodes to map[string]interface{}.

Add tests to ensure that this doesn't break existing usages.

Fixes #7223
2020-02-07 15:50:24 -06:00
Freddy aca8b85440
Remove outdated TODO (#7244) 2020-02-07 13:14:48 -07:00
Matt Keeler f610d1d791
Fix a bug with ACL enforcement of reads on namespaced config entries. (#7239) 2020-02-07 08:30:40 -05:00
Kit Patella aa9db3f903
agent/consul server: fix LeaderTest_ChangeNodeID (#7236)
* fix LeaderTest_ChangeNodeID to use StatusLeft and add waitForAnyLANLeave

* unextract the waitFor... fn, simplify, and provide a more descriptive error
2020-02-06 16:37:53 -08:00
Matt Keeler 2524a028ea
OSS Changes for various config entry namespacing bugs (#7226) 2020-02-06 10:52:25 -05:00
Hans Hasselberg 107e8523a8
agent: ensure node info sync and full sync. (#7189)
This fixes #7020.

There are two problems this PR solves:
  * if the node info changes it is highly likely to get service and check registration permission errors unless those service tokens have node:write. Hopefully services you register don’t have this permission.
  * the timer for a full sync gets reset for every partial sync which means that many partial syncs are preventing a full sync from happening

Instead of syncing node info last, after services and checks, and possibly saving one RPC because it is included in every service sync, I am syncing node info first. It is only ever going to be a single RPC that we are only doing when node info has changed. This way we are guaranteed to sync node info even when something goes wrong with services or checks which is more likely because there are more syncs happening for them.
2020-02-06 15:30:58 +01:00
R.B. Boyer a67001aa22
agent: differentiate wan vs lan loggers in memberlist and serf (#7205)
This should be a helpful change until memberlist and serf can be
properly switched to native hclog.
2020-02-05 09:52:43 -06:00
Matt Keeler 119168203b
Fix disco chain graph validation for namespaces (#7217)
Previously this happened to be validating only the chains in the default namespace. Now it will validate all chains in all namespaces when the global proxy-defaults is changed.
2020-02-05 10:06:27 -05:00
Matt Keeler 3621f7090b
Minor Non-Functional Updates (#7215)
* Cleanup the discovery chain compilation route handling

Nothing functionally should be different here. The real difference is that when creating new targets or handling route destinations we use the router config entries name and namespace instead of that of the top level request. Today they SHOULD always be the same but that may not always be the case. This hopefully also makes it easier to understand how the router entries are handled.

* Refactor a small bit of the service manager tests in oss

We used to use the stringHash function to compute part of the filename where things would get persisted to. This has been changed in the core code to calling the StringHash method on the ServiceID type. It just so happens that the new method will output the same value for anything in the default namespace (by design actually). However, logically this filename computation in the test should do the same thing as the core code itself so I updated it here.

Also of note is that newer enterprise-only tests for the service manager cannot use the old stringHash function at all because it will produce incorrect results for non-default namespaces.
2020-02-05 10:06:11 -05:00
Freddy 67e02a0752
Add managed service provider token (#7218)
Stubs for enterprise-only ACL token to be used by managed service providers.
2020-02-04 13:58:56 -07:00
Hans Hasselberg a9f9ed83cb
agent: increase watchLimit to 8192. (#7200)
The previous value was too conservative and users with many instances
were having problems because of it. This change increases the limit to
8192 which reportedly fixed most of the issues with that.

Related: #4984, #4986, #5050.
2020-02-04 13:11:30 +01:00
Matt Keeler 111cb51fc8
Testing updates to support namespaced testing of the agent/xds… (#7185)
* Various testing updates to support namespaced testing of the agent/xds package

* agent/proxycfg package updates to support better namespace testing
2020-02-03 09:26:47 -05:00
Davor Kapsa c280dd8549
auto_encrypt: check previously ignored error (#6604) 2020-02-03 10:35:11 +01:00
hashicorp-ci 5378fd9bf8 update bindata_assetfs.go 2020-01-31 21:38:38 +00:00
Hans Hasselberg 50281032e0
Security fixes (#7182)
* Mitigate HTTP/RPC Services Allow Unbounded Resource Usage

Fixes #7159.

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: Paul Banks <banks@banksco.de>
2020-01-31 11:19:37 -05:00
Matt Keeler 3e56f5c8b8
ACL enforcement for the agent/health/services endpoints (#7191)
ACL enforcement for the agent/health/services endpoints
2020-01-31 11:16:24 -05:00
R.B. Boyer 1d7e4f7de5
cli: improve the file safety of 'consul tls' subcommands (#7186)
- also fixing the signature of file.WriteAtomicWithPerms
2020-01-31 10:12:36 -06:00
Matt Keeler 4bf1c2e4f7
agent: add ACL enforcement to the v1/agent/health/service/* endpoints
This adds acl enforcement to the two endpoints that were missing it.

Note that in the case of getting a services health by its id, we still
must first lookup the service so we still "leak" information about a
service with that ID existing. There isn't really a way around it though
as ACLs are meant to check service names.
2020-01-31 09:57:38 -05:00
Matt Keeler 26bb1584c1
Updates to the Txn API for namespaces (#7172)
* Updates to the Txn API for namespaces

* Update agent/consul/txn_endpoint.go

Co-Authored-By: R.B. Boyer <rb@hashicorp.com>

Co-authored-by: R.B. Boyer <public@richardboyer.net>
2020-01-30 13:12:26 -05:00
Matt Keeler 50092c28c0
Add some better waits to prevent CA is nil test flakes (#7171) 2020-01-29 22:23:11 -05:00
Matt Keeler 78ad3428c4
Small refactoring to move meta parsing into the switch statement (#7170) 2020-01-29 19:12:48 -05:00
Matt Keeler 25568626f7
Fix a couple bugs regarding intentions with namespaces (#7169) 2020-01-29 17:30:38 -05:00
Matt Keeler 3f253080a2
Sync some feature flag support from enterprise (#7167) 2020-01-29 13:21:38 -05:00
R.B. Boyer 01ebdff2a9
various tweaks on top of the hclog work (#7165) 2020-01-29 11:16:08 -06:00
Chris Piraino 3dd0b59793
Allow users to configure either unstructured or JSON logging (#7130)
* hclog Allow users to choose between unstructured and JSON logging
2020-01-28 17:50:41 -06:00
Matt Keeler 3044b4bf89
Output proper HTTP status codes for Txn requests that are too large (#7157) 2020-01-28 16:22:40 -05:00
Kit Patella 49e9bbbdf9
Add accessorID of token when ops are denied by ACL system (#7117)
* agent: add and edit doc comments

* agent: add ACL token accessorID to debugging traces

* agent: polish acl debugging

* agent: minor fix + string fmt over value interp

* agent: undo export & fix logging field names

* agent: remove note and migrate up to code review

* Update agent/consul/acl.go

Co-Authored-By: Matt Keeler <mkeeler@users.noreply.github.com>

* agent: incorporate review feedback

* Update agent/acl.go

Co-Authored-By: R.B. Boyer <public@richardboyer.net>

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: R.B. Boyer <public@richardboyer.net>
2020-01-27 11:54:32 -08:00
Anthony Scalisi 4b92c2deee fix spelling errors (#7135) 2020-01-27 07:00:33 -06:00
hashicorp-ci 2a0f0d5057
update bindata_assetfs.go 2020-01-24 17:08:21 +00:00
Matt Keeler 485a0a65ea
Updates to Config Entries and Connect for Namespaces (#7116) 2020-01-24 10:04:58 -05:00
Matt Keeler 90b9f87160
Add the v1/catalog/node-services/:node endpoint (#7115)
The backing RPC already existed but the endpoint will be useful for other service syncing processes such as consul-k8s as this endpoint can return all services registered with a node regardless of namespacing.
2020-01-24 09:27:25 -05:00
Chris Piraino db36928faa
Fix segfault when removing both a service and associated check (#7108)
* Fix segfault when removing both a service and associated check

updateSyncState creates entries in the services and checks maps for
remote services/checks that are not found locally, so that we can then
make sure to delete them in our reconciliation process. However, the
values added to the map are missing key fields that the rest of the code
expects to not be nil.

* Add comment stating Check field can be nil
2020-01-23 10:38:32 -06:00
R.B. Boyer b8801f2136
agent: default the primary_datacenter to the datacenter if not configured (#7111)
Something similar already happens inside of the server
(agent/consul/server.go) but by doing it in the general config parsing
for the agent we can have agent-level code rely on the PrimaryDatacenter
field, too.
2020-01-23 09:59:31 -06:00
Hans Hasselberg 5379cf7c67
raft: increase raft notify buffer. (#6863)
* Increase raft notify buffer.

Fixes https://github.com/hashicorp/consul/issues/6852.

Increasing the buffer helps recovering from leader flapping. It lowers
the chances of the flapping leader to get into a deadlock situation like
described in #6852.
2020-01-22 16:15:59 +01:00
Hans Hasselberg e00effa325
agent: setup grpc server with auto_encrypt certs and add -https-port (#7086)
* setup grpc server with TLS config used across consul.
* add -https-port flag
2020-01-22 11:32:17 +01:00
Hans Hasselberg f3a01e6a4a
connect: use correct subject key id for leaf certificates. (#7091) 2020-01-22 11:28:28 +01:00
R.B. Boyer ce7ab8abc1
make TestCatalogNodes_Blocking less flaky (#7074)
- Explicitly wait to start the test until the initial AE sync of the node.

- Run the blocking query in the main goroutine to cut down on possible
poor goroutine scheduling issues being to blame for delays.

- If the blocking query is woken up with no index change, rerun the
query. This may happen if the CI server is loaded and time dilation is
happening.
2020-01-21 14:58:50 -06:00
R.B. Boyer 791f7baa6b
test: ensure we don't ask vault to sign a leaf that outlives its CA when acting as a secondary (#7100) 2020-01-21 14:55:21 -06:00
Hans Hasselberg d52a4e3b82
tests: fix autopilot test (#7092) 2020-01-21 14:09:51 +01:00
Aestek 8c799447cf agent: remove service sidecars in Agent.cleanupRegistration (#7022)
Sidecar proxies were left behind when cleaning up after an unsuccessful
registration. There are now also removed when the service is cleanup up.
2020-01-20 14:01:40 +01:00
Hans Hasselberg 43392d5db3
raft: update raft to v1.1.2 (#7079)
* update raft
* use hclogger for raft.
2020-01-20 13:58:02 +01:00
Hans Hasselberg 315ba7d6ad
connect: check if intermediate cert needs to be renewed. (#6835)
Currently when using the built-in CA provider for Connect, root certificates are valid for 10 years, however secondary DCs get intermediates that are valid for only 1 year. There is no mechanism currently short of rotating the root in the primary that will cause the secondary DCs to renew their intermediates.
This PR adds a check that renews the cert if it is half way through its validity period.

In order to be able to test these changes, a new configuration option was added: IntermediateCertTTL which is set extremely low in the tests.
2020-01-17 23:27:13 +01:00
Hans Hasselberg b6c83e06d5
auto_encrypt: set dns and ip san for k8s and provide configuration (#6944)
* Add CreateCSRWithSAN
* Use CreateCSRWithSAN in auto_encrypt and cache
* Copy DNSNames and IPAddresses to cert
* Verify auto_encrypt.sign returns cert with SAN
* provide configuration options for auto_encrypt dnssan and ipsan
* rename CreateCSRWithSAN to CreateCSR
2020-01-17 23:25:26 +01:00
Aestek 9329cbac0a Add support for dual stack IPv4/IPv6 network (#6640)
* Use consts for well known tagged adress keys

* Add ipv4 and ipv6 tagged addresses for node lan and wan

* Add ipv4 and ipv6 tagged addresses for service lan and wan

* Use IPv4 and IPv6 address in DNS
2020-01-17 09:54:17 -05:00
Aestek c35af89dfd agent: do not deregister service checks twice (#6168)
Deregistering a service from the catalog automatically deregisters its
checks, however the agent still performs a deregister call for each
service checks even after the service has been deregistered.
With ACLs enabled this results in logs like:
"message:consul: "Catalog.Deregister" RPC failed to server
server_ip:8300: rpc error making call: rpc error making call: Unknown
check 'check_id'"
This change removes associated checks from the agent state when
deregistering a service, which results in less calls to the servers and
supresses the error logs.
2020-01-17 14:26:53 +01:00
Matej Urbas d877e091d6 agent: configurable MaxQueryTime and DefaultQueryTime. (#3777) 2020-01-17 14:20:57 +01:00
Freddy f3ba6a9166
Update force-leave ACL requirement to operator:write (#7033) 2020-01-14 15:40:34 -07:00
Matt Keeler c8294b8595
AuthMethod updates to support alternate namespace logins (#7029) 2020-01-14 10:09:29 -05:00
Matt Keeler baa89c7c65
Intentions ACL enforcement updates (#7028)
* Renamed structs.IntentionWildcard to structs.WildcardSpecifier

* Refactor ACL Config

Get rid of remnants of enterprise only renaming.

Add a WildcardName field for specifying what string should be used to indicate a wildcard.

* Add wildcard support in the ACL package

For read operations they can call anyAllowed to determine if any read access to the given resource would be granted.

For write operations they can call allAllowed to ensure that write access is granted to everything.

* Make v1/agent/connect/authorize namespace aware

* Update intention ACL enforcement

This also changes how intention:read is granted. Before the Intention.List RPC would allow viewing an intention if the token had intention:read on the destination. However Intention.Match allowed viewing if access was allowed for either the source or dest side. Now Intention.List and Intention.Get fall in line with Intention.Matches previous behavior.

Due to this being done a few different places ACL enforcement for a singular intention is now done with the CanRead and CanWrite methods on the intention itself.

* Refactor Intention.Apply to make things easier to follow.
2020-01-13 15:51:40 -05:00
Pierre Souchay 61fc4f8253 rpc: log method when a server/server RPC call fails (#4548)
Sometimes, we have lots of errors in cross calls between DCs (several hundreds / sec)
Enrich the log in order to help diagnose the root cause of issue.
2020-01-13 19:55:29 +01:00
Matt Keeler 0b4bd016a9
Move where the service-resolver watch is done so that it happen… (#7025)
Before we were issuing 1 watch for every service in the services listing which would have caused the agent to process many more identical events simultaneously.
2020-01-10 10:30:13 -05:00
R.B. Boyer 20f51f9181 connect: derive connect certificate serial numbers from a memdb index instead of the provider table max index (#7011) 2020-01-09 16:32:19 +01:00
R.B. Boyer 446f0533cd connect: ensure that updates to the secondary root CA configuration use the correct signing key ID values for comparison (#7012)
Fixes #6886
2020-01-09 16:28:16 +01:00
Matt Keeler 421148f793
Move Session.CheckIDs into OSS only code. (#6993) 2020-01-03 15:51:19 -05:00
hashicorp-ci 8d53a33bf0
update bindata_assetfs.go 2019-12-20 17:16:51 +00:00
R.B. Boyer 42f80367be
Restore a few more service-kind index updates so blocking in ServiceDump works in more cases (#6948)
Restore a few more service-kind index updates so blocking in ServiceDump works in more cases

Namely one omission was that check updates for dumped services were not
unblocking.

Also adds a ServiceDump state store test and also fix a watch bug with the
normal dump.

Follow-on from #6916
2019-12-19 10:15:37 -06:00
Matt Keeler 6de4eb8569
OSS changes for implementing token based namespace inferencing
remove debug log
2019-12-18 14:07:08 -05:00
Matt Keeler 185654b075
Unflake the TestACLEndpoint_TokenList test
In order to do this I added a waitForLeaderEstablishment helper which does the right thing to ensure that leader establishment has finished.

fixup
2019-12-18 14:07:07 -05:00
Matt Keeler 8af12bf4f4
Miscellaneous acl package cleanup
• Renamed EnterpriseACLConfig to just Config
• Removed chained_authorizer_oss.go as it was empty
• Renamed acl.go to errors.go to more closely describe its contents
2019-12-18 13:44:32 -05:00
Matt Keeler bdf025a758
Rename EnterpriseAuthorizerContext -> AuthorizerContext 2019-12-18 13:43:24 -05:00
Matt Keeler af1d101937
OSS changes to allow for parsing the enterprise DNS config prop… (#6959) 2019-12-18 10:16:35 -05:00
Preetha f607a00138 autopilot: fix dead server removal condition to use correct failure tolerance (#4017)
* Make dead server removal condition in autopilot use correct failure tolerance rules
* Introduce func with explanation
2019-12-16 23:35:13 +01:00
Wim e3e56ff3c2 dns: fix memoryleak by upgrading outdated miekg/dns (#6748)
* Add updated github.com/miekg/dns to go modules
* Add updated github.com/miekg/dns to vendor
* Fix github.com/miekg/dns api breakage
* Decrease size when trimming UDP packets
Need more room for the header(?), if we don't decrease the size we get an
"overflow unpacking uint32" from the dns library
* Fix dns truncate tests with api changes
* Make windows build working again. Upgrade x/sys and x/crypto and vendor
This upgrade is needed because of API breakage in x/sys introduced
by the minimal x/sys dependency of miekg/dns

This API breakage has been fixed in commit
855e68c859
2019-12-16 22:31:27 +01:00
Hans Hasselberg ae23376218
acl: use constant time comparing to check token (#6943) 2019-12-16 21:54:52 +01:00
Matt Keeler 0c1a7970e3 ui: feature support templating for index.html (#6921) 2019-12-13 14:50:07 -05:00
hashicorp-ci 30b513fe19
update bindata_assetfs.go 2019-12-10 19:12:06 +00:00
Matt Keeler 9812b32155
Fix blocking for ServiceDumping by kind (#6919) 2019-12-10 13:58:30 -05:00
Matt Keeler 442924c35a
Sync of OSS changes to support namespaces (#6909) 2019-12-09 21:26:41 -05:00
rerorero 3653855f13 [ci] fix: go-fmt fails on master branch (#6906) 2019-12-08 20:30:46 -05:00
Matt Keeler 81b5f9df02
Fix the TestAPI_CatalogRegistration test 2019-12-06 15:47:41 -05:00
Hans Hasselberg 368d5c643f
tls: auto_encrypt and verify_incoming (#6811) (#6899)
* relax requirements for auto_encrypt on server
* better error message when auto_encrypt and verify_incoming on
* docs: explain verify_incoming on Consul clients.
2019-12-06 21:36:13 +01:00
Hans Hasselberg a36e58c964
agent: fewer file local differences between enterprise and oss (#6820) (#6898)
* Increase number to test ignore. Consul Enterprise has more flags and since we are trying to reduce the differences between both code bases, we are increasing the number in oss. The semantics don't change, it is just a cosmetic thing.
* Introduce agent.initEnterprise for enterprise related hooks.
* Sync test with ent version.
* Fix import order.
* revert error wording.
2019-12-06 21:35:58 +01:00
Matt Keeler 609c9dab02
Miscellaneous Fixes (#6896)
Ensure we close the Sentinel Evaluator so as not to leak go routines

Fix a bunch of test logging so that various warnings when starting a test agent go to the ltest logger and not straight to stdout.

Various canned ent meta types always return a valid pointer (no more nils). This allows us to blindly deref + assign in various places.

Update ACL index tracking to ensure oss -> ent upgrades will work as expected.

Update ent meta parsing to include function to disallow wildcarding.
2019-12-06 14:01:34 -05:00
Matt Keeler b9996e6bbe
Add Namespace support to the API module and the CLI commands (#6874)
Also update the Docs and fixup the HTTP API to return proper errors when someone attempts to use Namespaces with an OSS agent.

Add Namespace HTTP API docs

Make all API endpoints disallow unknown fields
2019-12-06 11:14:56 -05:00
Matt Keeler c15c81a7ed
[Feature] API: Add a internal endpoint to query for ACL authori… (#6888)
* Implement endpoint to query whether the given token is authorized for a set of operations

* Updates to allow for remote ACL authorization via RPC

This is only used when making an authorization request to a different datacenter.
2019-12-06 09:25:26 -05:00
Matt Keeler 53d9319a82
Fix the TestLeader_SecondaryCA_IntermediateRefresh test flakine… (#6885)
Fix the TestLeader_SecondaryCA_IntermediateRefresh test flakiness
2019-12-05 09:35:45 -05:00
Hans Hasselberg ad65068d6e
tests: increase TLSHandshakeTimeout to help slow tests (#6864)
Fixes https://github.com/hashicorp/consul/issues/6858.
2019-12-05 13:20:07 +01:00
Matt Keeler f30af37d11
Fix the TestLeader_SecondaryCA_IntermediateRefresh test flakiness 2019-12-04 19:19:55 -05:00
Mike Morris 1fe6da2ad6
Bump go-discover to support EC2 Metadata Service v2 (#6865)
Refs https://github.com/hashicorp/go-discover/pull/128

* deps: add replace directive for gocheck

Transitive dep, source at https://launchpad.net/gocheck indicates
project moved. This also avoids a dependency on bzr when fetching
modules. Refs https://github.com/hashicorp/consul/pull/6818

* deps: make update-vendor

* test: update retry-join expected names from go-discover
2019-12-04 11:59:16 -05:00
Chris Piraino 2a95701341
Allow configuration of upstream connection limits in Envoy (#6829)
* Adds 'limits' field to the upstream configuration of a connect proxy

This allows a user to configure the envoy connect proxy with
'max_connections', 'max_queued_requests', and 'max_concurrent_requests'. These
values are defined in the local proxy on a per-service instance basis
and should thus NOT be thought of as a global-level or even service-level value.
2019-12-03 14:13:33 -06:00
Matt Keeler 7557225272
Fix dns service SRV lookup when service address is a fqdn (#6792)
Fix dns service SRV lookup when service address is a fqdn
2019-12-03 10:15:19 -05:00
Sarah Adams 1f5b333290
give feedback to CLI user on forceleave command if node does not exist (#6841) 2019-12-02 11:06:15 -08:00
R.B. Boyer a9343db838
xds: mesh gateway CDS requests are now allowed to receive an empty CDS reply (#6787)
This is the rest of the fix for #6543 that was incompletely fixed in #6576.
2019-11-26 15:55:13 -06:00
Matt Keeler 90ae4a1f1e
OSS KV Modifications to Support Namespaces 2019-11-25 12:57:35 -05:00
Matt Keeler 68d79142c4
OSS Modifications necessary for sessions namespacing 2019-11-25 12:07:04 -05:00
Paul Banks a84b82b3df
connect: Add AWS PCA provider (#6795)
* Update AWS SDK to use PCA features.

* Add AWS PCA provider

* Add plumbing for config, config validation tests, add test for inheriting existing CA resources created by user

* Unparallel the tests so we don't exhaust PCA limits

* Merge updates

* More aggressive polling; rate limit pass through on sign; Timeout on Sign and CA create

* Add AWS PCA docs

* Fix Vault doc typo too

* Doc typo

* Apply suggestions from code review

Co-Authored-By: R.B. Boyer <rb@hashicorp.com>
Co-Authored-By: kaitlincarter-hc <43049322+kaitlincarter-hc@users.noreply.github.com>

* Doc fixes; tests for erroring if State is modified via API

* More review cleanup

* Uncomment tests!

* Minor suggested clean ups
2019-11-21 17:40:29 +00:00
Chris Piraino 9a85452787 test: unflake two TestHealthServiceNode_* tests
Replaces WaitForLeader with WaitForTestAgent. This waits to make sure
that the node itself is correctly registered in the catalog before
attempting additional registrations.
2019-11-18 16:21:01 -06:00
Chris Piraino 8c25fff329 test: unflake TestDNS_ServiceLookup_WanTranslation
Use retry.R struct to check length of WANMembers so that retries can
work appropriately.
2019-11-18 16:21:01 -06:00
Chris Piraino d7fc9f3d34 test: unflake TestCatalogServiceNodes_DistanceSort
Remove a time.Sleep and replace with retry.Run around call to
CatalogServiceNodes.
2019-11-18 16:21:01 -06:00
Paul Banks 9e17aa3b41
Change CA Configure struct to pass Datacenter through (#6775)
* Change CA Configure struct to pass Datacenter through

* Remove connect/ca/plugin as we don't have immediate plans to use it.

We still intend to one day but there are likely to be several changes to the CA provider interface before we do so it's better to rebuild from history when we do that work properly.

* Rename PrimaryDC; fix endpoint in secondary DCs
2019-11-18 14:22:19 +00:00
Matt Keeler b8391fa760
Finish the comment 2019-11-15 10:33:21 -05:00
Matt Keeler 036ab56f17
Track the correct check id for idempotent service/check updates 2019-11-14 11:30:44 -05:00
Nicolas Benoit e3e0b8c95d Fix dns service SRV lookup when service address is a fqdn
Refactor dns to have same behavior between A and SRV.
Current implementation returns the node name instead of the service
address.
With this fix when querying for SRV record service address is return in
the SRV record.
And when performing a simple dns lookup it returns a CNAME to the
service address.
2019-11-14 16:50:05 +01:00
Paul Banks 1197b43c7b
Support Connect CAs that can't cross sign (#6726)
* Support Connect CAs that can't cross sign

* revert spurios mod changes from make tools

* Add log warning when forcing CA rotation

* Fixup SupportsCrossSigning to report errors and work with Plugin interface (fixes tests)

* Fix failing snake_case test

* Remove misleading comment

* Revert "Remove misleading comment"

This reverts commit bc4db9cabed8ad5d0e39b30e1fe79196d248349c.

* Remove misleading comment

* Regen proto files messed up by rebase
2019-11-11 21:36:22 +00:00
Paul Banks ca96d5fa72
connect: Allow CA Providers to store small amount of state (#6751)
* pass logger through to provider

* test for proper operation of NeedsLogger

* remove public testServer function

* Ooops actually set the logger in all the places we need it - CA config set wasn't and causing segfault

* Fix all the other places in tests where we set the logger

* Allow CA Providers to persist some state

* Update CA provider plugin interface

* Fix plugin stubs to match provider changes

* Update agent/connect/ca/provider.go

Co-Authored-By: R.B. Boyer <rb@hashicorp.com>

* Cleanup review comments
2019-11-11 20:57:16 +00:00
Todd Radel 19a3892f71 connect: Implement NeedsLogger interface for CA providers (#6556)
* add NeedsLogger to Provider interface

* implements NeedsLogger in default provider

* pass logger through to provider

* test for proper operation of NeedsLogger

* remove public testServer function

* Switch test to actually assert on logging output rather than reflection.

--amend

* Ooops actually set the logger in all the places we need it - CA config set wasn't and causing segfault

* Fix all the other places in tests where we set the logger

* Add TODO comment
2019-11-11 20:30:01 +00:00
Todd Radel e100fda218 Make all Connect Cert Common Names valid FQDNs (#6423) 2019-11-11 17:11:54 +00:00
Matt Keeler 7081643191
Fill the Authz Context with a Sentinel Scope (#6729) 2019-11-01 17:05:22 -04:00
Matt Keeler ba9871d1c2
Fix type name (#6728) 2019-11-01 16:58:00 -04:00
Matt Keeler 7a2cee53c9
Add DirEntry method to fill enterprise authz context 2019-11-01 16:48:44 -04:00
Matt Keeler c71ea7056f
Miscellaneous fixes (#6727) 2019-11-01 16:11:44 -04:00
Ferenc Fabian 3ad20d8d5b Case sensitive Authorization header with lower-cased scheme in… (#6724) 2019-11-01 09:56:41 -04:00
Paul Banks 5f405c3277
Fix support for RSA CA keys in Connect. (#6638)
* Allow RSA CA certs for consul and vault providers to correctly sign EC leaf certs.

* Ensure key type ad bits are populated from CA cert and clean up tests

* Add integration test and fix error when initializing secondary CA with RSA key.

* Add more tests, fix review feedback

* Update docs with key type config and output

* Apply suggestions from code review

Co-Authored-By: R.B. Boyer <rb@hashicorp.com>
2019-11-01 13:20:26 +00:00
Matt Keeler a338357fa3
Fix the Synthetic Policy Tests (#6715) 2019-10-30 15:15:14 -04:00
Matt Keeler 21f98f426e
Add hook for validating the enterprise meta attached to a reque… (#6695) 2019-10-30 12:42:39 -04:00
Matt Keeler ae57e736d2
Add note about RPC multiplexing and TLS content type mutual exc… (#6698) 2019-10-30 09:24:30 -04:00
Matt Keeler c2d9041c0f
PreVerify acl:read access for listing endpoints (#6696)
We still will need to filter results based on the authorizer too but this helps to give an early 403.
2019-10-30 09:10:11 -04:00
Sarah Adams 7a4be7863d
Use encoding/json as JSON decoder instead of mapstructure (#6680)
Fixes #6147
2019-10-29 11:13:36 -07:00
Sarah Christoff 86b30bbfbe
Set MinQuorum variable in Autopilot (#6654)
* Add MinQuorum to Autopilot
2019-10-29 09:04:41 -05:00
Matt Keeler 0fc2c95255
More Replication Abstractions (#6689)
Also updated ACL replication to use a function to fill in the desired enterprise meta for all remote listing RPCs.
2019-10-28 13:49:57 -04:00
Matt Keeler 87c44a3b8d
Ensure that cache entries for tokens are prefixed “token-secret… (#6688)
This will be necessary once we store other types of identities in here.
2019-10-25 13:05:43 -04:00
Matt Keeler a688ea952d
Update the ACL Resolver to allow for Consul Enterprise specific hooks. (#6687) 2019-10-25 11:06:16 -04:00
Matt Keeler 1270a93274
Updates to allow for Namespacing ACL resources in Consul Enterp… (#6675)
Main Changes:

• method signature updates everywhere to account for passing around enterprise meta.
• populate the EnterpriseAuthorizerContext for all ACL related authorizations.
• ACL resource listings now operate like the catalog or kv listings in that the returned entries are filtered down to what the token is allowed to see. With Namespaces its no longer all or nothing.
• Modified the acl.Policy parsing to abstract away basic decoding so that enterprise can do it slightly differently. Also updated method signatures so that when parsing a policy it can take extra ent metadata to use during rules validation and policy creation.

Secondary Changes:

• Moved protobuf encoding functions out of the agentpb package to eliminate circular dependencies.
• Added custom JSON unmarshalers for a few ACL resource types (to support snake case and to get rid of mapstructure)
• AuthMethod validator cache is now an interface as these will be cached per-namespace for Consul Enterprise.
• Added checks for policy/role link existence at the RPC API so we don’t push the request through raft to have it fail internally.
• Forward ACL token delete request to the primary datacenter when the secondary DC doesn’t have the token.
• Added a bunch of ACL test helpers for inserting ACL resource test data.
2019-10-24 14:38:09 -04:00
Sarah Adams 911bb78296
regression tests for existing agent/ decoding behavior (#6624)
tests for existing JSON decoding behavior
2019-10-22 15:26:24 -07:00
rerorero e210b0c854 fix: incorrect struct tag and WaitGroup usage (#6649)
* remove duplicated json tag

* fix: incorrect wait group usage
2019-10-18 13:59:29 -04:00
R.B. Boyer b091647090
agent: allow mesh gateways to initialize even if there are no connect services registered yet (#6576)
Fixes #6543

Also improved some of the proxycfg tests to cover snapshot validity
better.
2019-10-17 16:46:49 -05:00
R.B. Boyer 1ab04a8b6a
xds: tcp services using the discovery chain should not assume RDS during LDS (#6623)
Previously the logic for configuring RDS during LDS for L7 upstreams was
overapplied to TCP proxies resulting in a cluster name of <emptystring>
being used incorrectly.

Fixes #6621
2019-10-17 16:44:59 -05:00
Freddy caf658d0d3
Store check type in catalog (#6561) 2019-10-17 20:33:11 +02:00
R.B. Boyer e74a6c44f1
server: ensure the primary dc and ACL dc match (#6634)
This is mostly a sanity check for server tests that skip the normal
config builder equivalent fixup.
2019-10-17 10:57:17 -05:00
R.B. Boyer bc22eb8090
unflake TestLeader_SecondaryCA_Initialize (#6631) 2019-10-16 16:49:01 -05:00
R.B. Boyer 3ae748c7a4
fix flaky multidc acl tests that failed to wait for token replication (#6628)
If acls have not yet replicated to the secondary then authz requests
will be remotely resolved by the primary. Now these tests explicitly
wait until replication has caught up first.
2019-10-16 12:24:29 -05:00
R.B. Boyer a4c5b8e85c
appease the retry linter (#6629) 2019-10-16 11:39:22 -05:00
Paul Banks 979ad7fecb
Allow time for secondary CA to initialize (#6627) 2019-10-16 17:03:31 +01:00
Matt Keeler f9a43a1e2d
ACL Authorizer overhaul (#6620)
* ACL Authorizer overhaul

To account for upcoming features every Authorization function can now take an extra *acl.EnterpriseAuthorizerContext. These are unused in OSS and will always be nil.

Additionally the acl package has received some thorough refactoring to enable all of the extra Consul Enterprise specific authorizations including moving sentinel enforcement into the stubbed structs. The Authorizer funcs now return an acl.EnforcementDecision instead of a boolean. This improves the overall interface as it makes multiple Authorizers easily chainable as they now indicate whether they had an authoritative decision or should use some other defaults. A ChainedAuthorizer was added to handle this Authorizer enforcement chain and will never itself return a non-authoritative decision.

* Include stub for extra enterprise rules in the global management policy

* Allow for an upgrade of the global-management policy
2019-10-15 16:58:50 -04:00
PHBourquin 16ca8340c1 Checks to passing/critical only after reaching a consecutive success/failure threshold (#5739)
A check may be set to become passing/critical only if a specified number of successive
checks return passing/critical in a row. Status will stay identical as before until
the threshold is reached.
This feature is available for HTTP, TCP, gRPC, Docker & Monitor checks.
2019-10-14 21:49:49 +01:00
Sarah Christoff 6247ca7f0d
ui_content_path config option fix (#6601)
* fix ui-content-path config option
2019-10-09 09:14:48 -05:00
Hans Hasselberg 7e4017551a Do not surface left servers (#6420)
* do not surface left servers in catalog
2019-10-08 22:16:00 -05:00
R.B. Boyer 9a51ecc98b
agent: clients should only attempt to remove pruned nodes once per call (#6591) 2019-10-07 16:15:23 -05:00
Sarah Christoff 9b93dd93c9
Prune Unhealthy Agents (#6571)
* Add -prune flag to ForceLeave
2019-10-04 16:10:02 -05:00
R.B. Boyer fa9c567278
agent: updates to the agent token trigger anti-entropy full syncs (#6577) 2019-10-04 13:37:34 -05:00
Matt Keeler b0b57588d1
Implement Leader Routine Management (#6580)
* Implement leader routine manager

Switch over the following to use it for go routine management:

• Config entry Replication
• ACL replication - tokens, policies, roles and legacy tokens
• ACL legacy token upgrade
• ACL token reaping
• Intention Replication
• Secondary CA Roots Watching
• CA Root Pruning

Also added the StopAll call into the Server Shutdown method to ensure all leader routines get killed off when shutting down.

This should be mostly unnecessary as `revokeLeadership` should manually stop each one but just in case we really want these to go away (eventually).
2019-10-04 13:08:45 -04:00
Matt Keeler 29f0616708
Use encoding/json instead of jsonpb even for protobuf types (#6572)
This only works so long as we use simplistic protobuf types. Constructs such as oneof or Any types that require type annotations for decoding properly will fail hard but that is by design. If/when we want to use any of that we will probably need to consider a v2 API.
2019-10-02 15:32:15 -04:00
Matt Keeler 9bd378a95c
Add EnterpriseConfig stubs (#6566) 2019-10-01 14:34:55 -04:00
Matt Keeler cfa879d63c
Generate JSON and Binary Marshalers for Protobuf Types (#6564)
* Add JSON and Binary Marshaler Generators for Protobuf Types

* Generate files with the correct version of gogo/protobuf

I have pinned the version in the makefile so when you run make tools you get the right version. This pulls the version out of go.mod so it should remain up to date.

The version at the time of this commit we are using is v1.2.1

* Fixup some shell output

* Update how we determine the version of gogo
This just greps the go.mod file instead of expecting the go mod cache to already be present

* Fixup vendoring and remove no longer needed json encoder functions
2019-09-30 15:39:20 -04:00
John Cowen 338812f5c2
ui: UI Release Merge (ui-staging merge) (#6527)
## HTTPAdapter (#5637)

## Ember upgrade 2.18 > 3.12 (#6448)

### Proxies can no longer get away with not calling _super

This means that we can't use create anymore to define dynamic methods.
Therefore we dynamically make 2 extended Proxies on demand, and then
create from those. Therefore we can call _super in the init method of
the extended Proxies.

### We aren't allowed to reset a service anymore

We never actually need to now anyway, this is a remnant of the refactor
from browser based confirmations. We fix it as simply as possible here
but will revisit and remove the old browser confirm functionality at a
later date

### Revert classes to use ES5 style to workaround babel transp. probs

Using a mixture of ES6 classes (and hence super) and arrow functions
means that when babel transpiles the arrow functions down to ES5, a
reference to this is moved before the call to super, hence causing a js
error.

Furthermore, we the testing environment no longer lets use use
apply/call on the constructor.

These errors only manifests during testing (only in the testing
environment), the application itself runs fine with no problems without
this change.

Using ES5 style class definitions give us freedom to do all of the above
without causing any errors, so we reverted these classes back to ES5
class definitions

### Skip test that seems to have changed due to a change in RSVP timing

This test tests a usecase/area of the API that will probably never ever
be used, it was more testing out the API. We've skipped the test for now
as this doesn't affect the application itself, but left a note to come
back here later to investigate further

### Remove enumerableContentDidChange

Initial testing looks like we don't need to call this function anymore,
the function no longer exists

### Rework Changeset.isSaving to take into account new ember APIs

Setting/hanging a computedProperty of an instantiated object no longer
works. Move to setting it on the prototype/class definition instead

### Change how we detect whether something requires listening

New ember API's have changed how you can detect whether something is a
computedProperty or not. It's not immediately clear if its even possible
now. Therefore we change how we detect whether something should be
listened to or not by just looking for presence of `addEventListener`

### Potentially temporary change of ci test scripts to ensure deps exist

All our tooling scripts run through a Makefile (for people familiar with
only using those), which then call yarn scripts which can be called
independently (for people familar with only using yarn).

The Makefile targets always check to make sure all the dependencies are
installed before running anything that requires them (building, testing
etc).

The CI scripts/targets didn't follow this same route and called the yarn
scripts directly (usually CI builds a cache of the dependencies first).

For some reason this cache isn't doing what it usually does, and it
looks as though, in CI, ember isn't installed.

This commit makes the CI scripts consistently use the same method as all
of the other tooling scripts (Makefile target > Install Deps if
required > call yarn script). This should install the dependencies if
for some reason the CI cache building doesn't complete/isn't successful.

Potentially this commit may be reverted if, the root of the problem is
elsewhere, although consistency is always good, so it might be a good
idea to leave this commit as is even if we need to debug and fix things
elsewhere.

### Make test-parallel consistent with the rest of the tooling scripts

As we are here making changes for CI purposes (making test-ci
consistent), we spotted that test-parallel is also inconsistent and also
the README manual instructions won't work without `ember` installed
globally.

This commit makes everything consistent and changes the manual
instructions to use the local ember instance that gets installed via
yarn

### Re-wrangle catchable to fit with new ember 3.12 APIs

In the upgrade from ember 3.8 > 3.12 the public interfaces for
ComputedProperties have changed slightly. `meta` is no longer a public
property of ComputedProperty but of a ComputedDecoratorImpl mixin
instead.

7e4ba1096e/packages/%40ember/-internals/metal/lib/computed.ts (L725)

There seems to be no way, by just using publically available
methods, to replicate this behaviour so that we can create our own
'ComputedProperty` factory via injecting the ComputedProperty class as
we did previously.

3f333bada1/ui-v2/app/utils/computed/factory.js (L1-L18)

Instead we dynamically hang our `Catchable` `catch` method off the
instantiated ComputedProperty. In doing it like this `ComputedProperty`
has already has its `meta` method mixed in so we don't have to manually
mix it in ourselves (which doesn't seem possible)

This functionality is only used during our work in trying to ensure
our EventSource/BlockingQuery work was as 'ember-like' as possible (i.e.
using the traditional Route.model hooks and ember-like Controller
properties). Our ongoing/upcoming work on a componentized approach to
data a.k.a `<DataSource />` means we will be able to remove the majority
of the code involved here now that it seems to be under an amount of
flux in ember.

### Build bindata_assetfs.go with new UI changes
2019-09-30 14:47:49 +01:00
Matt Keeler 04dbd48ce5
Add support for parameterizing the ACL config used with a TestA… (#6559)
* Add support for parameterizing the ACL config used with a TestAgent

Using tokens that are UUIDs will get rid of some warnings

* Refactor to allow setting all tokens and change the template to ignore unset values.
2019-09-27 17:06:43 -04:00
R.B. Boyer 8433ef02a8
connect: connect CA Roots in secondary datacenters should use a SigningKeyID derived from their local intermediate (#6513)
This fixes an issue where leaf certificates issued in secondary
datacenters would be reissued very frequently (every ~20 seconds)
because the logic meant to detect root rotation was errantly triggering
because a hash of the ultimate root (in the primary) was being compared
against a hash of the local intermediate root (in the secondary) and
always failing.
2019-09-26 11:54:14 -05:00
R.B. Boyer 55fdae203f
agent: cache notifications work after error if the underlying RPC returns index=1 (#6547)
Fixes #6521

Ensure that initial failures to fetch an agent cache entry using the
notify API where the underlying RPC returns a synthetic index of 1
correctly recovers when those RPCs resume working.

The bug in the Cache.notifyBlockingQuery used to incorrectly "fix" the
index for the next query from 0 to 1 for all queries, when it should
have not done so for queries that errored.

Also fixed some things that made debugging difficult:

- config entry read/list endpoints send back QueryMeta headers
- xds event loops don't swallow the cache notification errors
2019-09-26 10:42:17 -05:00
Matt Keeler 5b83f589da
Expand the QueryOptions and QueryMeta interfaces (#6545)
In a previous PR I made it so that we had interfaces that would work enough to allow blockingQueries to work. However to complete this we need all fields to be settable and gettable.

Notes:
   • If Go ever gets contracts/generics then we could get rid of all the Getters/Setters
   • protoc / protoc-gen-gogo are going to generate all the getters for us.
   • I copied all the getters/setters from the protobuf funcs into agent/structs/protobuf_compat.go
   • Also added JSON marshaling funcs that use jsonpb for protobuf types.
2019-09-26 09:55:02 -04:00
Freddy 5eace88ce2
Expose HTTP-based paths through Connect proxy (#6446)
Fixes: #5396

This PR adds a proxy configuration stanza called expose. These flags register
listeners in Connect sidecar proxies to allow requests to specific HTTP paths from outside of the node. This allows services to protect themselves by only
listening on the loopback interface, while still accepting traffic from non
Connect-enabled services.

Under expose there is a boolean checks flag that would automatically expose all
registered HTTP and gRPC check paths.

This stanza also accepts a paths list to expose individual paths. The primary
use case for this functionality would be to expose paths for third parties like
Prometheus or the kubelet.

Listeners for requests to exposed paths are be configured dynamically at run
time. Any time a proxy, or check can be registered, a listener can also be
created.

In this initial implementation requests to these paths are not
authenticated/encrypted.
2019-09-25 20:55:52 -06:00
R.B. Boyer 682b5370c9
agent: tolerate more failure scenarios during service registration with central config enabled (#6472)
Also:

* Finished threading replaceExistingChecks setting (from GH-4905)
  through service manager.

* Respected the original configSource value that was used to register a
  service or a check when restoring persisted data.

* Run several existing tests with and without central config enabled
  (not exhaustive yet).

* Switch to ioutil.ReadFile for all types of agent persistence.
2019-09-24 10:04:48 -05:00
Matt Keeler 8885c8d318
Allow for enterprise only leader routines (#6533)
Eventually I am thinking we may need a way to register these at different priority levels but for now sticking this here is fine
2019-09-23 20:09:56 -04:00
R.B. Boyer cc889443a5
connect: don't colon-hex-encode the AuthorityKeyId and SubjectKeyId fields in connect certs (#6492)
The fields in the certs are meant to hold the original binary
representation of this data, not some ascii-encoded version.

The only time we should be colon-hex-encoding fields is for display
purposes or marshaling through non-TLS mediums (like RPC).
2019-09-23 12:52:35 -05:00
R.B. Boyer 1d54909333
connect: intermediate CA certs generated with the vault provider lack URI SANs (#6491)
This only affects vault versions >=1.1.1 because the prior code
accidentally relied upon a bug that was fixed in
https://github.com/hashicorp/vault/pull/6505

The existing tests should have caught this, but they were using a
vendored copy of vault version 0.10.3. This fixes the tests by running
an actual copy of vault instead of an in-process copy. This has the
added benefit of changing the dependency on vault to just vault/api.

Also update VaultProvider to use similar SetIntermediate validation code
as the ConsulProvider implementation.
2019-09-23 12:04:40 -05:00
Matt Keeler 8431c5f533
Add support for implementing new requests with protobufs instea… (#6502)
* Add build system support for protobuf generation

This is done generically so that we don’t have to keep updating the makefile to add another proto generation.

Note: anything not in the vendor directory and with a .proto extension will be run through protoc if the corresponding namespace.pb.go file is not up to date.

If you want to rebuild just a single proto file you can do so with: make proto-rebuild PROTOFILES=<list of proto files to rebuild>

Providing the PROTOFILES var will override the default behavior of finding all the .proto files.

* Start adding types to the agent/proto package

These will be needed for some other work and are by no means comprehensive.

* Add ability to resolve/fixup the agentpb.ACLLinks structure in the state store.

* Use protobuf marshalling of raft requests instead of msgpack for protoc generated types.

This does not change any encoding of existing types.

* Removed structs package automatically encoding with protobuf marshalling

Instead the caller of raftApply that wants to opt-in to protobuf encoding will have to call `raftApplyProtobuf`

* Run update-vendor to fixup modules.txt

Nothing changed as far as dependencies go but the ordering of modules in that file depends on the time they are first seen and its not alphabetical.

* Rename some things and implement the structs.RPCInfo interface bits

agentpb.QueryOptions and agentpb.WriteRequest implement 3 of the 4 RPCInfo funcs and the new TargetDatacenter message type implements the fourth.

* Use the right encoding function.

* Renamed agent/proto package to agent/agentpb to prevent package name conflicts

* Update modules.txt to fix ordering

* Change blockingQuery to take in interfaces for the query options and meta

* Add %T to error output.

* Add/Update some comments
2019-09-20 14:37:22 -04:00
R.B. Boyer 5c5f21088c sdk: add freelist tracking and ephemeral port range skipping to freeport
This should cut down on test flakiness.

Problems handled:

- If you had enough parallel test cases running, the former circular
approach to handling the port block could hand out the same port to
multiple cases before they each had a chance to bind them, leading to
one of the two tests to fail.

- The freeport library would allocate out of the ephemeral port range.
This has been corrected for Linux (which should cover CI).

- The library now waits until a formerly-in-use port is verified to be
free before putting it back into circulation.
2019-09-17 14:30:43 -05:00
R.B. Boyer edf5347d3c fix typo of 'unknown' in log messages 2019-09-13 15:59:49 -05:00
R.B. Boyer c17e417cc8 cache: remove data race in agent cache
In normal operations there is a read/write race related to request
QueryOptions fields. An example race:

    WARNING: DATA RACE
    Read at 0x00c000836950 by goroutine 30:
      github.com/hashicorp/consul/agent/structs.(*ServiceConfigRequest).CacheInfo()
          /go/src/github.com/hashicorp/consul/agent/structs/config_entry.go:506 +0x109
      github.com/hashicorp/consul/agent/cache.(*Cache).getWithIndex()
          /go/src/github.com/hashicorp/consul/agent/cache/cache.go:262 +0x5c
      github.com/hashicorp/consul/agent/cache.(*Cache).notifyBlockingQuery()
          /go/src/github.com/hashicorp/consul/agent/cache/watch.go:89 +0xd7

    Previous write at 0x00c000836950 by goroutine 147:
      github.com/hashicorp/consul/agent/cache-types.(*ResolvedServiceConfig).Fetch()
          /go/src/github.com/hashicorp/consul/agent/cache-types/resolved_service_config.go:31 +0x219
      github.com/hashicorp/consul/agent/cache.(*Cache).fetch.func1()
          /go/src/github.com/hashicorp/consul/agent/cache/cache.go:495 +0x112

This patch does a lightweight copy of the request struct so that the
embedded QueryOptions fields that are mutated during Fetch() are scoped
to just that one RPC.
2019-09-12 16:18:01 -05:00
hashicorp-ci 52bd29b337
update bindata_assetfs.go 2019-09-12 19:39:58 +00:00
Hans Hasselberg f025a7440d
agent: handleEnterpriseLeave (#6453) 2019-09-11 11:01:37 +02:00
R.B. Boyer 4aaaad089f
test: actually wait for the TestAgent to be fully shutdown (#6441) 2019-09-05 13:36:26 -05:00
Sarah Adams 8e673371df
test: ensure all TestAgent constructions use a constructor (#6443)
ensure all TestAgent constructions use a constructor to get start retries + test logs going to the right place

Fixes #6435
2019-09-05 10:24:36 -07:00
Sarah Adams c6c5f9c494
remove funky panic/recover in agent tests (#6442) 2019-09-04 13:59:11 -07:00
Sarah Adams f8fa10fecb
refactor & add better retry logic to NewTestAgent (#6363)
Fixes #6361
2019-09-03 15:05:51 -07:00
Pierre Souchay 6d13efa828 Distinguish between DC not existing and not being available (#6399) 2019-09-03 09:46:24 -06:00
Aestek 19c4459d19 Add option to register services and their checks idempotently (#4905) 2019-09-02 09:38:29 -06:00
Sarah Adams 2254633d93
txn: don't try to decode request bodies > raft.SuggestedMaxDataSize (#6422)
txn: don't try to decode request bodies > raft.SuggestedMaxDataSize
2019-08-30 10:41:25 -07:00
Matt Keeler 31d9d2e557
Store primaries root in secondary after intermediate signature (#6333)
* Store primaries root in secondary after intermediate signature

This ensures that the intermediate exists within the CA root stored in raft and not just in the CA provider state. This has the very nice benefit of actually outputting the intermediate cert within the ca roots HTTP/RPC endpoints.

This change means that if signing the intermediate fails it will not set the root within raft. So far I have not come up with a reason why that is bad. The secondary CA roots watch will pull the root again and go through all the motions. So as soon as getting an intermediate CA works the root will get set.

* Make TestAgentAntiEntropy_Check_DeferSync less flaky

I am not sure this is the full fix but it seems to help for me.
2019-08-30 11:38:46 -04:00
R.B. Boyer c5e1faaddb
test: ensure the node name is a valid dns name (#6424)
The space in the node name was making every test emit a useless warning.
2019-08-29 16:52:13 -05:00
R.B. Boyer 4a2867a814
test: explicitly run the pprof tests for 1s instead of the 30s default (#6421) 2019-08-29 12:06:50 -05:00
R.B. Boyer 0f29543315
test: add additional http status code assertions in coordinate HTTP API tests (#6410)
When this test flakes sometimes this happens:

    --- FAIL: TestCoordinate_Node (1.69s)
    panic: interface conversion: interface {} is nil, not structs.Coordinates [recovered]
    FAIL    github.com/hashicorp/consul/agent       19.999s
    Exit code: 1
    panic: interface conversion: interface {} is nil, not structs.Coordinates [recovered]
            panic: interface conversion: interface {} is nil, not structs.Coordinates

There is definitely a bug lurking, but the code seems to imply this can
only return nil on 404. The tests previously were not checking the
status code.

The underlying cause of the flake is unknown, but this should turn the
failure into a more normal test failure.
2019-08-29 09:55:05 -05:00
Pierre Souchay 35d90fc899 Display IPs of machines when node names conflict to ease troubleshooting
When there is an node name conflicts, such messages are displayed within Consul:

`consul.fsm: EnsureRegistration failed: failed inserting node: Error while renaming Node ID: "e1d456bc-f72d-98e5-ebb3-26ae80d785cf": Node name node001 is reserved by node 05f10209-1b9c-b90c-e3e2-059e64556d4a with name node001`

While it is easy to find the node that has reserved the name, it is hard to find
the node trying to aquire the name since it is not registered, because it
is not part of `consul members` output

This PR will display the IP of the offender and solve far more easily those issues.
2019-08-28 15:57:05 -04:00
Alvin Huang e4e9381851
revert commits on master (#6413) 2019-08-27 17:45:58 -04:00
tradel 2838a1550a update tests to match new method signatures 2019-08-27 14:16:39 -07:00
tradel 93c839b76c confi\gure providers with DC and domain 2019-08-27 14:16:25 -07:00
tradel 1acde6e30a create a common name for autoTLS agent certs 2019-08-27 14:15:53 -07:00
tradel 82544b64e5 add subject names to issued certs 2019-08-27 14:15:10 -07:00
tradel 1c9b271731 construct a common name for each CSR 2019-08-27 14:12:56 -07:00
tradel 8c733260cd add serviceID to leaf cert request 2019-08-27 14:12:22 -07:00
tradel b0bbcd8b94 add domain and nodeName to agent cert request 2019-08-27 14:11:40 -07:00
tradel 3dc47a9251 Added DC and domain args to Configure method 2019-08-27 14:09:01 -07:00
R.B. Boyer 1b3d066f90
test: send testagent logs through testing.Logf (#6411) 2019-08-27 12:21:30 -05:00
R.B. Boyer 0c5409d172
test: fix TestAgent.Start() to not segfault if the DNSServer cannot ListenAndServe (#6409)
The embedded `Server` field on a `DNSServer` is only set inside of the
`ListenAndServe` method. If that method fails for reasons like the
address being in use and is not bindable, then the `Server` field will
not be set and the overall `Agent.Start()` will fail.

This will trigger the inner loop of `TestAgent.Start()` to invoke
`ShutdownEndpoints` which will attempt to pretty print the DNS servers
using fields on that inner `Server` field. Because it was never set,
this causes a nil pointer dereference and crashes the test.
2019-08-27 10:45:05 -05:00
Alvin Huang 9662b7c01a
add nil pointer check for pointer to ACLToken struct (#6407) 2019-08-27 11:23:28 -04:00
Hans Hasselberg 3314dfd4ec
make sure auto_encrypt has private key type and bits (#6392) 2019-08-27 14:37:56 +02:00
Hans Hasselberg dee5a4ac51
auto_encrypt: verify_incoming_rpc is good enough for auto_encrypt.allow_tls (#6376)
Previously `verify_incoming` was required when turning on `auto_encrypt.allow_tls`, but that doesn't work together with HTTPS UI in some scenarios. Adding `verify_incoming_rpc` to the allowed configurations.
2019-08-27 14:36:36 +02:00
R.B. Boyer 09ce7e1220
test: don't leak agent goroutines in TestAgent_sidecarServiceFromNodeService (#6396)
A goroutine dump using runtime.Stack() before/after shows a drop from 121 => 4.
2019-08-26 15:19:59 -05:00
Hans Hasselberg 4f7a3e8fa8 make sure auto_encrypt has private key type and bits 2019-08-26 13:09:50 +02:00
hashicorp-ci a3ac04526a
update bindata_assetfs.go 2019-08-23 22:10:50 +00:00
R.B. Boyer 2d4a3b51d0
Merge pull request #6388 from hashicorp/release/1-6
merging release/1-6 into master
2019-08-23 13:44:46 -05:00
Matt Keeler 89ac998e8b
Secondary CA `establishLeadership` fix (#6383)
This prevents ACL issues (or other issues) during intermediate CA cert signing from failing leader establishment.
2019-08-23 11:32:37 -04:00
Hans Hasselberg aada537d87
auto_encrypt: use server-port (#6287)
AutoEncrypt needs the server-port because it wants to talk via RPC. Information from gossip might not be available at that point and thats why the server-port is being used.
2019-08-23 10:18:46 +02:00
R.B. Boyer ea65298070
connect: allow 'envoy_cluster_json' escape hatch to continue to function (#6378) 2019-08-22 15:11:56 -05:00
R.B. Boyer e7b8032b48
docs: document how envoy escape hatches work with the discovery chain (#6350)
- Bootstrap escape hatches are OK.
- Public listener/cluster escape hatches are OK.
- Upstream listener/cluster escape hatches are not supported.

If an unsupported escape hatch is configured and the discovery chain is
activated log a warning and act like it was not configured.

Fixes #6160
2019-08-21 15:10:12 -05:00
Matt Keeler 8cb0560f52
Ensure that config entry writes are forwarded to the primary DC (#6339) 2019-08-20 12:01:13 -04:00
R.B. Boyer 94c473fa5f
connect: ensure time.Duration fields retain their human readable forms in the API (#6348)
This applies for both config entries and the compiled discovery chain.

Also omit some other config entries fields when empty.
2019-08-19 15:31:05 -05:00
R.B. Boyer 0675e0606e
connect: generate the full SNI names for discovery targets in the compiler rather than in the xds package (#6340) 2019-08-19 13:03:03 -05:00
R.B. Boyer d6456fddeb
connect: introduce ExternalSNI field on service-defaults (#6324)
Compiling this will set an optional SNI field on each DiscoveryTarget.
When set this value should be used for TLS connections to the instances
of the target. If not set the default should be used.

Setting ExternalSNI will disable mesh gateway use for that target. It also 
disables several service-resolver features that do not make sense for an 
external service.
2019-08-19 12:19:44 -05:00
R.B. Boyer f84f509ce4
connect: updating a service-defaults config entry should leave an unset protocol alone (#6342)
If the entry is updated for reasons other than protocol it is surprising
that the value is explicitly persisted as 'tcp' rather than leaving it
empty and letting it fall back dynamically on the proxy-defaults value.
2019-08-19 10:44:06 -05:00
Matt Keeler 73888eed36
Filter out left/leaving serf members when determining if new AC… (#6332) 2019-08-16 10:34:18 -04:00
R.B. Boyer 91df06098c
xds: improve how envoy metrics are emitted (#6312)
Since generated envoy clusters all are named using (mostly) SNI syntax
we can have envoy read the various fields out of that structure and emit
it as stats labels to the various telemetry backends.

I changed the delimiter for the 'customization hash' from ':' to '~'
because ':' is always reencoded by envoy as '_' when generating metrics
keys.
2019-08-16 09:30:17 -05:00
R.B. Boyer 22ee60d1ba
agent: blocking central config RPCs iterations should not interfere with each other (#6316) 2019-08-14 09:08:46 -05:00
hashicorp-ci 33cab6ae72
update bindata_assetfs.go 2019-08-13 15:28:06 +00:00
hashicorp-ci 29767157ed Merge Consul OSS branch 'master' at commit 8f7586b339dbb518eff3a2eec27d7b8eae7a3fbb 2019-08-13 02:00:43 +00:00
Sarah Adams 2f7a90bc52
add flag to allow /operator/keyring requests to only hit local servers (#6279)
Add parameter local-only to operator keyring list requests to force queries to only hit local servers (no WAN traffic).

HTTP API: GET /operator/keyring?local-only=true
CLI: consul keyring -list --local-only

Sending the local-only flag with any non-GET/list request will result in an error.
2019-08-12 11:11:11 -07:00
Mike Morris e637b48117
snapshot: add TLS support to HalfCloser interface (#6216)
Calls net.TCPConn.CloseWrite or mtls.Conn.CloseWrite, which was added in https://go-review.googlesource.com/c/go/+/31318/
2019-08-12 12:47:02 -04:00
Matt Keeler 0e1b7038cc
Update the v1/agent/service/:service endpoint to output tagged… (#6304) 2019-08-10 09:15:19 -04:00
R.B. Boyer 3b30dfd3ee
connect: allow mesh gateways to use central config (#6302) 2019-08-09 15:07:01 -05:00
Mike Morris 88df658243
connect: remove managed proxies (#6220)
* connect: remove managed proxies implementation and all supporting config options and structs

* connect: remove deprecated ProxyDestination

* command: remove CONNECT_PROXY_TOKEN env var

* agent: remove entire proxyprocess proxy manager

* test: remove all managed proxy tests

* test: remove irrelevant managed proxy note from TestService_ServerTLSConfig

* test: update ContentHash to reflect managed proxy removal

* test: remove deprecated ProxyDestination test

* telemetry: remove managed proxy note

* http: remove /v1/agent/connect/proxy endpoint

* ci: remove deprecated test exclusion

* website: update managed proxies deprecation page to note removal

* website: remove managed proxy configuration API docs

* website: remove managed proxy note from built-in proxy config

* website: add note on removing proxy subdirectory of data_dir
2019-08-09 15:19:30 -04:00
R.B. Boyer 357ca39868
connect: ensure intention replication continues to work when the replication ACL token changes (#6288) 2019-08-07 11:34:09 -05:00
hashicorp-ci 3ac803da5e Merge Consul OSS branch 'master' at commit d84863799deca45ccf4bec5ab9f645ccae6b3aeb 2019-08-06 02:00:30 +00:00
Sarah Adams 9ed3e64510
fallback to proxy config global protocol when upstream services' protocol is unset (#6277)
fallback to proxy config global protocol when upstream services' protocol is unset

Fixes #5857
2019-08-05 12:52:35 -07:00
R.B. Boyer 64fc002e03
connect: fix failover through a mesh gateway to a remote datacenter (#6259)
Failover is pushed entirely down to the data plane by creating envoy
clusters and putting each successive destination in a different load
assignment priority band. For example this shows that normally requests
go to 1.2.3.4:8080 but when that fails they go to 6.7.8.9:8080:

- name: foo
  load_assignment:
    cluster_name: foo
    policy:
      overprovisioning_factor: 100000
    endpoints:
    - priority: 0
      lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 1.2.3.4
              port_value: 8080
    - priority: 1
      lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 6.7.8.9
              port_value: 8080

Mesh gateways route requests based solely on the SNI header tacked onto
the TLS layer. Envoy currently only lets you configure the outbound SNI
header at the cluster layer.

If you try to failover through a mesh gateway you ideally would
configure the SNI value per endpoint, but that's not possible in envoy
today.

This PR introduces a simpler way around the problem for now:

1. We identify any target of failover that will use mesh gateway mode local or
   remote and then further isolate any resolver node in the compiled discovery
   chain that has a failover destination set to one of those targets.

2. For each of these resolvers we will perform a small measurement of
   comparative healths of the endpoints that come back from the health API for the
   set of primary target and serial failover targets. We walk the list of targets
   in order and if any endpoint is healthy we return that target, otherwise we
   move on to the next target.

3. The CDS and EDS endpoints both perform the measurements in (2) for the
   affected resolver nodes.

4. For CDS this measurement selects which TLS SNI field to use for the cluster
   (note the cluster is always going to be named for the primary target)

5. For EDS this measurement selects which set of endpoints will populate the
   cluster. Priority tiered failover is ignored.

One of the big downsides to this approach to failover is that the failover
detection and correction is going to be controlled by consul rather than
deferring that entirely to the data plane as with the prior version. This also
means that we are bound to only failover using official health signals and
cannot make use of data plane signals like outlier detection to affect
failover.

In this specific scenario the lack of data plane signals is ok because the
effectiveness is already muted by the fact that the ultimate destination
endpoints will have their data plane signals scrambled when they pass through
the mesh gateway wrapper anyway so we're not losing much.

Another related fix is that we now use the endpoint health from the
underlying service, not the health of the gateway (regardless of
failover mode).
2019-08-05 13:30:35 -05:00
R.B. Boyer 0165e93517
connect: expose an API endpoint to compile the discovery chain (#6248)
In addition to exposing compilation over the API cleaned up the structures that would be exchanged to be cleaner and easier to support and understand.

Also removed ability to configure the envoy OverprovisioningFactor.
2019-08-02 15:34:54 -05:00
Todd Radel 295abd82c3
connect: generate intermediate at same time as root (#6272)
Generate intermediate at same time as root
Co-Authored-By: Freddy <freddygv@users.noreply.github.com>
2019-08-02 15:36:03 -04:00
R.B. Boyer 4e2fb5730c
connect: detect and prevent circular discovery chain references (#6246) 2019-08-02 09:18:45 -05:00
R.B. Boyer 6c9edb17c2
server: if inserting bootstrap config entries fails don't silence the errors (#6256) 2019-08-01 23:07:11 -05:00
R.B. Boyer 782c647bf4
connect: simplify the compiled discovery chain data structures (#6242)
This should make them better for sending over RPC or the API.

Instead of a chain implemented explicitly like a linked list (nodes
holding pointers to other nodes) instead switch to a flat map of named
nodes with nodes linking other other nodes by name. The shipped
structure is just a map and a string to indicate which key to start
from.

Other changes:

* inline the compiler option InferDefaults as true

* introduce compiled target config to avoid needing to send back
  additional maps of Resolvers; future target-specific compiled state
  can go here

* move compiled MeshGateway out of the Resolver and into the
  TargetConfig where it makes more sense.
2019-08-01 22:44:05 -05:00
R.B. Boyer 4666599e18
connect: reconcile how upstream configuration works with discovery chains (#6225)
* connect: reconcile how upstream configuration works with discovery chains

The following upstream config fields for connect sidecars sanely
integrate into discovery chain resolution:

- Destination Namespace/Datacenter: Compilation occurs locally but using
different default values for namespaces and datacenters. The xDS
clusters that are created are named as they normally would be.

- Mesh Gateway Mode (single upstream): If set this value overrides any
value computed for any resolver for the entire discovery chain. The xDS
clusters that are created may be named differently (see below).

- Mesh Gateway Mode (whole sidecar): If set this value overrides any
value computed for any resolver for the entire discovery chain. If this
is specifically overridden for a single upstream this value is ignored
in that case. The xDS clusters that are created may be named differently
(see below).

- Protocol (in opaque config): If set this value overrides the value
computed when evaluating the entire discovery chain. If the normal chain
would be TCP or if this override is set to TCP then the result is that
we explicitly disable L7 Routing and Splitting. The xDS clusters that
are created may be named differently (see below).

- Connect Timeout (in opaque config): If set this value overrides the
value for any resolver in the entire discovery chain. The xDS clusters
that are created may be named differently (see below).

If any of the above overrides affect the actual result of compiling the
discovery chain (i.e. "tcp" becomes "grpc" instead of being a no-op
override to "tcp") then the relevant parameters are hashed and provided
to the xDS layer as a prefix for use in naming the Clusters. This is to
ensure that if one Upstream discovery chain has no overrides and
tangentially needs a cluster named "api.default.XXX", and another
Upstream does have overrides for "api.default.XXX" that they won't
cross-pollinate against the operator's wishes.

Fixes #6159
2019-08-01 22:03:34 -05:00
R.B. Boyer 6bbbfde88b
connect: validate upstreams and prevent duplicates (#6224)
* connect: validate upstreams and prevent duplicates

* Actually run Upstream.Validate() instead of ignoring it as dead code.

* Prevent two upstreams from declaring the same bind address and port.
  It wouldn't work anyway.

* Prevent two upstreams from being declared that use the same
  type+name+namespace+datacenter. Due to how the Upstream.Identity()
  function worked this ended up mostly being enforced in xDS at use-time,
  but it should be enforced more clearly at register-time.
2019-08-01 13:26:02 -05:00
Paul Banks a5c70d79d0 Revert "connect: support AWS PCA as a CA provider" (#6251)
This reverts commit 3497b7c00d49c4acbbf951d84f2bba93f3da7510.
2019-07-31 09:08:10 -04:00
Todd Radel d3b7fd83fe
connect: support AWS PCA as a CA provider (#6189)
Port AWS PCA provider from consul-ent
2019-07-30 22:57:51 -04:00
Todd Radel 1b14d6595e
connect: Support RSA keys in addition to ECDSA (#6055)
Support RSA keys in addition to ECDSA
2019-07-30 17:47:39 -04:00
freddygv 00157a2c1f Update default gossip encryption key size to 32 bytes 2019-07-30 09:45:41 -06:00
hashicorp-ci 20f477cf13 Merge Consul OSS branch 'master' at commit a1725e6b5299c6ce12e8273205f90fba31403686 2019-07-30 02:00:29 +00:00
Matt Keeler e9f6805adc Fix flaky tests (#6229) 2019-07-29 15:07:25 -04:00
Matt Keeler 7e69646a77
Fix prepared query upstream endpoint generation (#6236)
Use the correct SNI value for prepared query upstreams
2019-07-29 11:15:55 -04:00
hashicorp-ci 252900d3e5
update bindata_assetfs.go 2019-07-26 23:15:20 +00:00
Matt Keeler a7c4b7af7c
Fix CA Replication when ACLs are enabled (#6201)
Secondary CA initialization steps are:

• Wait until the primary will be capable of signing intermediate certs. We use serf metadata to check the versions of servers in the primary which avoids needing a token like the previous implementation that used RPCs. We require at least one alive server in the primary and the all alive servers meet the version requirement.
• Initialize the secondary CA by getting the primary to sign an intermediate

When a primary dc is configured, if no existing CA is initialized and for whatever reason we cannot initialize a secondary CA the secondary DC will remain without a CA. As soon as it can it will initialize the secondary CA by pulling the primaries roots and getting the primary to sign an intermediate.

This also fixes a segfault that can happen during leadership revocation. There was a spot in the secondaryCARootsWatch that was getting the CA Provider and executing methods on it without nil checking. Under normal circumstances it wont be nil but during leadership revocation it gets nil'ed out. Therefore there is a period of time between closing the stop chan and when the go routine is actually stopped where it could read a nil provider and cause a segfault.
2019-07-26 15:57:57 -04:00
R.B. Boyer 1b95d2e5e3 Merge Consul OSS branch master at commit b3541c4f34d43ab92fe52256420759f17ea0ed73 2019-07-26 10:34:24 -05:00
hashicorp-ci 86ff9e9dc9
update bindata_assetfs.go 2019-07-25 23:41:16 +00:00
Jack Pearkes 3b98867a2c
config: correct limit to limits in config example (#6219)
This isn't yet documented on the website, but wanted to update this to add the missing s.
2019-07-25 12:38:57 -07:00
Matt Keeler c4a34602b6
Allow forwarding of some status RPCs (#6198)
* Allow forwarding of some status RPCs

* Update docs

* add comments about not using the regular forward
2019-07-25 14:26:22 -04:00
Jeff Mitchell e266b038cc Make the chunking test multidimensional (#6212)
This ensures that it's not just a single operation we restores
successfully, but many. It's the same foundation, just with multiple
going on at once.
2019-07-25 11:40:09 +01:00
Freddy 7dbbe7e55a
auto-encrypt: Fix port resolution and fallback to default port (#6205)
Auto-encrypt meant to fallback to the default port when it wasn't provided, but it hadn't been because of an issue with the error handling. We were checking against an incomplete error value:
"missing port in address" vs "address $HOST: missing port in address"

Additionally, all RPCs to AutoEncrypt.Sign were using a.config.ServerPort, so those were updated to use ports resolved by resolveAddrs, if they are available.
2019-07-24 16:49:37 -07:00
Jeff Mitchell e0068431f5 Chunking support (#6172)
* Initial chunk support

This uses the go-raft-middleware library to allow for chunked commits to the KV
2019-07-24 17:06:39 -04:00
Matt Keeler 155cdf022f
Envoy Mesh Gateway integration tests (#6187)
* Allow setting the mesh gateway mode for an upstream in config files

* Add envoy integration test for mesh gateways

This necessitated many supporting changes in most of the other test cases.

Add remote mode mesh gateways integration test
2019-07-24 17:01:42 -04:00
Freddy 1b97d65873
Make new config when retrying testServer creation (#6204) 2019-07-24 08:41:00 -06:00
R.B. Boyer bd4a2d7be2
connect: allow L7 routers to match on http methods (#6164)
Fixes #6158
2019-07-23 20:56:39 -05:00
R.B. Boyer 67f3da61af
connect: change router syntax for matching query parameters to resemble the syntax for matching paths and headers for consistency. (#6163)
This is a breaking change, but only in the context of the beta series.
2019-07-23 20:55:26 -05:00
R.B. Boyer fc90beb925
connect: validate and test more of the L7 config entries (#6156) 2019-07-23 20:50:23 -05:00
R.B. Boyer 2bfad66efa
connect: rework how the service resolver subset OnlyPassing flag works (#6173)
The main change is that we no longer filter service instances by health,
preferring instead to render all results down into EDS endpoints in
envoy and merely label the endpoints as HEALTHY or UNHEALTHY.

When OnlyPassing is set to true we will force consul checks in a
'warning' state to render as UNHEALTHY in envoy.

Fixes #6171
2019-07-23 20:20:24 -05:00
Alvin Huang 5b6fa58453 resolve circleci config conflicts 2019-07-23 20:18:36 -04:00
Freddy c19f46639b
Restore NotifyListen to avoid panic in newServer retry (#6200) 2019-07-23 14:33:00 -06:00
Pierre Souchay 80204641de Display nicely Networks (CIDR) in runtime configuration (#6029)
* Display nicely Networks (CIDR) in runtime configuration

CIDR mask is displayed in binary in configuration.
This add support for nicely displaying CIDR in runtime configuration.

Currently, if a configuration contains the following lines:

  "http_config": {
    "allow_write_http_from": [
      "127.0.0.0/8",
      "::1/128"
    ]
  }

A call to `/v1/agent/self?pretty` would display

  "AllowWriteHTTPFrom": [
            {
                "IP": "127.0.0.0",
                "Mask": "/wAAAA=="
            },
            {
                "IP": "::1",
                "Mask": "/////////////////////w=="
            }
  ]

This PR fixes it and it will now display:

   "AllowWriteHTTPFrom": [ "127.0.0.0/8", "::1/128" ]

* Added test for cidr nice rendering in `TestSanitize()`.
2019-07-23 16:30:16 -04:00
Matt Keeler c51b7aa676
Update go-bexpr (#6190)
* Update go-bexpr to v0.1.1

This brings in:

• `in`/`not in` operators to do substring matching
• `matches` / `not matches` operators to perform regex string matching.

* Add the capability to auto-generate the filtering selector ops tables for our docs
2019-07-23 14:45:20 -04:00
Paul Banks 42296292a4
Allow raft TrailingLogs to be configured. (#6186)
This fixes pathological cases where the write throughput and snapshot size are both so large that more than 10k log entries are written in the time it takes to restore the snapshot from disk. In this case followers that restart can never catch up with leader replication again and enter a loop of constantly downloading a full snapshot and restoring it only to find that snapshot is already out of date and the leader has truncated its logs so a new snapshot is sent etc.

In general if you need to adjust this, you are probably abusing Consul for purposes outside its design envelope and should reconsider your usage to reduce data size and/or write volume.
2019-07-23 15:19:57 +01:00
Christian Muehlhaeuser 2602f6907e Simplified code in various places (#6176)
All these changes should have no side-effects or change behavior:

- Use bytes.Buffer's String() instead of a conversion
- Use time.Since and time.Until where fitting
- Drop unnecessary returns and assignment
2019-07-20 09:37:19 -04:00
hashicorp-ci 8b109e5f9f Merge Consul OSS branch 'master' at commit ef257b084d2e2a474889518440515e360d0cd990 2019-07-20 02:00:29 +00:00
javicrespo d4f3eebf9d log rotation: limit count of rotated log files (#5831) 2019-07-19 15:36:34 -06:00
Christian Muehlhaeuser 46542b2ff7 Avoid unnecessary conversions (#6178)
Those values already have the right type.
2019-07-19 09:13:18 -04:00
Christian Muehlhaeuser 26f9368567 Fixed typos in comments (#6175)
Just a few nitpicky typo fixes.
2019-07-19 07:54:53 -04:00