4637 Commits

Author SHA1 Message Date
Chris S. Kim a2c857df40 Fix tests for enterprise 2022-08-26 11:14:02 -04:00
Chris S. Kim a5e9ea6d96 Merge branch 'main' into NET-638-push-server-address-updates-to-the-peer
# Conflicts:
#	agent/grpc-external/services/peerstream/stream_test.go
2022-08-26 10:43:56 -04:00
Chris S. Kim a8090268d4
Replace ring buffer with async version (#14314)
We need to watch for changes to peerings and update the server addresses which get served by the ring buffer.

Also, if there is an active connection for a peer, we are getting up-to-date server addresses from the replication stream and can safely ignore the token's addresses which may be stale.
2022-08-26 10:27:13 -04:00
alex f64af3be24
peering: add peer health metric (#14004)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-08-25 16:32:59 -07:00
Chris S. Kim 2e75833133 Exit loop when context is cancelled 2022-08-25 11:48:25 -04:00
cskh 7ee1c857c3
Fix: the inboundconnection limit filter should be placed in front of http co… (#14325)
* fix: the inboundconnection limit should be placed in front of http connection manager

Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2022-08-24 14:13:10 -04:00
Chris S. Kim eac63fea1f Update test comment 2022-08-24 13:50:24 -04:00
Chris S. Kim 6f98c853b8 Add check for zero-length server addresses 2022-08-24 13:30:52 -04:00
skpratt c039028401
no-op: refactor usagemetrics tests for clarity and DRY cases (#14313) 2022-08-24 12:00:09 -05:00
Pablo Ruiz García 4188769c32
Added new auto_encrypt.grpc_server_tls config option to control AutoTLS enabling of GRPC Server's TLS usage
Fix for #14253

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
2022-08-24 12:31:38 -04:00
Dan Upton 20c87d235f
dataplane: update envoy bootstrap params for consul-dataplane (#14017)
Contains 2 changes to the GetEnvoyBootstrapParams response to support
consul-dataplane.

Exposing node_name and node_id:

consul-dataplane will support providing either the node_id or node_name in its
configuration. Unfortunately, supporting both in the xDS meta adds a fair amount
of complexity (partly because most tables are currently indexed on node_name)
so for now we're going to return them both from the bootstrap params endpoint,
allowing consul-dataplane to exchange a node_id for a node_name (which it will
supply in the xDS meta).

Properly setting service for gateways:

To avoid the need to special case gateways in consul-dataplane, service will now
either be the destination service name for connect proxies, or the gateway
service name. This means it can be used as-is in Envoy configuration (i.e. as a
cluster name or in metric tags).
2022-08-24 12:03:15 +01:00
Daniel Upton 1cd7ec0543 proxycfg: terminate stream on irrecoverable errors
This is the OSS portion of enterprise PR 2339.

It improves our handling of "irrecoverable" errors in proxycfg data sources.

The canonical example of this is what happens when the ACL token presented by
Envoy is deleted/revoked. Previously, the stream would get "stuck" until the
xDS server re-checked the token (after 5 minutes) and terminated the stream.

Materializers would also sit burning resources retrying something that could
never succeed.

Now, it is possible for data sources to mark errors as "terminal" which causes
the xDS stream to be closed immediately. Similarly, the submatview.Store will
evict materializers when it observes they have encountered such an error.
2022-08-23 20:17:49 +01:00
Chris S. Kim 1e7a3b8d8d PR feedback to specify Node name in test mock 2022-08-23 11:51:04 -04:00
Eric Haberkorn 3d45306e1b
Cluster peering failover disco chain changes (#14296) 2022-08-23 09:13:43 -04:00
Chris S. Kim c14b166b80 Fix flakes 2022-08-22 14:45:31 -04:00
Chris S. Kim 587c57d3f4 Increase heartbeat rate to reduce test flakes 2022-08-22 14:24:05 -04:00
Chris S. Kim c68e589f26 Remove check for ResponseNonce 2022-08-22 13:55:01 -04:00
Chris S. Kim 0ae3462e61 Add missing mock assertions 2022-08-22 13:55:01 -04:00
Chris S. Kim 575a56062f Fix data race
newMockSnapshotHandler has an assertion on t.Cleanup which gets called before the event publisher is cancelled. This commit reorders the context.WithCancel so it properly gets cancelled before the assertion is made.
2022-08-22 13:55:01 -04:00
cskh e30d6bfc40
Fix: add missing ent meta for test (#14289) 2022-08-22 13:51:04 -04:00
Chris S. Kim 98d102326f Handle server addresses update as client 2022-08-22 13:42:12 -04:00
Chris S. Kim 205e873689 Send server addresses on update from server 2022-08-22 13:41:44 -04:00
Chris S. Kim 4cf54bef4e Add new subscription for server addresses 2022-08-22 13:40:25 -04:00
Chris S. Kim e1a7456a69 Cleanup unused logger 2022-08-22 13:40:23 -04:00
Chris S. Kim 9f96f98ab6 Expose external gRPC port in autopilot
The grpc_port was added to a NodeService's meta in ea58f235f5da416224ba615405269661ba1f4d8d
2022-08-22 10:07:00 -04:00
cskh a87d8f48be
fix: missing MaxInboundConnections field in service-defaults config entry (#14072)
* fix: missing max_inbound_connections field in merge config
2022-08-19 14:11:21 -04:00
cskh 7f66dfc780
Fix: upgrade pkg imdario/mergo to prevent merge config panic (#14237)
* upgrade imdario/mergo to prevent merge config panic

* test: service definition takes precedence over service-defaults in merged results
2022-08-17 21:14:04 -04:00
James Hartig a5a200e0e9 Use the maximum jitter when calculating the timeout
The timeout should include the maximum possible
jitter since the server will randomly add jitter to
its timeout. If the client's timeout is less than
the server's timeout then the client will
return an i/o deadline reached error.

Before:
```
time curl 'http://localhost:8500/v1/catalog/service/service?dc=other-dc&stale=&wait=600s&index=15820644'
rpc error making call: i/o deadline reached
real    10m11.469s
user    0m0.018s
sys     0m0.023s
```

After:
```
time curl 'http://localhost:8500/v1/catalog/service/service?dc=other-dc&stale=&wait=600s&index=15820644'
[...]
real    10m35.835s
user    0m0.021s
sys     0m0.021s
```
2022-08-17 10:24:09 -04:00
Eric Haberkorn 40ce1c8288
Add `Targets` field to service resolver failovers. (#14162)
This field will be used for cluster peering failover.
2022-08-15 09:20:25 -04:00
cskh c20d016f62
fix: missing segment and partition (#14194) 2022-08-12 15:21:39 -04:00
Eric Haberkorn 11884bfb99
Refactor failover code to use Envoy's aggregate clusters (#14178) 2022-08-12 14:30:46 -04:00
cskh e7b5baa3cc
feat(telemetry): add labels to serf and memberlist metrics (#14161)
* feat(telemetry): add labels to serf and memberlist metrics
* changelog
* doc update

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2022-08-11 22:09:56 -04:00
Chris S. Kim 182399255b
Handle breaking change for ServiceVirtualIP restore (#14149)
Consul 1.13.0 changed ServiceVirtualIP to use PeeredServiceName instead of ServiceName, which was a breaking change for those using service mesh who wanted to restore their snapshot after upgrading to 1.13.0.

This commit handles existing data with older ServiceName and converts it during restore so that there are no issues when restoring from older snapshots.
2022-08-11 14:47:10 -04:00
Chris S. Kim 55945a8231 Add test to verify forwarding 2022-08-11 11:16:02 -04:00
Chris S. Kim fbbb54fdc2 Register peerStreamServer internally to enable RPC forwarding 2022-08-11 11:16:02 -04:00
Chris S. Kim 534096a6ac Handle wrapped errors in isFailedPreconditionErr 2022-08-11 11:16:02 -04:00
Daniel Kimsey 4243e1e05f Add support for filtering the 'List Services' API
1. Create a bexpr filter for performing the filtering
2. Change the state store functions to return the raw (not aggregated)
   list of ServiceNodes.
3. Move the aggregate service tags by name logic out of the state store
   functions into a new function called from the RPC endpoint
4. Perform the filtering in the endpoint before aggregation.
2022-08-10 16:52:32 -05:00
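The ordering in the commit above (filter the raw rows first, then aggregate tags by service name) can be sketched like this. The `ServiceNode` struct is a simplified, hypothetical stand-in, and a plain Go predicate takes the place of the bexpr filter mentioned in the commit.

```go
package main

import (
	"fmt"
	"sort"
)

// ServiceNode is a simplified, hypothetical stand-in for one catalog row.
type ServiceNode struct {
	Node        string
	ServiceName string
	ServiceTags []string
}

// listServices filters the raw rows first, then aggregates the tags of the
// remaining rows by service name. A nil keep function keeps every row.
func listServices(rows []ServiceNode, keep func(ServiceNode) bool) map[string][]string {
	seen := make(map[string]map[string]struct{})
	for _, r := range rows {
		if keep != nil && !keep(r) {
			continue
		}
		if seen[r.ServiceName] == nil {
			seen[r.ServiceName] = make(map[string]struct{})
		}
		for _, t := range r.ServiceTags {
			seen[r.ServiceName][t] = struct{}{}
		}
	}

	out := make(map[string][]string, len(seen))
	for name, tags := range seen {
		list := make([]string, 0, len(tags))
		for t := range tags {
			list = append(list, t)
		}
		sort.Strings(list) // stable output order
		out[name] = list
	}
	return out
}

func main() {
	rows := []ServiceNode{
		{Node: "n1", ServiceName: "web", ServiceTags: []string{"v1"}},
		{Node: "n2", ServiceName: "web", ServiceTags: []string{"v2"}},
		{Node: "n2", ServiceName: "db", ServiceTags: []string{"primary"}},
	}
	onN2 := func(s ServiceNode) bool { return s.Node == "n2" }
	fmt.Println(listServices(rows, onN2))
}
```

Filtering after aggregation would not work for per-node fields such as `Node`, because that information is gone once rows are merged by name.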
cskh 647f9787f8
fix: shadowed err in retryJoin() (#14112)
- err value will be used later to surface the error message
  if r.join() returns any err.
2022-08-10 10:53:57 -04:00
skpratt 070ed3738d
Merge pull request #14056 from hashicorp/proxy-register-port-race
Refactor sidecar_service method to separate port assignment
2022-08-10 09:46:29 -05:00
skpratt 7f1f095b2f Merge branch 'main' into proxy-register-port-race 2022-08-10 08:40:45 -05:00
Chris S. Kim 79d00f59cd Close active listeners on error
If startListeners successfully created listeners for some of its input addresses but eventually failed, the function would return an error and existing listeners would not be cleaned up.
2022-08-09 12:22:39 -04:00
Chris S. Kim 4de96a1f3c Add retry in TestAgentConnectCALeafCert_good 2022-08-09 11:20:37 -04:00
Kyle Havlovitz 57afbb58ac
Merge pull request #13958 from hashicorp/gateway-wildcard-fix
Fix wildcard picking up services it shouldn't for ingress/terminating gateways
2022-08-08 12:54:40 -07:00
Kyle Havlovitz 2a0ab31ca4 Add some extra handling for destination deletes 2022-08-08 11:38:13 -07:00
freddygv 1e48b4f665 Update snapshot test 2022-08-08 09:17:15 -06:00
freddygv 65bcd3d84f Re-validate existing secrets at state store
Previously establishment and pending secrets were only checked at the
RPC layer. However, given that these are Check-and-Set transactions we
should ensure that the given secrets are still valid when persisting a
secret exchange or promotion.

Otherwise it would be possible for concurrent requests to overwrite each
other.
2022-08-08 09:06:07 -06:00
freddygv 67aa7ed15c Test fixes 2022-08-08 08:31:47 -06:00
freddygv 01b0cbcbd7 Use proto message for each secrets write op
Previously there was a field indicating the operation that triggered a
secrets write. Now there is a message for each operation and it contains
the secret ID being persisted.
2022-08-08 01:41:00 -06:00
Kyle Havlovitz 3f435f31ac Update ingress/terminating wildcard logic and handle destinations 2022-08-05 07:56:10 -07:00
freddygv 3a623f2e9d Inherit active secret when exchanging 2022-08-03 17:32:53 -05:00
freddygv b089472a12 Pass explicit signal with op for secrets write
Previously the updates to the peering secrets UUID table relied on
inferring what action triggered the update based on a reconciliation
against the existing secrets.

Instead we now explicitly require the operation to be given so that the
inference isn't necessary. This makes the UUID table logic easier to
reason about and fixes some related bugs.

There is also an update so that the peering secrets get handled on
snapshots/restores.
2022-08-03 17:25:12 -05:00
freddygv 544b3603e9 Avoid deleting peering secret UUIDs at dialers
Dialers do not keep track of peering secret UUIDs, so they should not
attempt to clean up data from that table when their peering is deleted.

We also now keep peer server addresses when marking peerings for
deletion. Peer server addresses are used by the ShouldDial() helper
when determining whether the peering is for a dialer or an acceptor.
We need to keep this data so that peering secrets can be cleaned up
accordingly.
2022-08-03 16:34:57 -05:00
skpratt 1ded7a7632
Merge pull request #13906 from skpratt/validate-port-agent-split
Separate port and socket path validation for local agent
2022-08-02 16:58:41 -05:00
Dhia Ayachi c1ca9afdf2
add token to the request when creating a cacheIntentions query (#14005) 2022-08-02 14:27:34 -04:00
Kyle Havlovitz fce49a1ec0 Fix wildcard picking up services it shouldn't for ingress/terminating gateways 2022-08-02 09:41:31 -07:00
Daniel Upton 8da6710958 proxycfg-sources: fix hot loop when service not found in catalog
Fixes a bug where a service getting deleted from the catalog would cause
the ConfigSource to spin in a hot loop attempting to look up the service.

This is because we were returning a nil WatchSet which would always
unblock the select.

Kudos to @freddygv for discovering this!
2022-08-02 15:42:29 +01:00
Freddy 56144cf5f7
Various peering fixes (#13979)
* Avoid logging StreamSecretID
* Wrap additional errors in stream handler
* Fix flakiness in leader test and rename servers for clarity. There was
  a race condition where the peering was being deleted in the test
  before the stream was active. Now the test waits for the stream to be
  connected on both sides before deleting the associated peering.
* Run flaky test serially
2022-08-01 15:06:18 -06:00
DanStough e46a4b3cc1 fix: ipv4 destination dns resolution 2022-08-01 16:45:57 -04:00
Luke Kysow e9960dfdf3
peering: default to false (#13963)
* defaulting to false because peering will be released as beta
* Ignore peering disabled error in bundles cachetype

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: freddygv <freddy@hashicorp.com>
Co-authored-by: Matt Keeler <mjkeeler7@gmail.com>
2022-08-01 15:22:36 -04:00
Freddy a54903b0f4
Merge branch 'main' into fix-kv_entries-metric 2022-08-01 13:19:27 -06:00
Freddy 593add2ec0
Merge pull request #13499 from maxb/delete-unused-metric
Delete definition of metric `consul.acl.blocked.node.deregistration`
2022-08-01 12:31:05 -06:00
Dhia Ayachi cf7e175eab
Tgtwy egress HTTP support (#13953)
* add golden files

* add support to http in tgateway egress destination

* fix slice sorting to include both address and port when using server_names

* fix listener loop for http destination

* fix routes to generate a route per port and a virtualhost per port-address combination

* sort virtual hosts list to have a stable order

* extract redundant serviceNode
2022-08-01 14:12:43 -04:00
Matt Keeler 795e5830c6
Implement/Utilize secrets for Peering Replication Stream (#13977) 2022-08-01 10:33:18 -04:00
alex 0f6354685b
block PeerName register requests (#13887)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-29 14:36:22 -07:00
Luke Kysow 17594a123e
peering: retry establishing connection more quickly on certain errors (#13938)
When we receive a FailedPrecondition error, retry that more quickly
because we expect it will resolve shortly. This is particularly
important in the context of Consul servers behind a load balancer
because when establishing a connection we have to retry until we
randomly land on a leader node.

The default retry backoff goes from 2s, 4s, 8s, etc. which can result in
very long delays quite quickly. Instead, this backoff retries in 8ms
five times, then goes exponentially from there: 16ms, 32ms, ... up to a
max of 8152ms.
2022-07-29 13:04:32 -07:00
Sarah Pratt 11c7a465b7 Separate port and socket path requirement in case of local agent assignment 2022-07-29 13:28:21 -05:00
alex 74d79cc7e6
Merge pull request #13952 from hashicorp/sync-more-acl
sync more acl enforcement
2022-07-28 12:31:02 -07:00
Dhia Ayachi 09340a846c
inject gateway addons to destination clusters (#13951) 2022-07-28 15:17:35 -04:00
acpana 70e052f35f
sync more acl enforcement
sync w ent at 32756f7

Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-28 12:01:52 -07:00
alex 08b94640bc
Merge pull request #13929 from hashicorp/fix-validation
[sync] fix empty partitions matching
2022-07-28 10:14:49 -07:00
Sarah Pratt f01a4f91dc refactor sidecar_service method into parts 2022-07-28 09:07:13 -05:00
Ashwin Venkatesh e4aaf467eb
Add peer counts to emitted metrics. (#13930) 2022-07-27 18:34:04 -04:00
Luke Kysow a2290791b2
Merge pull request #13924 from hashicorp/lkysow/util-metric-peering
peering: don't track imported services/nodes in usage
2022-07-27 14:49:55 -07:00
acpana 778c796ec9
use EqualPartitions
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-27 14:48:30 -07:00
acpana 8042b3aeed
better fix
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-27 14:28:08 -07:00
acpana b03467e3bd
sync w ent
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-27 11:41:39 -07:00
Chris S. Kim 213e985d17 Reduce arm64 flakes for TestConnectCA_ConfigurationSet_ChangeKeyConfig_Primary
There were 16 combinations of tests but 4 of them were duplicates since the default key type and bits were "ec" and 256. That entry was commented out to reduce the subtest count to 12.

testrpc.WaitForLeader was failing on arm64 environments; the cause is unknown but it might be due to the environment being flooded with parallel tests making RPC calls. The RPC polling+retry was replaced with a simpler check for leadership based on raft.
2022-07-27 13:54:34 -04:00
Chris S. Kim c80ab10527 Retry checks for virtual IP metadata 2022-07-27 13:54:34 -04:00
Chris S. Kim 146dd93775 Sort slice of ServiceNames deterministically 2022-07-27 13:54:34 -04:00
Sarah Pratt 1bd5470b07 Separate port and socket path requirement in case of local agent assignment 2022-07-27 12:30:52 -05:00
Luke Kysow 92c1f30359 peering: don't track imported services/nodes in usage
Services/nodes that are imported from other peers are stored in
state. We don't want to count those as part of our own cluster's usage.
2022-07-27 09:08:51 -07:00
cskh f7858a1bda
chore: clarify the error message: service.service must not be empty (#13907)
- when registering a service using the catalog endpoint, the key for the service
  name should actually be "service". Adding this information to the
  error message will help users quickly fix the request.
2022-07-27 10:16:46 -04:00
cskh ae04e2f048
chore: removed unused method AddService (#13905)
- This AddService is not used anywhere.
  AddServiceWithChecks is used in place of AddService
- Test code is updated
2022-07-26 16:54:53 -04:00
Luke Kysow 0e10e5b765 Remove duplicate comment 2022-07-26 10:19:49 -07:00
alex 0a66d0188d
peering: prevent peering in same partition (#13851)
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
2022-07-25 18:00:48 -07:00
Nitya Dhanushkodi 03ea6517c9
peering: remove validation that forces peering token server addresses to be an IP, allow hostname based addresses (#13874) 2022-07-25 16:33:47 -07:00
Luke Kysow 5d4209eaf8
Rename receive to recv in tracker (#13896)
Because it's shorter
2022-07-25 16:08:03 -07:00
Luke Kysow a8ae88ec59
peering: read endpoints can now return failing status (#13849)
Track streams that have been disconnected due to an error and
set their statuses to failing.
2022-07-25 14:27:53 -07:00
Kyle Havlovitz ec70713dd3
Merge pull request #13872 from hashicorp/remove-upstream-log
Remove extra logging from ingress upstream watch shutdown
2022-07-25 12:55:30 -07:00
Chris S. Kim 1f8ae56951
Preserve PeeringState on upsert (#13666)
Fixes a bug where, if generate token is called twice, the second call upserts the zero-value (undefined) of PeeringState.
2022-07-25 14:37:56 -04:00
Chris S. Kim c752c5bff2
Update envoy metrics label extraction for peered clusters and listeners (#13818)
Now that peered upstreams can generate envoy resources (#13758), we need a way to disambiguate local from peered resources in our metrics. The key difference is that datacenter and partition will be replaced with peer, since in the context of peered resources partition is ambiguous (could refer to the partition in a remote cluster or one that exists locally). The partition and datacenter of the proxy will always be that of the source service.

Regexes were updated to make emitting datacenter and partition labels mutually exclusive with peer labels.

Listener filter names were updated to better match the existing regex.

Cluster names assigned to peered upstreams were updated to be synthesized from local peer name (it previously used the externally provided primary SNI, which contained the peer name from the other side of the peering). Integration tests were updated to assert for the new peer labels.
2022-07-25 13:49:00 -04:00
DanStough f690d299c9 feat: convert destination address to slice 2022-07-25 12:31:58 -04:00
Freddy e6f997ac5b
[OSS] Add ACL enforcement to peering endpoints (#13878) 2022-07-25 10:04:10 -06:00
Matt Keeler 6a47c44755
Enable/Disable Peering Support in the UI (#13816)
We enable/disable based on the config flag.
2022-07-25 11:50:11 -04:00
freddygv 5bbc0cc615 Add ACL enforcement to peering endpoints 2022-07-25 09:34:29 -06:00
Kyle Havlovitz 75efc0649b Remove excess debug log from ingress upstream shutdown 2022-07-22 17:29:38 -07:00
alex b60ebc022e
peering: use ShouldDial to validate peer role (#13823)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-22 15:56:25 -07:00
Luke Kysow d21f793b74
peering: add config to enable/disable peering (#13867)
* peering: add config to enable/disable peering

Add config:

```
peering {
  enabled = true
}
```

Defaults to true. When disabled:
1. All peering RPC endpoints will return an error
2. Leader won't start its peering establishment goroutines
3. Leader won't start its peering deletion goroutines
2022-07-22 15:20:21 -07:00
Kyle Havlovitz 3cbcfd4b13
Merge pull request #13847 from hashicorp/gateway-goroutine-leak
Fix goroutine leaks in proxycfg when using ingress gateway
2022-07-22 14:43:22 -07:00
Freddy 922592d6bb
[OSS] Add new peering ACL rule (#13848)
This commit adds a new ACL rule named "peering" to authorize
actions taken against peering-related endpoints.

The "peering" rule has several key properties:
- It is scoped to a partition, and MUST be defined in the default
  namespace.

- Its access level must be "read", "write", or "deny".

- Granting an access level will apply to all peerings. This ACL rule
  cannot be used to selectively grant access to some peerings but not
  others.

- If the peering rule is not specified, we fall back to the "operator"
  rule and then the default ACL rule.
2022-07-22 14:42:23 -06:00
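The fallback order in the commit above (the "peering" rule, then the "operator" rule, then the default ACL rule) can be sketched as below. The `Policy` struct and `effectivePeeringAccess` function are hypothetical simplifications and do not mirror Consul's ACL authorizer types.

```go
package main

import "fmt"

// Access is a simplified access level. The empty string means "not specified".
type Access string

const (
	Unset Access = ""
	Deny  Access = "deny"
	Read  Access = "read"
	Write Access = "write"
)

// Policy is a hypothetical, flattened view of the rules relevant here.
type Policy struct {
	Peering  Access
	Operator Access
	Default  Access
}

// effectivePeeringAccess applies the fallback order from the commit message:
// the peering rule, then the operator rule, then the default ACL rule.
func effectivePeeringAccess(p Policy) Access {
	if p.Peering != Unset {
		return p.Peering
	}
	if p.Operator != Unset {
		return p.Operator
	}
	return p.Default
}

func main() {
	fmt.Println(effectivePeeringAccess(Policy{Operator: Read, Default: Deny}))
	fmt.Println(effectivePeeringAccess(Policy{Peering: Write, Operator: Read, Default: Deny}))
}
```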
alex 7bd55578cc
peering: emit exported services count metric (#13811)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-22 12:05:08 -07:00
Daniel Upton f018bd6e09 proxycfg-glue: server-local implementation of `ExportedPeeredServices`
This is the OSS portion of enterprise PR 2377.

Adds a server-local implementation of the proxycfg.ExportedPeeredServices
interface that sources data from a blocking query against the server's
state store.
2022-07-22 15:23:23 +01:00
Eric Haberkorn e044343105
Add Cluster Peering Failover Support to Prepared Queries (#13835)
Add peering failover support to prepared queries
2022-07-22 09:14:43 -04:00
Nitya Dhanushkodi cbafabde16
update generate token endpoint to take external addresses (#13844)
Update generate token endpoint (rpc, http, and api module)

If ServerExternalAddresses are set, they override any addresses obtained from the "consul" service, are used in the token instead, and are dialed by the dialer. This allows, for example, setting up a load balancer in front of the Consul servers.
2022-07-21 14:56:11 -07:00
acpana b847f656a8
Rename peering internal to ~
sync ENT to 5679392c81

Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-21 10:51:05 -07:00
Luke Kysow ba7f3fbebc
peering: Add heartbeating to peering streams (#13806)
* Add heartbeating to peering streams
2022-07-21 10:03:27 -07:00
Daniel Upton e3bff8fb39 proxycfg-glue: server-local implementation of `PeeredUpstreams`
This is the OSS portion of enterprise PR 2352.

It adds a server-local implementation of the proxycfg.PeeredUpstreams interface
based on a blocking query against the server's state store.

It also fixes an omission in the Virtual IP freeing logic where we were never
updating the max index (and therefore blocking queries against
VirtualIPsForAllImportedServices would not return on service deletion).
2022-07-21 13:51:59 +01:00
Luke Kysow 4cec3bd9db
Add send mutex to protect against concurrent sends (#13805) 2022-07-20 15:48:18 -07:00
Kyle Havlovitz affbb28eb5 Cancel upstream watches when the discovery chain has been removed 2022-07-20 14:26:52 -07:00
Kyle Havlovitz 77a263ebbb Fix duplicate Notify calls for discovery chains in ingress gateways 2022-07-20 14:25:20 -07:00
Evan Culver 285b4cef2b
connect: Add support for Envoy 1.23, remove 1.19 (#13807) 2022-07-19 14:51:04 -07:00
Paul Glass a9f17c0f99
Extract AWS auth implementation out of Consul (#13760) 2022-07-19 16:26:44 -05:00
Chris S. Kim dcc230f699
Make envoy resources for inferred peered upstreams (#13758)
Peered upstreams have a separate loop in xds from discovery chain upstreams. This PR adds similar but slightly modified code to add filters for peered upstream listeners, clusters, and endpoints in the case of transparent proxy.
2022-07-19 14:56:28 -04:00
alex 64b3705a31
peering: refactor reconcile, cleanup (#13795)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-19 11:43:29 -07:00
Luke Kysow eba682fc08
peerstream: set keepalive enforcement to 15s (#13796)
The client is set to send keepalive pings every 30s. The server
keepalive enforcement must be set to a number less than that,
otherwise it will disconnect clients for sending pings too often.
MinTime governs the minimum amount of time between pings.
2022-07-18 16:12:03 -07:00
alex 4ff097c4cf
peering: track exported services (#13784)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-18 10:20:04 -07:00
R.B. Boyer bec4df0679
peerstream: require a resource subscription to receive updates of that type (#13767)
This mimics xDS's discovery protocol where you must request a resource
explicitly for the exporting side to send those events to you.

As part of this I aligned the overall ResourceURL with the TypeURL that
gets embedded into the encoded protobuf Any construct. The
CheckServiceNodes is now wrapped in a better named "ExportedService"
struct.
2022-07-15 15:03:40 -05:00
R.B. Boyer 7da65c02a6
peerstream: fix test assertions (#13780) 2022-07-15 14:43:24 -05:00
Luke Kysow 3968f21339
Add docs for peerStreamServer vs peeringServer. (#13781) 2022-07-15 12:23:05 -07:00
Luke Kysow a8721c33c5
peerstream: dialer should reconnect when stream closes (#13745)
* peerstream: dialer should reconnect when stream closes

If the stream is closed unexpectedly (i.e. when we haven't received
a terminated message), the dialer should attempt to re-establish the
stream.

Previously, the `HandleStream` would return `nil` when the stream
was closed. The caller then assumed the stream was terminated on purpose
and so didn't reconnect when instead it was stopped unexpectedly and
the dialer should have attempted to reconnect.
2022-07-15 11:58:33 -07:00
R.B. Boyer 61ebb38092
server: ensure peer replication can successfully use TLS over external gRPC (#13733)
Ensure that the peer stream replication rpc can successfully be used with TLS activated.

Also:

- If key material is configured for the gRPC port but HTTPS is not
  enabled, TLS will now still be activated for the gRPC port.

- peerstream replication stream opened by the establishing-side will now
  ignore grpc.WithBlock so that TLS errors will bubble up instead of
  being awkwardly delayed or suppressed
2022-07-15 13:15:50 -05:00
alex 70ad4804b6
peering: track imported services (#13718) 2022-07-15 10:20:43 -07:00
Matt Keeler 7ae0c69729
Use Node Name for peering healthSnapshot instead of ID (#13773)
A Node ID is not a required field in Consul’s data model. Therefore we cannot reliably expect all uses to have it. However, the node name is required and must be unique, so it is an equally good key for the internal healthSnapshot node tracking.
2022-07-15 10:51:38 -04:00
Matt Keeler 17565a4fca
Enable partition support for peering establishment (#13772)
Prior to this the dialing side of the peering would only ever work within the default partition. This commit allows properly parsing the partition field out of the API struct request body, query param and header.
2022-07-15 10:07:07 -04:00
Dan Stough 084f9d7084 feat: connect proxy xDS for destinations
Signed-off-by: Dhia Ayachi <dhia@hashicorp.com>
2022-07-14 15:27:02 -04:00
Daniel Upton 7f69e27926 proxycfg-glue: server-local implementation of `FederationStateListMeshGateways`
This is the OSS portion of enterprise PR 2265.

This PR provides a server-local implementation of the
proxycfg.FederationStateListMeshGateways interface based on blocking queries.
2022-07-14 18:22:12 +01:00
Daniel Upton a5a6102a3b proxycfg-glue: server-local implementation of `GatewayServices`
This is the OSS portion of enterprise PR 2259.

This PR provides a server-local implementation of the proxycfg.GatewayServices
interface based on blocking queries.
2022-07-14 18:22:12 +01:00
Daniel Upton a280c9a10b proxycfg-glue: server-local implementation of `TrustBundle` and `TrustBundleList`
This is the OSS portion of enterprise PR 2250.

This PR provides server-local implementations of the proxycfg.TrustBundle and
proxycfg.TrustBundleList interfaces, based on local blocking queries.
2022-07-14 18:22:12 +01:00
Daniel Upton 70f29942f4 proxycfg-glue: server-local implementation of the `Health` interface
This is the OSS portion of enterprise PR 2249.

This PR introduces an implementation of the proxycfg.Health interface based on a
local materialized view of the health events.

It reuses the view and request machinery from agent/rpcclient/health, which made
it super straightforward.
2022-07-14 18:22:12 +01:00
Daniel Upton 688dfe3138 proxycfg-glue: server-local implementation of `ServiceList`
This is the OSS portion of enterprise PR 2242.

This PR introduces a server-local implementation of the proxycfg.ServiceList
interface, backed by streaming events and a local materializer.
2022-07-14 18:22:12 +01:00
Daniel Upton 599f5e2207 proxycfg-glue: server-local compiled discovery chain data source
This is the OSS portion of enterprise PR 2236.

Adds a local blocking query-based implementation of the proxycfg.CompiledDiscoveryChain interface.
2022-07-14 18:22:12 +01:00
Chris S. Kim d12b3d286e Check if an upstream is implicit from either intentions or peered services 2022-07-13 16:53:20 -04:00
Chris S. Kim 5d890cdbb2 Use new maps for proxycfg peered data 2022-07-13 16:05:10 -04:00
Chris S. Kim 34c0093d44 Add new watch.Map type to refactor proxycfg 2022-07-13 16:05:10 -04:00
Chris S. Kim 0936942b2d Scrub VirtualIPs before exporting 2022-07-13 16:05:10 -04:00
Kyle Havlovitz a7ea6cb771
Merge pull request #13699 from hashicorp/tgate-http2-upstream
Respect http2 protocol for upstreams of terminating gateways
2022-07-13 09:41:15 -07:00
Dan Upton 34140ff3e0
grpc: rename public/private directories to external/internal (#13721)
Previously, public referred to gRPC services that are both exposed on
the dedicated gRPC port and have their definitions in the proto-public
directory (so were considered usable by 3rd parties). Whereas private
referred to services on the multiplexed server port that are only usable
by agents and other servers.

Now, we're splitting these definitions, such that external/internal
refers to the port and public/private refers to whether they can be used
by 3rd parties.

This is necessary because the peering replication API needs to be
exposed on the dedicated port, but is not (yet) suitable for use by 3rd
parties.
2022-07-13 16:33:48 +01:00
R.B. Boyer c880728ab4
peerstream: some cosmetic refactors to make this easier to follow (#13732)
- Use some protobuf construction helper methods for brevity.
- Rename a local variable to avoid later shadowing.
- Rename the Nonce field to be more like xDS's naming.
- Be more explicit about which PeerID fields are empty.
2022-07-13 10:00:35 -05:00
Kyle Havlovitz 0ac7de3bae Use protocol from resolved config entry, not gateway service 2022-07-12 16:23:40 -07:00
Kyle Havlovitz 54d8fe9032 Enable http2 options for grpc protocol 2022-07-12 14:38:44 -07:00
R.B. Boyer 81764a5650
peering: always send the mesh gateway SpiffeID even for tcp services (#13728)
If someone were to switch a peer-exported service from L4 to L7 there
would be a brief SAN validation hiccup as traffic shifted to the mesh
gateway for termination.

This PR sends the mesh gateway SpiffeID down all the time so the clients
always expect a switch.
2022-07-12 11:38:13 -05:00
R.B. Boyer ee5eb5a960
state: prohibit changing an exported tcp discovery chain in a way that would break SAN validation (#13727)
For L4/tcp exported services the mesh gateways will not be terminating
TLS. A caller in one peer will be directly establishing TLS connections
to the ultimate exported service in the other peer.

The caller will be doing SAN validation using the replicated SpiffeID
values shipped from the exporting side. There is a class of discovery
chain edits that could be done on the exporting side that would cause
the introduction of a new SpiffeID value. Between the config entry
update on the exporting side and the importing side receiving updated
peer stream data, requests to the exported service would fail due
to SAN validation errors.

This is unacceptable, so instead prohibit the exporting peer from making
changes that would break peering in this way.
2022-07-12 11:17:33 -05:00
R.B. Boyer 2c329475ce
state: prohibit exported discovery chains from having cross-datacenter or cross-partition references (#13726)
Because peerings are pairwise between two tuples of (datacenter,
partition), any exported reference via a discovery chain that
crosses out of the peered datacenter or partition will ultimately not be
able to work for various reasons. The biggest one is that there is no
way in the ultimate destination to configure an intention that can allow
an external SpiffeID to access a service.

This PR ensures that a user simply cannot do this, so they won't run
into weird situations like this.
2022-07-12 11:03:41 -05:00
Chris S. Kim 9f5ab3ec10
Return error if ServerAddresses is empty (#13714) 2022-07-12 11:09:00 -04:00
Kyle Havlovitz 616a2da835 Respect http2 protocol for upstreams of terminating gateways 2022-07-08 14:30:45 -07:00
R.B. Boyer 5b801db24b
peering: move peer replication to the external gRPC port (#13698)
Peer replication is intended to be between separate Consul installs and
effectively should be considered "external". This PR moves the peer
stream replication bidirectional RPC endpoint to the external gRPC
server and ensures that things continue to function.
2022-07-08 12:01:13 -05:00
R.B. Boyer 40c5c7eee2
server: broadcast the public grpc port using lan serf and update the consul service in the catalog with the same data (#13687)
Currently servers exchange information about their WAN serf port
and RPC port with serf tags, so that they all learn of each other's
addressing information. We intend to make larger use of the new
public-facing gRPC port exposed on all of the servers, so this PR
addresses that by passing around the gRPC port via serf tags and
then ensuring the generated consul service in the catalog has
metadata about that new port as well for ease of non-serf-based lookup.
2022-07-07 13:55:41 -05:00
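A rough sketch of the tag exchange described in #13687, assuming serf tags are modeled as a plain `map[string]string`. The tag keys, port numbers, and helper names here are hypothetical and chosen for illustration; the real tag names may differ.

```go
package main

import (
	"fmt"
	"strconv"
)

// tagGRPCPort is a hypothetical tag key for the advertised gRPC port.
const tagGRPCPort = "grpc_port"

// serverTags builds the tags a server would gossip to its peers.
func serverTags(rpcPort, wanPort, grpcPort int) map[string]string {
	tags := map[string]string{
		"port":     strconv.Itoa(rpcPort),
		"wan_port": strconv.Itoa(wanPort),
	}
	if grpcPort > 0 {
		tags[tagGRPCPort] = strconv.Itoa(grpcPort)
	}
	return tags
}

// grpcPortFromTags recovers the gRPC port another server advertised.
func grpcPortFromTags(tags map[string]string) (int, error) {
	raw, ok := tags[tagGRPCPort]
	if !ok {
		return 0, fmt.Errorf("server did not advertise %q", tagGRPCPort)
	}
	port, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid %q tag %q: %w", tagGRPCPort, raw, err)
	}
	return port, nil
}

func main() {
	tags := serverTags(8300, 8302, 8502) // example port values
	port, err := grpcPortFromTags(tags)
	fmt.Println(port, err)
}
```

The same value could also be copied into the metadata of the catalog's consul service so that callers without serf access can look it up.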
Freddy ed9808c4f1
Parse peer name for virtual IP DNS queries (#13602)
This commit updates the DNS query locality parsing so that the virtual
IP for an imported service can be queried.

Note that:
- Support for parsing a peer in other service discovery queries was not
  added.
- Querying another datacenter for a virtual IP is not supported. This
  was technically allowed in 1.11 but is being rolled back for 1.13
  because it is not a use-case we intended to support. Virtual IPs in
  different datacenters are going to collide because they are allocated
  sequentially.
2022-07-06 10:30:04 -06:00
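To illustrate the collision mentioned in #13602, here is a small sketch of two independent sequential allocators. The `allocator` type and the starting address are assumptions made for the example; the point is only that two datacenters allocating sequentially from the same range hand out the same address to unrelated services.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// allocator hands out IPv4 addresses sequentially from a fixed start.
// Hypothetical type; the start address is an illustrative assumption.
type allocator struct {
	next uint32
}

func newAllocator() *allocator {
	start := net.IPv4(240, 0, 0, 1).To4()
	return &allocator{next: binary.BigEndian.Uint32(start)}
}

// assign returns the next free address and advances the counter.
func (a *allocator) assign() net.IP {
	ip := make(net.IP, 4)
	binary.BigEndian.PutUint32(ip, a.next)
	a.next++
	return ip
}

func main() {
	dc1, dc2 := newAllocator(), newAllocator()

	web := dc1.assign() // first service registered in dc1
	db := dc2.assign()  // unrelated first service registered in dc2

	// Both datacenters start from the same address, so the two
	// services end up with identical virtual IPs.
	fmt.Println(web, db, web.Equal(db))
}
```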
R.B. Boyer 4ce9651421
test: update mockery use to put mocks into test files (#13656)
--testonly doesn't do anything anymore so switch to --filename instead
2022-07-05 16:57:15 -05:00
Chris S. Kim 0910c41d95
Revise possible states for a peering. (#13661)
These changes are primarily for Consul's UI, where we want to be more
specific about the state a peering is in.

- The "initial" state was renamed to "pending", and no longer applies to
  peerings being established from a peering token.

- Upon request to establish a peering from a peering token, peerings
  will be set as "establishing". This will help distinguish between the
  two roles: the cluster that generates the peering token and the
  cluster that establishes the peering.

- When marked for deletion, peering state will be set to "deleting".
  This way the UI determines the deletion via the state rather than the
  "DeletedAt" field.

Co-authored-by: freddygv <freddy@hashicorp.com>
2022-07-04 10:47:58 -04:00
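The state changes in #13661 could be modeled along these lines. This is only a sketch: the `PeeringState` type, its constants, and the helper functions are hypothetical names for illustration, not the generated protobuf enum.

```go
package main

import "fmt"

// PeeringState is a hypothetical stand-in for the peering state enum.
type PeeringState string

const (
	StatePending      PeeringState = "pending"      // token generated, waiting for the other side
	StateEstablishing PeeringState = "establishing" // peering requested from a token
	StateDeleting     PeeringState = "deleting"     // marked for deletion
)

// initialState picks the starting state based on the cluster's role:
// the token generator starts as pending, the token consumer as establishing.
func initialState(fromPeeringToken bool) PeeringState {
	if fromPeeringToken {
		return StateEstablishing
	}
	return StatePending
}

// isDeleting lets a UI check deletion via the state alone, without
// inspecting a separate deletion timestamp.
func isDeleting(s PeeringState) bool {
	return s == StateDeleting
}

func main() {
	fmt.Println(initialState(false))
	fmt.Println(initialState(true))
	fmt.Println(isDeleting(StateDeleting))
}
```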