open-consul

Commit Graph

Author	SHA1	Message	Date
Derek Menteer	f926c7643f	Enforce lowercase peer names. (#15697 ) Enforce lowercase peer names. Prior to this change peer names could be mixed case. This can cause issues, as peer names are used as DNS labels in various locations. It also caused issues with envoy configuration.	2023-01-13 14:20:28 -06:00
Dan Upton	15c7c03fa5	grpc: switch servers and retry on error (#15892 ) This is the OSS portion of enterprise PR 3822. Adds a custom gRPC balancer that replicates the router's server cycling behavior. Also enables automatic retries for RESOURCE_EXHAUSTED errors, which we now get for free.	2023-01-05 10:21:27 +00:00
Dan Upton	006138beb4	Wire in rate limiter to handle internal and external gRPC calls (#15857 )	2022-12-23 13:42:16 -06:00
John Murret	2a0aeb2349	Rate Limit Handler - ensure rate limiting is not in the code path when not configured (#15819 ) * Rate limiting handler - ensure configuration has changed before modifying limiters * Updating test to validate arguments to UpdateConfig * Removing duplicate test. Updating mock. * Renaming NullRateLimiter to NullRequestLimitsHandler * Rate Limit Handler - ensure rate limiting is not in the code path when not configured * Update agent/consul/rate/handler.go Co-authored-by: Dhia Ayachi <dhia@hashicorp.com> * formatting handler.go * Rate limiting handler - ensure configuration has changed before modifying limiters * Updating test to validate arguments to UpdateConfig * Removing duplicate test. Updating mock. * adding logging for when UpdateConfig is called but the config has not changed. * Update agent/consul/rate/handler.go Co-authored-by: Dhia Ayachi <dhia@hashicorp.com> * Update agent/consul/rate/handler_test.go Co-authored-by: Dan Upton <daniel@floppy.co> * modifying existing variable name based on pr feedback * updating a broken merge conflict; Co-authored-by: Dhia Ayachi <dhia@hashicorp.com> Co-authored-by: Dan Upton <daniel@floppy.co>	2022-12-20 15:00:22 -07:00
John Murret	700c693b33	adding config for request_limits (#15531 ) * server: add placeholder glue for rate limit handler This commit adds a no-op implementation of the rate-limit handler and adds it to the `consul.Server` struct and setup code. This allows us to start working on the net/rpc and gRPC interceptors and config logic. * Add handler errors * Set the global read and write limits * fixing multilimiter moving packages * Fix typo * Simplify globalLimit usage * add multilimiter and tests * exporting LimitedEntity * Apply suggestions from code review Co-authored-by: John Murret <john.murret@hashicorp.com> * add config update and rename config params * add doc string and split config * Apply suggestions from code review Co-authored-by: Dan Upton <daniel@floppy.co> * use timer to avoid go routine leak and change the interface * add comments to tests * fix failing test * add prefix with config edge, refactor tests * Apply suggestions from code review Co-authored-by: Dan Upton <daniel@floppy.co> * refactor to apply configs for limiters under a prefix * add fuzz tests and fix bugs found. Refactor reconcile loop to have a simpler logic * make KeyType an exported type * split the config and limiter trees to fix race conditions in config update * rename variables * fix race in test and remove dead code * fix reconcile loop to not create a timer on each loop * add extra benchmark tests and fix tests * fix benchmark test to pass value to func * server: add placeholder glue for rate limit handler This commit adds a no-op implementation of the rate-limit handler and adds it to the `consul.Server` struct and setup code. This allows us to start working on the net/rpc and gRPC interceptors and config logic. * Set the global read and write limits * fixing multilimiter moving packages * add server configuration for global rate limiting. * remove agent test * remove added stuff from handler * remove added stuff from multilimiter * removing unnecessary TODOs * Removing TODO comment from handler * adding in defaulting to infinite * add disabled status in there * adding in documentation for disabled mode. * make disabled the default. * Add mock and agent test * addig documentation and missing mock file. * Fixing test TestLoad_IntegrationWithFlags * updating docs based on PR feedback. * Updating Request Limits mode to use int based on PR feedback. * Adding RequestLimits struct so we have a nested struct in ReloadableConfig. * fixing linting references * Update agent/consul/rate/handler.go Co-authored-by: Dan Upton <daniel@floppy.co> * Update agent/consul/config.go Co-authored-by: Dan Upton <daniel@floppy.co> * removing the ignore of the request limits in JSON. addingbuilder logic to convert any read rate or write rate less than 0 to rate.Inf * added conversion function to convert request limits object to handler config. * Updating docs to reflect gRPC and RPC are rate limit and as a result, HTTP requests are as well. * Updating values for TestLoad_FullConfig() so that they were different and discernable. * Updating TestRuntimeConfig_Sanitize * Fixing TestLoad_IntegrationWithFlags test * putting nil check in place * fixing rebase * removing change for missing error checks. will put in another PR * Rebasing after default multilimiter config change * resolving rebase issues * updating reference for incomingRPCLimiter to use interface * updating interface * Updating interfaces * Fixing mock reference Co-authored-by: Daniel Upton <daniel@floppy.co> Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>	2022-12-13 13:09:55 -07:00
Dan Upton	c73707ca3c	grpc: add rate-limiting middleware (#15550 ) Implements the gRPC middleware for rate-limiting as a tap.ServerInHandle function (executed before the request is unmarshaled). Mappings between gRPC methods and their operation type are generated by a protoc plugin introduced by #15564.	2022-12-13 15:01:56 +00:00
Dan Stough	ee56e06f22	[OSS] fix: wait and try longer to peer through mesh gw (#15328 )	2022-11-10 13:54:00 -05:00
Kyle Schochenmaier	2b1e5f69e2	removes ioutil usage everywhere which was deprecated in go1.16 (#15297 ) * update go version to 1.18 for api and sdk, go mod tidy * removes ioutil usage everywhere which was deprecated in go1.16 in favour of io and os packages. Also introduces a lint rule which forbids use of ioutil going forward. Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>	2022-11-10 10:26:01 -06:00
Derek Menteer	a8eb047ee6	Bring back parameter ServerExternalAddresses in GenerateToken endpoint (#15267 ) Re-add ServerExternalAddresses parameter in GenerateToken endpoint This reverts commit 5e156772f6a7fba5324eb6804ae4e93c091229a6 and adds extra functionality to support newer peering behaviors.	2022-11-08 14:55:18 -06:00
Chris S. Kim	dbe3dc96f3	Update hcp-scada-provider to fix diamond dependency problem with go-msgpack (#15185 )	2022-11-07 11:34:30 -05:00
Derek Menteer	261ba1e65d	Backport various fixes from ENT. (#15254 ) * Regenerate golden files. * Backport from ENT: "Avoid race" Original commit: 5006c8c858b0e332be95271ef9ba35122453315b Original author: freddygv * Backport from ENT: "chore: fix flake peerstream test" Original commit: b74097e7135eca48cc289798c5739f9ef72c0cc8 Original author: DanStough	2022-11-03 16:34:57 -05:00
Derek Menteer	58f15db4c4	Allow peering endpoints to bypass verify_incoming.	2022-10-31 09:56:30 -05:00
R.B. Boyer	87432a8dd4	chore: update golangci-lint to v1.50.1 (#15022 )	2022-10-24 11:48:02 -05:00
freddygv	483720a443	Return forbidden on permission denied This commit updates the establish endpoint to bubble up a 403 status code to callers when the establishment secret from the token is invalid. This is a signal that a new peering token must be generated.	2022-10-20 17:11:49 -06:00
Nitya Dhanushkodi	598670e376	Remove ability to specify external addresses in GenerateToken endpoint (#14930 ) * Reverts "update generate token endpoint to take external addresses (#13844)" This reverts commit f47319b7c6b6e7c7dd720a5af927ad2d33fa536d.	2022-10-19 09:31:36 -07:00
freddygv	239f0e3084	Update peering establishment to maybe use gateways When peering through mesh gateways we expect outbound dials to peer servers to flow through the local mesh gateway addresses. Now when establishing a peering we get a list of dial addresses as a ring buffer that includes local mesh gateway addresses if the local DC is configured to peer through mesh gateways. The ring buffer includes the mesh gateway addresses first, but also includes the remote server addresses as a fallback. This fallback is present because it's possible that direct egress from the servers may be allowed. If not allowed then the leader will cycle back to a mesh gateway address through the ring. When attempting to dial the remote servers we retry up to a fixed timeout. If using mesh gateways we also have an initial wait in order to allow for the mesh gateways to configure themselves. Note that if we encounter a permission denied error we do not retry since that error indicates that the secret in the peering token is invalid.	2022-10-13 14:57:55 -06:00
Derek Menteer	cc0a05ffa0	Disallow peering to the same cluster.	2022-10-13 14:11:02 -05:00
Derek Menteer	bfa4adbfce	Add remote peer partition and datacenter info.	2022-10-13 10:37:41 -05:00
Paul Glass	8cf430140a	gRPC server metrics (#14922 ) * Move stats.go from grpc-internal to grpc-middleware * Update grpc server metrics with server type label * Add stats test to grpc-external * Remove global metrics instance from grpc server tests	2022-10-11 17:00:32 -05:00
Freddy	8d93f120ea	Merge pull request #14796 from hashicorp/peering/use-connect-ca	2022-10-07 10:37:37 -06:00
freddygv	6ef8d329d2	Require Connect and TLS to generate peering tokens By requiring Connect and a gRPC TLS listener we can automatically configure TLS for all peering control-plane traffic.	2022-10-07 09:06:29 -06:00
freddygv	a21e5799f7	Use internal server certificate for peering TLS A previous commit introduced an internally-managed server certificate to use for peering-related purposes. Now the peering token has been updated to match that behavior: - The server name matches the structure of the server cert - The CA PEMs correspond to the Connect CA Note that if Conect is disabled, and by extension the Connect CA, we fall back to the previous behavior of returning the manually configured certs and local server SNI. Several tests were updated to use the gRPC TLS port since they enable Connect by default. This means that the peering token will embed the Connect CA, and the dialer will expect a TLS listener.	2022-10-07 09:05:32 -06:00
DanStough	df94470e76	feat: xDS updates for peerings control plane through mesh gw	2022-10-07 08:46:42 -06:00
Eric Haberkorn	2178e38204	Rename `PeerName` to `Peer` on prepared queries and exported services (#14854 )	2022-10-04 14:46:15 -04:00
Eric Haberkorn	5fd1e6daea	Add exported services event to cluster peering replication. (#14797 )	2022-09-29 15:37:19 -04:00
malizz	5c470b28dd	Support Stale Queries for Trust Bundle Lookups (#14724 ) * initial commit * add tags, add conversations * add test for query options utility functions * update previous tests * fix test * don't error out on empty context * add changelog * update decode config	2022-09-28 09:56:59 -07:00
DanStough	b37a2ba889	feat(peering): validate server name conflicts on establish	2022-09-14 11:37:30 -04:00
Dan Upton	9fe6c33c0d	xDS Load Balancing (#14397 ) Prior to #13244, connect proxies and gateways could only be configured by an xDS session served by the local client agent. In an upcoming release, it will be possible to deploy a Consul service mesh without client agents. In this model, xDS sessions will be handled by the servers themselves, which necessitates load-balancing to prevent a single server from receiving a disproportionate amount of load and becoming overwhelmed. This introduces a simple form of load-balancing where Consul will attempt to achieve an even spread of load (xDS sessions) between all healthy servers. It does so by implementing a concurrent session limiter (limiter.SessionLimiter) and adjusting the limit according to autopilot state and proxy service registrations in the catalog. If a server is already over capacity (i.e. the session limit is lowered), Consul will begin draining sessions to rebalance the load. This will result in the client receiving a `RESOURCE_EXHAUSTED` status code. It is the client's responsibility to observe this response and reconnect to a different server. Users of the gRPC client connection brokered by the consul-server-connection-manager library will get this for free. The rate at which Consul will drain sessions to rebalance load is scaled dynamically based on the number of proxies in the catalog.	2022-09-09 15:02:01 +01:00
freddygv	19f25fc3a5	Allow terminated peerings to be deleted Peerings are terminated when a peer decides to delete the peering from their end. Deleting a peering sends a termination message to the peer and triggers them to mark the peering as terminated but does NOT delete the peering itself. This is to prevent peerings from disappearing from both sides just because one side deleted them. Previously the Delete endpoint was skipping the deletion if the peering was not marked as active. However, terminated peerings are also inactive. This PR makes some updates so that peerings marked as terminated can be deleted by users.	2022-08-26 10:52:47 -06:00
Luke Kysow	e9960dfdf3	peering: default to false (#13963 ) * defaulting to false because peering will be released as beta * Ignore peering disabled error in bundles cachetype Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com> Co-authored-by: freddygv <freddy@hashicorp.com> Co-authored-by: Matt Keeler <mjkeeler7@gmail.com>	2022-08-01 15:22:36 -04:00
Matt Keeler	795e5830c6	Implement/Utilize secrets for Peering Replication Stream (#13977 )	2022-08-01 10:33:18 -04:00
alex	0a66d0188d	peering: prevent peering in same partition (#13851 ) Co-authored-by: Chris S. Kim <ckim@hashicorp.com>	2022-07-25 18:00:48 -07:00
freddygv	5bbc0cc615	Add ACL enforcement to peering endpoints	2022-07-25 09:34:29 -06:00
alex	b60ebc022e	peering: use ShouldDial to validate peer role (#13823 ) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>	2022-07-22 15:56:25 -07:00
Luke Kysow	d21f793b74	peering: add config to enable/disable peering (#13867 ) * peering: add config to enable/disable peering Add config: ``` peering { enabled = true } ``` Defaults to true. When disabled: 1. All peering RPC endpoints will return an error 2. Leader won't start its peering establishment goroutines 3. Leader won't start its peering deletion goroutines	2022-07-22 15:20:21 -07:00
Nitya Dhanushkodi	cbafabde16	update generate token endpoint to take external addresses (#13844 ) Update generate token endpoint (rpc, http, and api module) If ServerExternalAddresses are set, it will override any addresses gotten from the "consul" service, and be used in the token instead, and dialed by the dialer. This allows for setting up a load balancer for example, in front of the consul servers.	2022-07-21 14:56:11 -07:00
R.B. Boyer	61ebb38092	server: ensure peer replication can successfully use TLS over external gRPC (#13733 ) Ensure that the peer stream replication rpc can successfully be used with TLS activated. Also: - If key material is configured for the gRPC port but HTTPS is not enabled now TLS will still be activated for the gRPC port. - peerstream replication stream opened by the establishing-side will now ignore grpc.WithBlock so that TLS errors will bubble up instead of being awkwardly delayed or suppressed	2022-07-15 13:15:50 -05:00
Dan Upton	34140ff3e0	grpc: rename public/private directories to external/internal (#13721 ) Previously, public referred to gRPC services that are both exposed on the dedicated gRPC port and have their definitions in the proto-public directory (so were considered usable by 3rd parties). Whereas private referred to services on the multiplexed server port that are only usable by agents and other servers. Now, we're splitting these definitions, such that external/internal refers to the port and public/private refers to whether they can be used by 3rd parties. This is necessary because the peering replication API needs to be exposed on the dedicated port, but is not (yet) suitable for use by 3rd parties.	2022-07-13 16:33:48 +01:00
R.B. Boyer	5b801db24b	peering: move peer replication to the external gRPC port (#13698 ) Peer replication is intended to be between separate Consul installs and effectively should be considered "external". This PR moves the peer stream replication bidirectional RPC endpoint to the external gRPC server and ensures that things continue to function.	2022-07-08 12:01:13 -05:00
Chris S. Kim	0910c41d95	Revise possible states for a peering. (#13661 ) These changes are primarily for Consul's UI, where we want to be more specific about the state a peering is in. - The "initial" state was renamed to pending, and no longer applies to peerings being established from a peering token. - Upon request to establish a peering from a peering token, peerings will be set as "establishing". This will help distinguish between the two roles: the cluster that generates the peering token and the cluster that establishes the peering. - When marked for deletion, peering state will be set to "deleting". This way the UI determines the deletion via the state rather than the "DeletedAt" field. Co-authored-by: freddygv <freddy@hashicorp.com>	2022-07-04 10:47:58 -04:00
Daniel Upton	497df1ca3b	proxycfg: server-local config entry data sources This is the OSS portion of enterprise PR 2056. This commit provides server-local implementations of the proxycfg.ConfigEntry and proxycfg.ConfigEntryList interfaces, that source data from streaming events. It makes use of the LocalMaterializer type introduced for peering replication, adding the necessary support for authorization. It also adds support for "wildcard" subscriptions (within a topic) to the event publisher, as this is needed to fetch service-resolvers for all services when configuring mesh gateways. Currently, events will be emitted for just the ingress-gateway, service-resolver, and mesh config entry types, as these are the only entries required by proxycfg — the events will be emitted on topics named IngressGateway, ServiceResolver, and MeshConfig topics respectively. Though these events will only be consumed "locally" for now, they can also be consumed via the gRPC endpoint (confirmed using grpcurl) so using them from client agents should be a case of swapping the LocalMaterializer for an RPCMaterializer.	2022-07-04 10:48:36 +01:00
alex	90577810cc	peering: add imported/exported counts to peering (#13644 ) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com> Co-authored-by: Chris S. Kim <ckim@hashicorp.com>	2022-06-29 14:07:30 -07:00
alex	a8ae8de20e	peering: reconcile/ hint active state for list (#13619 ) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>	2022-06-29 09:43:50 -07:00
R.B. Boyer	2dba16be52	peering: replicate all SpiffeID values necessary for the importing side to do SAN validation (#13612 ) When traversing an exported peered service, the discovery chain evaluation at the other side may re-route the request to a variety of endpoints. Furthermore we intend to terminate mTLS at the mesh gateway for arriving peered traffic that is http-like (L7), so the caller needs to know the mesh gateway's SpiffeID in that case as well. The following new SpiffeID values will be shipped back in the peerstream replication: - tcp: all possible SpiffeIDs resulting from the service-resolver component of the exported discovery chain - http-like: the SpiffeID of the mesh gateway	2022-06-27 14:37:18 -05:00
R.B. Boyer	e7a7232a6b	state: peering ID assignment cannot happen inside of the state store (#13525 ) Move peering ID assignment outisde of the FSM, so that the ID is written to the raft log and the same ID is used by all voters, and after restarts.	2022-06-21 13:04:08 -05:00
freddygv	a288d0c388	Avoid deleting peerings marked as terminated. When our peer deletes the peering it is locally marked as terminated. This termination should kick off deleting all imported data, but should not delete the peering object itself. Keeping peerings marked as terminated acts as a signal that the action took place.	2022-06-14 15:37:09 -06:00
freddygv	a5283e4361	Add leader routine to clean up peerings Once a peering is marked for deletion a new leader routine will now clean up all imported resources and then the peering itself. A lot of the logic was grabbed from the namespace/partitions deferred deletions but with a handful of simplifications: - The rate limiting is not configurable. - Deleting imported nodes/services/checks is done by deleting nodes with the Txn API. The services and checks are deleted as a side-effect. - There is no "round rate limiter" like with namespaces and partitions. This is because peerings are purely local, and deleting a peering in the datacenter does not depend on deleting data from other DCs like with WAN-federated namespaces. All rate limiting is handled by the Raft rate limiter.	2022-06-14 15:36:50 -06:00
freddygv	6d368b5eed	Update peering state and RPC for deferred deletion When deleting a peering we do not want to delete the peering and all imported data in a single operation, since deleting a large amount of data at once could overload Consul. Instead we defer deletion of peerings so that: 1. When a peering deletion request is received via gRPC the peering is marked for deletion by setting the DeletedAt field. 2. A leader routine will monitor for peerings that are marked for deletion and kick off a throttled deletion of all imported resources before deleting the peering itself. This commit mostly addresses point #1 by modifying the peering service to mark peerings for deletion. Another key change is to add a PeeringListDeleted state store function which can return all peerings marked for deletion. This function is what will be watched by the deferred deletion leader routine.	2022-06-13 12:10:32 -06:00
R.B. Boyer	33b497e7c9	peering: rename initiate to establish in the context of the APIs (#13419 )	2022-06-10 11:10:46 -05:00
freddygv	ad6dbe081a	Add agent cache-type for TrustBundleListByService There are a handful of changes in this commit: * When querying trust bundles for a service we need to be able to specify the namespace of the service. * The endpoint needs to track the index because the cache watches use it. * Extracted bulk of the endpoint's logic to a state store function so that index tracking could be tested more easily. * Removed check for service existence, deferring that sort of work to ACL authz * Added the cache type	2022-06-01 17:05:10 -06:00

1 2

61 Commits