open-consul

Commit Graph

Author	SHA1	Message	Date
Matt Keeler	f3c80c4eef	Protobuf Refactoring for Multi-Module Cleanliness (#16302 ) This commit includes the following: * Moves all packages that were within proto/ to proto/private * Rewrites imports to account for the packages being moved * Adds in buf.work.yaml to enable buf workspaces * Names the proto-public buf module so that we can override the Go package imports within proto/buf.yaml * Bumps the buf version dependency to 1.14.0 (I was trying out the version to see if it would get around an issue - it didn't but it also doesn't break things and it seemed best to keep up with the toolchain changes) Why: * In the future we will need to consume other protobuf dependencies such as the Google HTTP annotations for openapi generation or grpc-gateway usage. * There were some recent changes to have our own ratelimiting annotations. * The two combined were not working when I was trying to use them together (attempting to rebase another branch) * Buf workspaces should be the solution to the problem * Buf workspaces means that each module will have generated Go code that embeds proto file names relative to the proto dir and not the top level repo root. * This resulted in proto file name conflicts in the Go global protobuf type registry. * The solution to that was to add in a private/ directory into the path within the proto/ directory. * That then required rewriting all the imports. Is this safe? AFAICT yes * The gRPC wire protocol doesn't seem to care about the proto file names (although the Go grpc code does tack on the proto file name as Metadata in the ServiceDesc) * Other than imports, there were no changes to any generated code as a result of this.	2023-02-17 16:14:46 -05:00
Dhia Ayachi	0f3e935228	Net 2229/rpc reduce max retries 2 (#16165 ) * feat: calculate retry wait time with exponential back-off * test: add test for getWaitTime method * feat: enforce random jitter between min value from previous iteration and current * extract randomStagger to simplify tests and use Milliseconds to avoid float math. * rename variables * add test and rename comment --------- Co-authored-by: Poonam Jadhav <poonam.jadhav@hashicorp.com>	2023-02-06 14:07:41 -05:00
Poonam Jadhav	59ca3b8332	feat: client RPC retries on ErrRetryElsewhere error and forwardRequestToLeader method retries on ErrRetryLater error (#16099 )	2023-02-06 11:31:25 -05:00
Chris S. Kim	82d6d12a13	Output user-friendly name for anonymous token (#15884 )	2023-01-09 12:28:53 -06:00
Dan Upton	f24f8c681f	Rate limit improvements and fixes (#15917 ) - Fixes a panic when Operation.SourceAddr is nil (internal net/rpc calls) - Adds proper HTTP response codes (429 and 503) for rate limit errors - Makes the error messages clearer - Enables automatic retries for rate-limit errors in the net/rpc stack	2023-01-09 10:20:05 +00:00
Dan Upton	15c7c03fa5	grpc: switch servers and retry on error (#15892 ) This is the OSS portion of enterprise PR 3822. Adds a custom gRPC balancer that replicates the router's server cycling behavior. Also enables automatic retries for RESOURCE_EXHAUSTED errors, which we now get for free.	2023-01-05 10:21:27 +00:00
Semir Patel	1f82e82e04	Pass remote addr of incoming HTTP requests through to RPC(..) calls (#15700 )	2022-12-14 09:24:22 -06:00
Kyle Schochenmaier	2b1e5f69e2	removes ioutil usage everywhere which was deprecated in go1.16 (#15297 ) * update go version to 1.18 for api and sdk, go mod tidy * removes ioutil usage everywhere which was deprecated in go1.16 in favour of io and os packages. Also introduces a lint rule which forbids use of ioutil going forward. Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>	2022-11-10 10:26:01 -06:00
Chris S. Kim	e4c20ec190	Refactor client RPC timeouts (#14965 ) Fix an issue where rpc_hold_timeout was being used as the timeout for non-blocking queries. Users should be able to tune read timeouts without fiddling with rpc_hold_timeout. A new configuration `rpc_read_timeout` is created. Refactor some implementation from the original PR 11500 to remove the misleading linkage between RPCInfo's timeout (used to retry in case of certain modes of failures) and the client RPC timeouts.	2022-10-18 15:05:09 -04:00
Dan Upton	34140ff3e0	grpc: rename public/private directories to external/internal (#13721 ) Previously, public referred to gRPC services that are both exposed on the dedicated gRPC port and have their definitions in the proto-public directory (so were considered usable by 3rd parties). Whereas private referred to services on the multiplexed server port that are only usable by agents and other servers. Now, we're splitting these definitions, such that external/internal refers to the port and public/private refers to whether they can be used by 3rd parties. This is necessary because the peering replication API needs to be exposed on the dedicated port, but is not (yet) suitable for use by 3rd parties.	2022-07-13 16:33:48 +01:00
Daniel Upton	497df1ca3b	proxycfg: server-local config entry data sources This is the OSS portion of enterprise PR 2056. This commit provides server-local implementations of the proxycfg.ConfigEntry and proxycfg.ConfigEntryList interfaces, that source data from streaming events. It makes use of the LocalMaterializer type introduced for peering replication, adding the necessary support for authorization. It also adds support for "wildcard" subscriptions (within a topic) to the event publisher, as this is needed to fetch service-resolvers for all services when configuring mesh gateways. Currently, events will be emitted for just the ingress-gateway, service-resolver, and mesh config entry types, as these are the only entries required by proxycfg — the events will be emitted on topics named IngressGateway, ServiceResolver, and MeshConfig topics respectively. Though these events will only be consumed "locally" for now, they can also be consumed via the gRPC endpoint (confirmed using grpcurl) so using them from client agents should be a case of swapping the LocalMaterializer for an RPCMaterializer.	2022-07-04 10:48:36 +01:00
R.B. Boyer	9ad10318cd	add general runstep test helper instead of copying it all over the place (#13013 )	2022-05-10 15:25:51 -05:00
Will Jordan	45ffdc360e	Add timeout to Client RPC calls (#11500 ) Adds a timeout (deadline) to client RPC calls, so that streams will no longer hang indefinitely in unstable network conditions. Co-authored-by: kisunji <ckim@hashicorp.com>	2022-04-21 16:21:35 -04:00
Matt Keeler	3447880091	Enable running autopilot state updates on all servers (#12617 ) * Fixes a lint warning about t.Errorf not supporting %w * Enable running autopilot on all servers On the non-leader servers all they do is update the state and do not attempt any modifications. * Fix the RPC conn limiting tests Technically they were relying on racey behavior before. Now they should be reliable.	2022-04-07 10:48:48 -04:00
Mark Anderson	ed3e42296d	Fixup acl.EnterpriseMeta Signed-off-by: Mark Anderson <manderson@hashicorp.com>	2022-04-05 15:11:49 -07:00
Dan Upton	fb441e323a	Restructure gRPC server setup (#12586 ) OSS sync of enterprise changes at 0b44395e	2022-03-22 12:40:24 +00:00
Dan Upton	57f0f42733	Support per-listener TLS configuration ⚙️ (#12504 ) Introduces the capability to configure TLS differently for Consul's listeners/ports (i.e. HTTPS, gRPC, and the internal multiplexed RPC port) which is useful in scenarios where you may want the HTTPS or gRPC interfaces to present a certificate signed by a well-known/public CA, rather than the certificate used for internal communication which must have a SAN in the form `server.<dc>.consul`.	2022-03-18 10:46:58 +00:00
Eric	ae1cdc85b1	Remove the stdduration gogo extension	2022-03-16 12:12:29 -04:00
R.B. Boyer	b63a0f3909	reduce flakiness/raciness of errNotFound and errNotChanged blocking query tests (#12518 ) Improves tests from #12362 These tests try to setup the following concurrent scenario: 1. (goroutine 1) execute read RPC with index=0 2. (goroutine 1) get response from (1) @ index=10 3. (goroutine 1) execute read RPC with index=10 and block 4. (goroutine 2) WHILE (3) is blocking, start slamming the system with stray writes that will cause the WatchSet to wakeup 5. (goroutine 2) after doing all writes, shut down the reader above 6. (goroutine 1) stops reading, double checks that it only ever woke up once (from 1)	2022-03-04 11:20:01 -06:00
R.B. Boyer	3804677570	server: suppress spurious blocking query returns where multiple config entries are involved (#12362 ) Starting from and extending the mechanism introduced in #12110 we can specially handle the 3 main special Consul RPC endpoints that react to many config entries in a single blocking query in Connect: - `DiscoveryChain.Get` - `ConfigEntry.ResolveServiceConfig` - `Intentions.Match` All of these will internally watch for many config entries, and at least one of those will likely be not found in any given query. Because these are blends of multiple reads the exact solution from #12110 isn't perfectly aligned, but we can tweak the approach slightly and regain the utility of that mechanism. ### No Config Entries Found In this case, despite looking for many config entries none may be found at all. Unlike #12110 in this scenario we do not return an empty reply to the caller, but instead synthesize a struct from default values to return. This can be handled nearly identically to #12110 with the first 1-2 replies being non-empty payloads followed by the standard spurious wakeup suppression mechanism from #12110. ### No Change Since Last Wakeup Once a blocking query loop on the server has completed and slept at least once, there is a further optimization we can make here to detect if any of the config entries that were present at specific versions for the prior execution of the loop are identical for the loop we just woke up for. In that scenario we can return a slightly different internal sentinel error and basically externally handle it similar to #12110. This would mean that even if 20 discovery chain read RPC handling goroutines wakeup due to the creation of an unrelated config entry, the only ones that will terminate and reply with a blob of data are those that genuinely have new data to report. 
### Extra Endpoints Since this pattern is pretty reusable, other key config-entry-adjacent endpoints used by `agent/proxycfg` also were updated: - `ConfigEntry.List` - `Internal.IntentionUpstreams` (tproxy)	2022-02-25 15:46:34 -06:00
Daniel Nephin	bdafa24c50	Make blockingQuery efficient with 'not found' results. By using the query results as state. Blocking queries are efficient when the query matches some results, because the ModifyIndex of those results, returned as queryMeta.Mindex, will never change unless the items themselves change. Blocking queries for non-existent items are not efficient because the queryMeta.Index can (and often does) change when other entities are written. This commit reduces the churn of these queries by using a different comparison for "has changed". Instead of using the modified index, we use the existence of the results. If the previous result was "not found" and the new result is still "not found", we know we can ignore the modified index and continue to block. This is done by setting the minQueryIndex to the returned queryMeta.Index, which prevents the query from returning before a state change is observed.	2022-02-15 18:24:33 -05:00
FFMMM	1f8fb17be7	Vendor in rpc mono repo for net/rpc fork, go-msgpack, msgpackrpc. (#12311 ) This commit syncs ENT changes to the OSS repo. Original commit details in ENT: ``` commit 569d25f7f4578981c3801e6e067295668210f748 Author: FFMMM <FFMMM@users.noreply.github.com> Date: Thu Feb 10 10:23:33 2022 -0800 Vendor fork net rpc (#1538) * replace net/rpc w consul-net-rpc/net/rpc Signed-off-by: FFMMM <FFMMM@users.noreply.github.com> * replace msgpackrpc and go-msgpack with fork from mono repo Signed-off-by: FFMMM <FFMMM@users.noreply.github.com> * gofmt all files touched Signed-off-by: FFMMM <FFMMM@users.noreply.github.com> ``` Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>	2022-02-14 09:45:45 -08:00
Daniel Nephin	95e471052b	rpc: add subtests to blockingQuery test	2022-01-17 16:59:25 -05:00
Dan Upton	8bc11b08dc	Rename `ACLMasterToken` => `ACLInitialManagementToken` (#11746 )	2021-12-07 12:39:28 +00:00
Dan Upton	0efe478044	Groundwork for exposing when queries are filtered by ACLs (#11569 )	2021-12-03 17:11:26 +00:00
Dhia Ayachi	4d763ef9e6	regenerate expired certs (#11462 ) * regenerate expired certs * add documentation to generate tests certificates	2021-11-01 11:40:16 -04:00
Daniel Nephin	ebb2388605	acl: remove legacy ACL upgrades from Server As part of removing the legacy ACL system	2021-09-29 15:19:23 -04:00
R.B. Boyer	ba13416b57	grpc: strip local ACL tokens from RPCs during forwarding if crossing datacenters (#11099 ) Fixes #11086	2021-09-22 13:14:26 -05:00
Hans Hasselberg	24c6ce0be0	tls: consider presented intermediates during server connection tls handshake. (#10964 ) * use intermediates when verifying * extract connection state * remove useless import * add changelog entry * golint * better error * wording * collect errors * use SAN.DNSName instead of CommonName * Add test for unknown intermediate * improve changelog entry	2021-09-09 21:48:54 +02:00
Evan Culver	93f94ac24f	rpc: authorize raft requests (#10925 )	2021-08-26 15:04:32 -07:00
R.B. Boyer	a84f5fa25d	grpc: ensure that streaming gRPC requests work over mesh gateway based wan federation (#10838 ) Fixes #10796	2021-08-24 16:28:44 -05:00
Daniel Nephin	75baa22e64	acl: remove ACLResolver config fields from consul.Config	2021-08-17 13:32:52 -04:00
Daniel Nephin	1e23d181b5	config: remove misleading UseTLS field This field was documented as enabling TLS for outgoing RPC, but that was not the case. All this field did was set the use_tls serf tag. Instead of setting this field in a place far from where it is used, move the logic to where the serf tag is set, so that the code is much more obvious.	2021-07-09 19:01:45 -04:00
Daniel Nephin	3c60a46376	config: remove duplicate TLSConfig fields from agent/consul.Config tlsutil.Config already presents an excellent structure for this configuration. Copying the runtime config fields to agent/consul.Config makes code harder to trace, and provides no advantage. Instead of copying the fields around, use the tlsutil.Config struct directly instead. This is one small step in removing the many layers of duplicate configuration.	2021-07-09 18:49:42 -04:00
Dhia Ayachi	0c13f80d5a	RPC Timeout/Retries account for blocking requests (#8978 )	2021-05-27 17:29:43 -04:00
Daniel Nephin	4905ac6f44	rpc: add tests for canRetry Also accept an RPCInfo instead of interface{}. Accepting an interface led to a bug where the caller was expecting the arg to be the response when in fact it was always passed the request. By accepting RPCInfo it should indicate that this is actually the request value. One caller of canRetry already passed an RPCInfo, the second handles the type assertion before calling canRetry.	2021-05-06 13:30:07 -04:00
Daniel Nephin	e8427a48ab	agent/consul: Rename RPCRate -> RPCRateLimit so that the field name is consistent across config structs.	2021-01-14 17:26:00 -05:00
Daniel Nephin	e5320c2db6	agent/consul: make Client/Server config reloading more obvious I believe this commit also fixes a bug. Previously RPCMaxConnsPerClient was not being re-read from the RuntimeConfig, so passing it to Server.ReloadConfig was never changing the value. Also improve the test runtime by not doing a lot of unnecessary work.	2021-01-14 17:21:10 -05:00
Matt Keeler	3a79b559f9	Special case the error returned when we have a Raft leader but are not tracking it in the ServerLookup (#9487 ) This can happen when one other node in the cluster, such as a client, is unable to communicate with the leader server and sees it as failed. When that happens its failing status eventually gets propagated to the other servers in the cluster, and eventually this can result in RPCs returning a “No cluster leader” error. That error is misleading and unhelpful for determining the root cause of the issue, as it is not Raft stability but rather a client -> server networking issue. Therefore this commit will add a new error that will be returned in that case to differentiate between the two cases.	2021-01-04 14:05:23 -05:00
Daniel Nephin	ef0999547a	testing: skip slow tests with -short Add a skip condition to all tests slower than 100ms. This change was made using `gotestsum tool slowest` with data from the last 3 CI runs of master. See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests With this change: ``` $ time go test -count=1 -short ./agent ok github.com/hashicorp/consul/agent 0.743s real 0m4.791s $ time go test -count=1 -short ./agent/consul ok github.com/hashicorp/consul/agent/consul 4.229s real 0m8.769s ```	2020-12-07 13:42:55 -05:00
Matt Keeler	1f40f51a58	Fix a bunch of linter warnings	2020-11-09 09:22:12 -05:00
Daniel Nephin	89d95561df	Enable gofmt simplify Code changes done automatically with 'gofmt -s -w'	2020-06-16 13:21:11 -04:00
R.B. Boyer	16db20b1f3	acl: remove the deprecated `acl_enforce_version_8` option (#7991 ) Fixes #7292	2020-05-29 16:16:03 -05:00
R.B. Boyer	10d3ff9a4f	server: strip local ACL tokens from RPCs during forwarding if crossing datacenters (#7419 ) Fixes #7414	2020-03-10 11:15:22 -05:00
R.B. Boyer	a7fb26f50f	wan federation via mesh gateways (#6884 ) This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch: There are several distinct chunks of code that are affected: * new flags and config options for the server * retry join WAN is slightly different * retry join code is shared to discover primary mesh gateways from secondary datacenters * because retry join logic runs in the agent and the results of that operation for primary mesh gateways are needed in the server there are some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur at multiple layers of abstraction just to pass the data down to the right layer. * new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers * the function signature for RPC dialing picked up a new required field (the node name of the destination) * several new RPCs for manipulating a FederationState object: `FederationState:{Apply,Get,List,ListMeshGateways}` * 3 read-only internal APIs for debugging use to invoke those RPCs from curl * raft and fsm changes to persist these FederationStates * replication for FederationStates as they are canonically stored in the Primary and replicated to the Secondaries. * a special derivative of anti-entropy that runs in secondaries to snapshot their local mesh gateway `CheckServiceNodes` and sync them into their upstream FederationState in the primary (this works in conjunction with the replication to distribute addresses for all mesh gateways in all DCs to all other DCs) * a "gateway locator" convenience object to make use of this data to choose the addresses of gateways to use for any given RPC or gossip operation to a remote DC. 
This gets data from the "retry join" logic in the agent and also directly calls into the FSM. * RPC (`:8300`) on the server sniffs the first byte of a new connection to determine if it's actually doing native TLS. If so it checks the ALPN header for protocol determination (just like how the existing system uses the type-byte marker). * 2 new kinds of protocols are exclusively decoded via this native TLS mechanism: one for ferrying "packet" operations (udp-like) from the gossip layer and one for "stream" operations (tcp-like). The packet operations re-use sockets (using length-prefixing) to cut down on TLS re-negotiation overhead. * the server instances specially wrap the `memberlist.NetTransport` when running with gateway federation enabled (in a `wanfed.Transport`). The general gist is that if it tries to dial a node in the SAME datacenter (deduced by looking at the suffix of the node name) there is no change. If dialing a DIFFERENT datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh gateways to eventually end up in a server's :8300 port. * a new flag when launching a mesh gateway via `consul connect envoy` to indicate that the servers are to be exposed. This sets a special service meta when registering the gateway into the catalog. * `proxycfg/xds` notice this metadata blob to activate additional watches for the FederationState objects as well as the location of all of the consul servers in that datacenter. * `xds:` if the extra metadata is in place additional clusters are defined in a DC to bulk sink all traffic to another DC's gateways. For the current datacenter we listen on a wildcard name (`server.<dc>.consul`) that load balances all servers as well as one mini-cluster per node (`<node>.server.<dc>.consul`) * the `consul tls cert create` command got a new flag (`-node`) to help create an additional SAN in certs that can be used with this flavor of federation.	2020-03-09 15:59:02 -05:00
R.B. Boyer	c37d00791c	make the TestRPC_RPCMaxConnsPerClient test less flaky (#7255 )	2020-02-10 15:13:53 -06:00
Hans Hasselberg	50281032e0	Security fixes (#7182 ) * Mitigate HTTP/RPC Services Allow Unbounded Resource Usage Fixes #7159. Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com> Co-authored-by: Paul Banks <banks@banksco.de>	2020-01-31 11:19:37 -05:00
Alvin Huang	96c2c79908	Add fmt and vet (#5671 ) * add go fmt and vet * go fmt fixes	2019-04-25 12:26:33 -04:00
Jeff Mitchell	d3c7d57209	Move internal/ to sdk/ (#5568 ) * Move internal/ to sdk/ * Add a readme to the SDK folder	2019-03-27 08:54:56 -04:00
Jeff Mitchell	a41c865059	Convert to Go Modules (#5517 ) * First conversion * Use serf 0.8.2 tag and associated updated deps * * Move freeport and testutil into internal/ * Make internal/ its own module * Update imports * Add replace statements so API and normal Consul code are self-referencing for ease of development * Adapt to newer goe/values * Bump to new cleanhttp * Fix ban nonprintable chars test * Update lock bad args test The error message when the duration cannot be parsed changed in Go 1.12 (ae0c435877d3aacb9af5e706c40f9dddde5d3e67). This updates that test. * Update another test as well * Bump travis * Bump circleci * Bump go-discover and godo to get rid of launchpad dep * Bump dockerfile go version * fix tar command * Bump go-cleanhttp	2019-03-26 17:04:58 -04:00


60 Commits
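
Commit bdafa24c50 in the table above explains how a blocking query for a non-existent item can keep blocking: if the previous result was "not found" and the new result is still "not found", the modified index is ignored and the minimum query index is raised to the index just returned. The sketch below replays that rule over a fixed list of store states instead of real watches. The `snapshot` type and the `blockingQuery` signature are hypothetical and simplified, and are not Consul's actual API.

```go
package main

import "fmt"

// snapshot is a hypothetical stand-in for what the query sees each time
// it runs: the store index and whether the requested item exists.
type snapshot struct {
	index uint64
	found bool
}

// blockingQuery replays snaps in order (each entry models one wakeup) and
// returns the first snapshot that should be sent to the caller. ok is false
// if the query would still be blocking after the last snapshot.
func blockingQuery(minIndex uint64, snaps []snapshot) (result snapshot, ok bool) {
	prevNotFound := false
	for _, s := range snaps {
		if !s.found {
			if prevNotFound {
				// Still missing: nothing the caller cares about changed,
				// so raise the threshold and keep blocking.
				minIndex = s.index
				continue
			}
			prevNotFound = true
		} else {
			prevNotFound = false
		}
		if s.index > minIndex {
			return s, true
		}
	}
	return snapshot{}, false
}

func main() {
	// The item is missing while unrelated writes bump the index to 11 and 12,
	// then it is created at index 13.
	snaps := []snapshot{{10, false}, {11, false}, {12, false}, {13, true}}
	res, ok := blockingQuery(10, snaps)
	fmt.Println(res.index, res.found, ok)
}
```

With a plain index comparison, the same query would have returned at index 11 even though the item was still missing, which is the churn the commit message describes.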