open-consul

Commit Graph

Author	SHA1	Message	Date
Dan Upton	52ce151221	Raft storage backend (#16619 )	2023-04-04 17:30:06 +01:00
Ronald	dd0e8eec14	copyright headers for agent folder (#16704 ) * copyright headers for agent folder * Ignore test data files * fix proto files and remove headers in agent/uiserver folder * ignore deep-copy files	2023-03-28 14:39:22 -04:00
Dhia Ayachi	0f3e935228	Net 2229/rpc reduce max retries 2 (#16165 ) * feat: calculate retry wait time with exponential back-off * test: add test for getWaitTime method * feat: enforce random jitter between min value from previous iteration and current * extract randomStagger to simplify tests and use Milliseconds to avoid float math. * rename variables * add test and rename comment --------- Co-authored-by: Poonam Jadhav <poonam.jadhav@hashicorp.com>	2023-02-06 14:07:41 -05:00
Poonam Jadhav	59ca3b8332	feat: client RPC is retries on ErrRetryElsewhere error and forwardRequestToLeader method retries ErrRetryLater error (#16099 )	2023-02-06 11:31:25 -05:00
Dhia Ayachi	9e99715ee8	avoid logging RPC errors when it's specific rate limiter errors (#15968 ) * avoid logging RPC errors when it's specific rate limiter errors * simplify if statements	2023-01-16 12:08:09 -05:00
Dan Upton	f24f8c681f	Rate limit improvements and fixes (#15917 ) - Fixes a panic when Operation.SourceAddr is nil (internal net/rpc calls) - Adds proper HTTP response codes (429 and 503) for rate limit errors - Makes the error messages clearer - Enables automatic retries for rate-limit errors in the net/rpc stack	2023-01-09 10:20:05 +00:00
Chris S. Kim	ae1646706f	Regenerate files according to 1.19.2 formatter	2022-10-24 16:12:08 -04:00
FFMMM	3c08843847	[sync oss] add net/rpc interceptor implementation (#12573 ) * sync ent changes from 866dcb0667 Signed-off-by: FFMMM <FFMMM@users.noreply.github.com> * update oss go.mod Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>	2022-03-17 16:02:26 -07:00
Eric	ae1cdc85b1	Remove the stdduration gogo extension	2022-03-16 12:12:29 -04:00
R.B. Boyer	3804677570	server: suppress spurious blocking query returns where multiple config entries are involved (#12362 ) Starting from and extending the mechanism introduced in #12110 we can specially handle the 3 main special Consul RPC endpoints that react to many config entries in a single blocking query in Connect: - `DiscoveryChain.Get` - `ConfigEntry.ResolveServiceConfig` - `Intentions.Match` All of these will internally watch for many config entries, and at least one of those will likely be not found in any given query. Because these are blends of multiple reads the exact solution from #12110 isn't perfectly aligned, but we can tweak the approach slightly and regain the utility of that mechanism. ### No Config Entries Found In this case, despite looking for many config entries none may be found at all. Unlike #12110 in this scenario we do not return an empty reply to the caller, but instead synthesize a struct from default values to return. This can be handled nearly identically to #12110 with the first 1-2 replies being non-empty payloads followed by the standard spurious wakeup suppression mechanism from #12110. ### No Change Since Last Wakeup Once a blocking query loop on the server has completed and slept at least once, there is a further optimization we can make here to detect if any of the config entries that were present at specific versions for the prior execution of the loop are identical for the loop we just woke up for. In that scenario we can return a slightly different internal sentinel error and basically externally handle it similar to #12110. This would mean that even if 20 discovery chain read RPC handling goroutines wakeup due to the creation of an unrelated config entry, the only ones that will terminate and reply with a blob of data are those that genuinely have new data to report. ### Extra Endpoints Since this pattern is pretty reusable, other key config-entry-adjacent endpoints used by `agent/proxycfg` also were updated: - `ConfigEntry.List` - `Internal.IntentionUpstreams` (tproxy)	2022-02-25 15:46:34 -06:00
Daniel Nephin	cb1a80184f	rpc: set response to nil when not found Otherwise when the query times out we might incorrectly send a value for the reply, when we should send an empty reply. Also document errNotFound and how to handle the result in that case.	2022-02-18 12:26:06 -05:00
Daniel Nephin	bdafa24c50	Make blockingQuery efficient with 'not found' results. By using the query results as state. Blocking queries are efficient when the query matches some results, because the ModifyIndex of those results, returned as queryMeta.Mindex, will never change unless the items themselves change. Blocking queries for non-existent items are not efficient because the queryMeta.Index can (and often does) change when other entities are written. This commit reduces the churn of these queries by using a different comparison for "has changed". Instead of using the modified index, we use the existence of the results. If the previous result was "not found" and the new result is still "not found", we know we can ignore the modified index and continue to block. This is done by setting the minQueryIndex to the returned queryMeta.Index, which prevents the query from returning before a state change is observed.	2022-02-15 18:24:33 -05:00
Daniel Nephin	a4e1c59cd8	rpc: improve docs for blockingQuery Follow the Go convention of accepting a small interface that documents the methods used by the function. Clarify the rules for implementing a query function passed to blockingQuery.	2022-02-15 14:20:14 -05:00
FFMMM	1f8fb17be7	Vendor in rpc mono repo for net/rpc fork, go-msgpack, msgpackrpc. (#12311 ) This commit syncs ENT changes to the OSS repo. Original commit details in ENT: ``` commit 569d25f7f4578981c3801e6e067295668210f748 Author: FFMMM <FFMMM@users.noreply.github.com> Date: Thu Feb 10 10:23:33 2022 -0800 Vendor fork net rpc (#1538) * replace net/rpc w consul-net-rpc/net/rpc Signed-off-by: FFMMM <FFMMM@users.noreply.github.com> * replace msgpackrpc and go-msgpack with fork from mono repo Signed-off-by: FFMMM <FFMMM@users.noreply.github.com> * gofmt all files touched Signed-off-by: FFMMM <FFMMM@users.noreply.github.com> ``` Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>	2022-02-14 09:45:45 -08:00
Daniel Nephin	59206e38c7	rpc: cleanup exit and blocking condition logic in blockingQuery Remove some unnecessary comments around query_blocking metric. The only line that needs any comments in the atomic decrement. Cleanup the block and return comments and logic. The old comment about AbandonCh may have been relevant before, but it is expected behaviour now. The logic was simplified by inverting the err condition.	2022-01-17 16:59:25 -05:00
Daniel Nephin	a28d1268cb	rpc: extract rpcQueryTimeout method This helps keep the logic in blockingQuery more focused. In the future we may have a separate struct for RPC queries which may allow us to move this off of Server.	2022-01-17 16:59:25 -05:00
Daniel Nephin	751bc2e7d3	rpc: move the index defaulting to setQueryMeta. This safeguard should be safe to apply in general. We are already applying it to non-blocking queries that call blockingQuery, so it should be fine to apply it to others.	2022-01-17 16:59:25 -05:00
Daniel Nephin	6bf8efe607	rpc: refactor blocking query To remove the TODO, and make it more readable. In general this reduces the scope of variables, making them easier to reason about. It also introduces more early returns so that we can see the flow from the structure of the function.	2022-01-17 16:58:47 -05:00
Evan Culver	32a04317bf	rpc: Unset partition before forwarding to remote datacenter (#11758 )	2021-12-08 11:02:14 -08:00
Dan Upton	0efe478044	Groundwork for exposing when queries are filtered by ACLs (#11569 )	2021-12-03 17:11:26 +00:00
R.B. Boyer	ba13416b57	grpc: strip local ACL tokens from RPCs during forwarding if crossing datacenters (#11099 ) Fixes #11086	2021-09-22 13:14:26 -05:00
Evan Culver	93f94ac24f	rpc: authorize raft requests (#10925 )	2021-08-26 15:04:32 -07:00
R.B. Boyer	a84f5fa25d	grpc: ensure that streaming gRPC requests work over mesh gateway based wan federation (#10838 ) Fixes #10796	2021-08-24 16:28:44 -05:00
Daniel Nephin	86244967c5	docs: correct some misleading telemetry docs The query metrics are actually reported for all read queries, not only ones that use a MinIndex to block for updates. Also clarify the raft.apply metric is only on the leader.	2021-06-28 12:20:53 -04:00
Dhia Ayachi	0c13f80d5a	RPC Timeout/Retries account for blocking requests (#8978 )	2021-05-27 17:29:43 -04:00
Daniel Nephin	4905ac6f44	rpc: add tests for canRetry Also accept an RPCInfo instead of interface{}. Accepting an interface lead to a bug where the caller was expecting the arg to be the response when in fact it was always passed the request. By accepting RPCInfo it should indicate that this is actually the request value. One caller of canRetry already passed an RPCInfo, the second handles the type assertion before calling canRetry.	2021-05-06 13:30:07 -04:00
Daniel Nephin	c38f4869ad	rpc: remove unnecessary arg to ForwardRPC	2021-05-06 13:30:07 -04:00
Daniel Nephin	6d1a5b3629	Handle ErrChunkingResubmit.Error properly Previously canRetry was attempting to retrieve this error from args, however there was never any callers that would pass an error to args. With the change to raftApply to move this error to the error return value, it is now possible to receive this error from the err argument. This commit updates canRetry to check for ErrChunkingResubmit in err.	2021-04-20 13:29:31 -04:00
Daniel Nephin	8654adfc53	Handle FSM.Apply errors in raftApply Previously we were inconsistently checking the response for errors. This PR moves the response-is-error check into raftApply, so that all callers can look at only the error response, instead of having to know that errors could come from two places. This should expose a few more errors that were previously hidden because in some calls to raftApply we were ignoring the response return value. Also handle errors more consistently. In some cases we would log the error before returning it. This can be very confusing because it can result in the same error being logged multiple times. Instead return a wrapped error.	2021-04-20 13:29:29 -04:00
Matt Keeler	3a79b559f9	Special case the error returned when we have a Raft leader but are not tracking it in the ServerLookup (#9487 ) This can happen when one other node in the cluster such as a client is unable to communicate with the leader server and sees it as failed. When that happens its failing status eventually gets propagated to the other servers in the cluster and eventually this can result in RPCs returning “No cluster leader” error. That error is misleading and unhelpful for determing the root cause of the issue as its not raft stability but rather and client -> server networking issue. Therefore this commit will add a new error that will be returned in that case to differentiate between the two cases.	2021-01-04 14:05:23 -05:00
Kit Patella	0b18f5612e	trim help strings to save a few bytes	2020-11-16 11:02:11 -08:00
Kit Patella	af719981f3	finish adding static server metrics	2020-11-13 16:26:08 -08:00
Kit Patella	b486c1bce8	add the service name in the agent rather than in the definitions themselves	2020-11-13 13:18:04 -08:00
Kit Patella	9533372ded	first pass on agent-configured prometheusDefs and adding defs for every consul metric	2020-11-12 18:12:12 -08:00
Daniel Nephin	863a9df951	server: add gRPC server for streaming events Includes a stats handler and stream interceptor for grpc metrics. Co-authored-by: Paul Banks <banks@banksco.de>	2020-09-08 12:10:41 -04:00
Hans Hasselberg	e0297b6e99	Refactor keyring ops: * changes some functions to return data instead of modifying pointer arguments * renames globalRPC() to keyringRPCs() to make its purpose more clear * restructures KeyringOperation() to make it more understandable	2020-08-11 13:42:03 +02:00
Daniel Nephin	73493ca01b	Pass a logger to ConnPool and yamux, instead of an io.Writer Allowing us to remove the LogOutput field from config.	2020-08-05 13:25:08 -04:00
Matt Keeler	a77ed471c8	Rename (Server).forward to (Server).ForwardRPC Also get rid of the preexisting shim in server.go that existed before to have this name just call the unexported one.	2020-07-08 11:05:44 -04:00
Matt Keeler	90e741c6d2	Allow cancelling blocking queries in response to shutting down.	2020-06-24 17:09:50 -04:00
Hans Hasselberg	9ef44ec3da	pool: remove version The version field has been used to decide which multiplexing to use. It was introduced in 2457293dceec95ecd12ef4f01442e13710ea131a. But this is 6y ago and there is no need for this differentiation anymore.	2020-05-28 23:06:01 +02:00
Hans Hasselberg	6626cb69d6	rpc: oss changes for network area connection pooling (#7735 )	2020-04-30 22:12:17 +02:00
R.B. Boyer	10d3ff9a4f	server: strip local ACL tokens from RPCs during forwarding if crossing datacenters (#7419 ) Fixes #7414	2020-03-10 11:15:22 -05:00
R.B. Boyer	a7fb26f50f	wan federation via mesh gateways (#6884 ) This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch: There are several distinct chunks of code that are affected: * new flags and config options for the server * retry join WAN is slightly different * retry join code is shared to discover primary mesh gateways from secondary datacenters * because retry join logic runs in the agent and the results of that operation for primary mesh gateways are needed in the server there are some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur at multiple layers of abstraction just to pass the data down to the right layer. * new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers * the function signature for RPC dialing picked up a new required field (the node name of the destination) * several new RPCs for manipulating a FederationState object: `FederationState:{Apply,Get,List,ListMeshGateways}` * 3 read-only internal APIs for debugging use to invoke those RPCs from curl * raft and fsm changes to persist these FederationStates * replication for FederationStates as they are canonically stored in the Primary and replicated to the Secondaries. * a special derivative of anti-entropy that runs in secondaries to snapshot their local mesh gateway `CheckServiceNodes` and sync them into their upstream FederationState in the primary (this works in conjunction with the replication to distribute addresses for all mesh gateways in all DCs to all other DCs) * a "gateway locator" convenience object to make use of this data to choose the addresses of gateways to use for any given RPC or gossip operation to a remote DC. This gets data from the "retry join" logic in the agent and also directly calls into the FSM. * RPC (`:8300`) on the server sniffs the first byte of a new connection to determine if it's actually doing native TLS. If so it checks the ALPN header for protocol determination (just like how the existing system uses the type-byte marker). * 2 new kinds of protocols are exclusively decoded via this native TLS mechanism: one for ferrying "packet" operations (udp-like) from the gossip layer and one for "stream" operations (tcp-like). The packet operations re-use sockets (using length-prefixing) to cut down on TLS re-negotiation overhead. * the server instances specially wrap the `memberlist.NetTransport` when running with gateway federation enabled (in a `wanfed.Transport`). The general gist is that if it tries to dial a node in the SAME datacenter (deduced by looking at the suffix of the node name) there is no change. If dialing a DIFFERENT datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh gateways to eventually end up in a server's :8300 port. * a new flag when launching a mesh gateway via `consul connect envoy` to indicate that the servers are to be exposed. This sets a special service meta when registering the gateway into the catalog. * `proxycfg/xds` notice this metadata blob to activate additional watches for the FederationState objects as well as the location of all of the consul servers in that datacenter. * `xds:` if the extra metadata is in place additional clusters are defined in a DC to bulk sink all traffic to another DC's gateways. For the current datacenter we listen on a wildcard name (`server.<dc>.consul`) that load balances all servers as well as one mini-cluster per node (`<node>.server.<dc>.consul`) * the `consul tls cert create` command got a new flag (`-node`) to help create an additional SAN in certs that can be used with this flavor of federation.	2020-03-09 15:59:02 -05:00
R.B. Boyer	919741838d	fix use of hclog logger (#7264 )	2020-02-12 09:37:16 -06:00
Kit Patella	d28bc1acbe	rpc: measure blocking queries (#7224 ) * agent: measure blocking queries * agent.rpc: update docs to mention we only record blocking queries * agent.rpc: make go fmt happy * agent.rpc: fix non-atomic read and decrement with bitwise xor of uint64 0 * agent.rpc: clarify review question * agent.rpc: today I learned that one must declare all variables before interacting with goto labels * Update agent/consul/server.go agent.rpc: more precise comment on `Server.queriesBlocking` Co-Authored-By: Paul Banks <banks@banksco.de> * Update website/source/docs/agent/telemetry.html.md agent.rpc: improve queries_blocking description Co-Authored-By: Paul Banks <banks@banksco.de> * agent.rpc: fix some bugs found in review * add a note about the updated counter behavior to telemetry.md * docs: add upgrade-specific note on consul.rpc.quer{y,ies_blocking} behavior Co-authored-by: Paul Banks <banks@banksco.de>	2020-02-10 10:01:15 -08:00
R.B. Boyer	b4325dfbce	agent: ensure that we always use the same settings for msgpack (#7245 ) We set RawToString=true so that []uint8 => string when decoding an interface{}. We set the MapType so that map[interface{}]interface{} decodes to map[string]interface{}. Add tests to ensure that this doesn't break existing usages. Fixes #7223	2020-02-07 15:50:24 -06:00
Hans Hasselberg	50281032e0	Security fixes (#7182 ) * Mitigate HTTP/RPC Services Allow Unbounded Resource Usage Fixes #7159. Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com> Co-authored-by: Paul Banks <banks@banksco.de>	2020-01-31 11:19:37 -05:00
Chris Piraino	3dd0b59793	Allow users to configure either unstructured or JSON logging (#7130 ) * hclog Allow users to choose between unstructured and JSON logging	2020-01-28 17:50:41 -06:00
Matej Urbas	d877e091d6	agent: configurable MaxQueryTime and DefaultQueryTime. (#3777 )	2020-01-17 14:20:57 +01:00
Pierre Souchay	61fc4f8253	rpc: log method when a server/server RPC call fails (#4548 ) Sometimes, we have lots of errors in cross calls between DCs (several hundreds / sec) Enrich the log in order to help diagnose the root cause of issue.	2020-01-13 19:55:29 +01:00

1 2

77 Commits