open-nomad

Commit Graph

Author	SHA1	Message	Date
Yoan Blanc	225c9c1215	fixup! vendor: explicit use of hashicorp/go-msgpack Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-31 09:48:07 -04:00
Yoan Blanc	761d014071	vendor: explicit use of hashicorp/go-msgpack Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-31 09:45:21 -04:00
Mahmood Ali	53e20e5cc2	Deflake TestRPC_Limits_Streaming test The test starts enough connections to hit the limit, then closes the connection and immediately starts one expecting the new one to succeed. We must wait until the server side recognizes the connection closing and frees up a limits slot. The current test attempts to achieve that by waiting to get an error on conn.Read; however, this error is returned from the local client without waiting for the server to update. As such, I change the logic so it retries on connection rejection but forces the first non-EOF failure to be a deadline error.	2020-03-20 17:21:43 -04:00
Mahmood Ali	acbfeb5815	Simplify Bootstrap logic in tests This change updates tests to honor `BootstrapExpect` exclusively when forming test clusters and removes test-only knobs, e.g. `config.DevDisableBootstrap`. Background: Test cluster creation is fragile. Test servers don't follow the BootstrapExpected route like production clusters. Instead they start as single node clusters and then get rejoined, which risks causing split brain or other test flakiness. The test framework exposes a few knobs (e.g. `config.DevDisableBootstrap` and `config.Bootstrap`) that control whether a server should bootstrap the cluster. These flags are confusing and it's unclear when to use them: their usage in multi-node clusters isn't properly documented. Furthermore, they have some bad side-effects as they don't control the Raft library: If `config.DevDisableBootstrap` is true, the test server may not immediately attempt to bootstrap a cluster, but after an election timeout (~50ms), Raft may force a leadership election and win it (with only one vote) and cause a split brain. The knobs are also confusing as Bootstrap is an overloaded term. In BootstrapExpect, we refer to bootstrapping the cluster only after N servers are connected. But in tests and the knobs above, it refers to whether the server is a single node cluster and shouldn't wait for any other server. Changes: This commit makes two changes: First, it relies on `BootstrapExpected` instead of `Bootstrap` and/or `DevMode` flags. This change is relatively trivial. Second, it introduces a `Bootstrapped` flag to track if the cluster is bootstrapped. This allows us to keep `BootstrapExpected` immutable. Previously, the flag was a config value but it was set to 0 after cluster bootstrap completed.	2020-03-02 13:47:43 -05:00
Michael Schurter	8c332a3757	Merge pull request #7102 from hashicorp/test-limits Fix some race conditions and flaky tests	2020-02-13 10:19:11 -08:00
Michael Schurter	c5073f61a7	test: add timeout to ease debugging	2020-02-07 15:50:53 -08:00
Michael Schurter	9905dec6a3	test: workaround limits race	2020-02-07 15:50:53 -08:00
Mahmood Ali	e106d373b2	rpc: Use MultiplexV2 for connections MultiplexV2 is a new connection multiplex header that supports multiplexing both RPC and streaming requests over the same Yamux connection. MultiplexV2 was added in 0.8.0 as part of https://github.com/hashicorp/nomad/pull/3892 . So Nomad 0.11 can expect it to be supported. Though, some more rigorous testing is required before merging this. I want to call out some implementation details: First, the current connection pool reuses the Yamux stream for multiple RPC calls, and doesn't close them until an error is encountered. This commit doesn't change it, and sets the `RpcNomad` byte only at stream creation. Second, the StreamingRPC session gets closed by callers and cannot be reused. Every StreamingRPC opens a new Yamux session.	2020-02-03 19:31:39 -05:00
Michael Schurter	c82b14b0c4	core: add limits to unauthorized connections Introduce limits to prevent unauthorized users from exhausting all ephemeral ports on agents: * `{https,rpc}_handshake_timeout` * `{http,rpc}_max_conns_per_client` The handshake timeout closes connections that have not completed the TLS handshake by the deadline (5s by default). For RPC connections this timeout also separately applies to the first byte being read, so RPC connections with TLS enabled have `rpc_handshake_timeout * 2` as their deadline. The connection limit per client prevents a single remote TCP peer from exhausting all ephemeral ports. The default is 100, but can be lowered to a minimum of 26. Since streaming RPC connections create a new TCP connection (until MultiplexV2 is used), 20 connections are reserved for Raft and non-streaming RPCs to prevent connection exhaustion due to streaming RPCs. All limits are configurable and may be disabled by setting them to `0`. This also includes a fix that closes connections that attempt to create TLS RPC connections recursively. While only users with valid mTLS certificates could perform such an operation, it was added as a safeguard to prevent programming errors before they could cause resource exhaustion.	2020-01-30 10:38:25 -08:00
Drew Bailey	a61bf32314	Allow nomad monitor command to lookup server UUID Allows addressing servers with nomad monitor using the server's name or ID. Also unifies logic for addressing servers for client_agent_endpoint commands and makes addressing logic region aware. rpc getServer test	2020-01-29 13:55:29 -05:00
Seth Hoenig	f0c3dca49c	tests: swap lib/freeport for tweaked helper/freeport Copy the updated version of freeport (sdk/freeport), and tweak it for use in Nomad tests. This means staying below port 10000 to avoid conflicts with the lib/freeport that is still transitively used by the old version of consul that we vendor. Also provide implementations to find ephemeral ports of macOS and Windows environments. Ports acquired through freeport are supposed to be returned to freeport, which this change now also introduces. Many tests are modified to include calls to a cleanup function for Server objects. This should help quite a bit with some flakey tests, but not all of them. Our port problems will not go away completely until we upgrade our vendor version of consul. With Go modules, we'll probably do a 'replace' to swap out other copies of freeport with the one now in 'nomad/helper/freeport'.	2019-12-09 08:37:32 -06:00
Mahmood Ali	d699a70875	Merge pull request #5911 from hashicorp/b-rpc-consistent-reads Block rpc handling until state store is caught up	2019-08-20 09:29:37 -04:00
Mahmood Ali	ad39bcef60	rpc: use tls wrapped connection for streaming rpc This ensures that server-to-server streaming RPC calls use the tls wrapped connections. Prior to this, the `streamingRpcImpl` function used tls for setting the header and invoking the rpc method, but returned the unwrapped connection. Thus, streaming writes failed with tls errors. This tls streaming bug existed since 0.8.0[1], but PR #5654[2] exacerbated it in 0.9.2. Prior to PR #5654, nomad client used to shuffle servers at every heartbeat -- `servers.Manager.setServers`[3] always shuffled servers and was called by heartbeat code[4]. Shuffling servers meant that a nomad client would heartbeat and establish a connection against all nomad servers eventually. When handling streaming RPC calls, nomad servers used these local connections to communicate directly to the client. The server-to-server forwarding logic was left mostly unexercised. PR #5654 means that a nomad client may connect to a single server only, which caused the server-to-server forward streaming RPC code to get exercised more and unearthed the problem. [1] https://github.com/hashicorp/nomad/blob/v0.8.0/nomad/rpc.go#L501-L515 [2] https://github.com/hashicorp/nomad/pull/5654 [3] https://github.com/hashicorp/nomad/blob/v0.9.1/client/servers/manager.go#L198-L216 [4] https://github.com/hashicorp/nomad/blob/v0.9.1/client/client.go#L1603	2019-07-12 14:41:44 +08:00
Mahmood Ali	9c9bec62fd	rpc: add positive tests for server streaming RPC	2019-07-12 14:32:52 +08:00
Mahmood Ali	ea3a98357f	Block rpc handling until state store is caught up Here, we ensure that the leader only responds to RPC calls when the state store is up to date. At leadership transition or launch with restored state, the server's local store might not be caught up with the latest raft logs and may return a stale read. The solution here is to have an RPC consistency read gate, enabled when `establishLeadership` completes before we respond to RPC calls. `establishLeadership` is gated by a `raft.Barrier` which ensures that all prior raft logs have been applied. Conversely, the gate is disabled when leadership is lost. This is very much inspired by https://github.com/hashicorp/consul/pull/3154/files	2019-07-02 16:07:37 +08:00
Jasmine Dahilig	51e141be7a	backfill region from job hcl in jobUpdate and jobPlan endpoints - updated region in job metadata that gets persisted to nomad datastore - fixed many unrelated unit tests that used an invalid region value (they previously passed because hcl wasn't getting picked up and the job would default to global region)	2019-06-13 08:03:16 -07:00
Alex Dadgar	e8f89597f5	fix rpc test	2018-09-19 10:17:54 -07:00
Alex Dadgar	9971b3393f	yamux	2018-09-17 14:22:40 -07:00
Michael Schurter	0a17076ad2	refactor drainer into a subpkg	2018-03-21 16:51:44 -07:00
Alex Dadgar	f9fd1ce98c	fix unknown rpc tests	2018-02-15 15:29:31 -08:00
Alex Dadgar	a1faab0e58	Server TLS	2018-02-15 15:03:12 -08:00
Alex Dadgar	64ad3119d0	Implement MultiplexV2 RPC handling Implements and tests the V2 multiplexer. This will not be used until several versions of Nomad have been released to mitigate upgrade concerns.	2018-02-15 13:59:02 -08:00
Alex Dadgar	cea77df6a7	Add Streaming RPC ack This PR introduces an ack allowing the receiving end of the streaming RPC to return any error that may have occurred during the establishment of the streaming RPC.	2018-02-15 13:59:02 -08:00
Alex Dadgar	6dd1c9f49d	Refactor	2018-02-15 13:59:00 -08:00
Alex Dadgar	a6dfffa4fa	Add testing interfaces	2018-02-15 13:59:00 -08:00
Chelsea Holland Komlo	bf90176278	connection receives only EOF	2017-11-01 15:21:05 -05:00
Chelsea Holland Komlo	afe9f9a714	add rpc_upgrade_mode as config option for tls upgrades	2017-11-01 15:19:52 -05:00
Alex Dadgar	06eddf243c	parallel nomad tests	2017-07-25 17:39:36 -07:00
Armon Dadgar	75abbc74a5	nomad: modify forward RPC to hold when no known leader	2016-07-10 13:36:55 -04:00
Alex Dadgar	a56c808ff8	Update Nomad Client/Server RPC codecs to use custom msgpackHandle	2015-11-15 18:27:02 -08:00
Alex Dadgar	0bfc1977c3	TestRPC_forwardRegion waits for test servers to join	2015-10-08 16:32:42 -07:00
Armon Dadgar	2ff133c0e6	nomad: rename region1 to global. Fixes #41	2015-09-13 18:18:40 -07:00
Armon Dadgar	2ea99f211a	nomad: updating for new alloc representation	2015-08-25 17:36:52 -07:00
Armon Dadgar	dd8f5a0bef	nomad: testing RPC forwarding	2015-06-07 12:08:47 -07:00

34 Commits
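
The per-client connection limit described in commit c82b14b0c4 can be illustrated with a minimal sketch. This is not Nomad's implementation: the `connLimiter` type, `newConnLimiter`, and `acquire` are hypothetical names introduced here for illustration only. It assumes connections are counted per remote IP and, following the commit message, that a limit of `0` disables the check.

```go
package main

import (
	"fmt"
	"sync"
)

// connLimiter is a hypothetical per-client connection counter.
// It is an illustration only, not the type Nomad uses.
type connLimiter struct {
	mu    sync.Mutex
	max   int            // maximum open connections per client; 0 disables the limit
	conns map[string]int // open connection count keyed by remote IP
}

func newConnLimiter(max int) *connLimiter {
	return &connLimiter{max: max, conns: make(map[string]int)}
}

// acquire reserves a slot for ip. On success it returns a release
// function that frees the slot; on rejection it returns (nil, false).
func (l *connLimiter) acquire(ip string) (release func(), ok bool) {
	l.mu.Lock()
	defer l.mu.Unlock()

	if l.max > 0 && l.conns[ip] >= l.max {
		return nil, false
	}
	l.conns[ip]++

	// sync.Once makes release safe to call more than once.
	var once sync.Once
	return func() {
		once.Do(func() {
			l.mu.Lock()
			defer l.mu.Unlock()
			l.conns[ip]--
			if l.conns[ip] == 0 {
				delete(l.conns, ip)
			}
		})
	}, true
}

func main() {
	l := newConnLimiter(2)

	r1, ok1 := l.acquire("10.0.0.1")
	_, ok2 := l.acquire("10.0.0.1")
	_, ok3 := l.acquire("10.0.0.1") // third connection exceeds the limit
	fmt.Println(ok1, ok2, ok3)      // true true false

	r1() // closing a connection frees its slot
	_, ok4 := l.acquire("10.0.0.1")
	fmt.Println(ok4) // true
}
```

The rejection followed by a successful retry after `r1()` mirrors the situation in commit 53e20e5cc2: a new connection can only succeed once the server side has noticed the close and freed a slot, which is why that test retries on rejection.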