Nomad inherited protocol version numbering configuration from Consul and
Serf, but unlike those projects Nomad has never used it. Nomad's
`protocol_version` has always been `1`.
While the code is effectively unused and therefore poses no runtime
risk if left in place, I felt removing it was best because:
1. Nomad's RPC subsystem has been able to evolve extensively without
needing to increment the version number.
2. Nomad's HTTP API has evolved extensively without incrementing
`API{Major,Minor}Version`. If we want to version the HTTP API in the
future, I doubt this is the mechanism we would choose.
3. The presence of the `server.protocol_version` configuration
parameter is confusing since `server.raft_protocol` *is* an important
parameter for operators to consider. Even more confusing is that
there is a distinct Serf protocol version which is included in `nomad
server members` output under the heading `Protocol`. `raft_protocol`
is the *only* protocol version relevant to Nomad developers and
operators. The other protocol versions are either dead code or have
never changed (Serf).
4. If we were to need to version the RPC, HTTP API, or Serf protocols, I
don't think these configuration parameters and variables are the best
choice. If we come to that point, we should choose a versioning scheme
based on the use case and modern best practices -- not this 6+ year
old dead code.
Fixes a deadlock in leadership handling when leadership flaps.
Raft propagates leadership transitions to Nomad through a NotifyCh channel.
Raft blocks when writing to this channel, so the channel must be buffered or
aggressively consumed[1]. Otherwise, Raft blocks indefinitely in `raft.runLeader`
until the channel is consumed[2] and does not move on to executing the
follower-related logic (in `raft.runFollower`).
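A minimal, self-contained sketch of that contract (not Nomad's actual wiring) follows: buffer the notify channel and drain it from a dedicated goroutine so Raft's send never blocks. The buffer size and log messages are illustrative.

```go
package main

import (
	"log"

	"github.com/hashicorp/raft"
)

func main() {
	// Buffered so a single leadership flap never blocks raft.runLeader on the send.
	notifyCh := make(chan bool, 1)

	conf := raft.DefaultConfig()
	conf.NotifyCh = notifyCh

	// Eagerly consume every transition; leader-only work is started and
	// stopped from this goroutine rather than from inside Raft's goroutine.
	go func() {
		for isLeader := range notifyCh {
			if isLeader {
				log.Println("gained leadership: start leader-only work")
			} else {
				log.Println("lost leadership: stop leader-only work")
			}
		}
	}()

	// raft.NewRaft(conf, ...) would be constructed here; omitted for brevity.
	select {} // block forever, standing in for the agent's main loop
}
```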
While Raft's `runLeader` defer function is blocked, Raft cannot process any
other operations. For example, the `run{Leader|Follower}` methods consume
`raft.applyCh`, so while the `runLeader` defer is blocked, all Raft log
applications and configuration lookups block indefinitely.
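A toy, self-contained demonstration of that blocking pattern (illustrative only, not raft's actual code): a goroutine that must complete an unbuffered channel send before returning to its work loop stalls everything that loop would service.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	notifyCh := make(chan bool)  // unbuffered and unconsumed, like a neglected NotifyCh
	applyCh := make(chan string) // stands in for raft.applyCh

	go func() {
		// "runLeader": must report the leadership change before it can
		// return to the loop that services applyCh.
		notifyCh <- false // blocks forever: nobody is reading

		for msg := range applyCh {
			fmt.Println("applied:", msg)
		}
	}()

	// "leaderLoop": issues a Raft operation and waits on it, but the
	// goroutine above never reaches its applyCh loop, so this blocks too.
	select {
	case applyCh <- "autopilot config":
		fmt.Println("apply accepted")
	case <-time.After(time.Second):
		fmt.Println("deadlocked: the apply was never consumed")
	}
}
```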
Sadly, `leaderLoop` and `establishLeadership` make a few such Raft calls:
`establishLeadership` attempts to auto-create the autopilot/scheduler config [3],
and `leaderLoop` attempts to check the Raft configuration [4]. All of these
calls occur without a timeout.
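To make the shape of those calls concrete, here is a hedged sketch: the wrapper function is made up, but `GetConfiguration` is the hashicorp/raft call the configuration check in [4] goes through, and nothing bounds how long its future takes to resolve.

```go
package nomadleader

import (
	"fmt"

	"github.com/hashicorp/raft"
)

// checkRaftConfiguration is an illustrative stand-in for the configuration
// check in leaderLoop [4]. The future is resolved by Raft's main goroutine;
// if that goroutine is stuck in runLeader's deferred notify send, Error()
// never returns.
func checkRaftConfiguration(r *raft.Raft) error {
	future := r.GetConfiguration()
	if err := future.Error(); err != nil {
		return fmt.Errorf("failed to get raft configuration: %w", err)
	}
	return nil
}
```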
Thus, if leadership flaps quickly while `leaderLoop`/`establishLeadership` is
running and hits any of these Raft calls, the Raft handler _deadlocks_ forever.
Depending on how many times leadership flapped and where exactly we get stuck,
I suspect it's possible to end up in the following state:
* Agent metrics/stats HTTP and RPC calls hang as they check raft.Configurations
* raft.State remains in the Leader state, and the server attempts to handle RPC
calls (e.g. node/alloc updates), and these hang as well
As we create a goroutine per RPC call, the number of goroutines grows over time
and may trigger out-of-memory errors in addition to missed updates.
[1] d90d6d6bda/config.go (L190-L193)
[2] d90d6d6bda/raft.go (L425-L436)
[3] 2a89e47746/nomad/leader.go (L198-L202)
[4] 2a89e47746/nomad/leader.go (L877)
Reduce future confusion by introducing a minor version that is gossiped out
via the `mvn` Serf tag (Minor Version Number; `vsn` is already being used to
communicate the Major Version Number).
Background: hashicorp/consul/issues/1346#issuecomment-151663152
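As a rough sketch of what that could look like with hashicorp/serf, the snippet below advertises both tags and reads them back; the values and the local round-trip through a `serf.Member` are illustrative, not Nomad's actual agent code.

```go
package main

import (
	"fmt"
	"strconv"

	"github.com/hashicorp/serf/serf"
)

func main() {
	// Advertise both version tags on the local node.
	conf := serf.DefaultConfig()
	conf.Tags = map[string]string{
		"vsn": "1", // existing tag: major version number
		"mvn": "1", // new tag: minor version number
	}

	// Peers read the tags back off gossiped member metadata.
	var member serf.Member
	member.Tags = conf.Tags // stand-in for a member learned via gossip

	minor, err := strconv.Atoi(member.Tags["mvn"])
	if err != nil {
		minor = 0 // older servers do not gossip mvn
	}
	fmt.Println("peer minor version:", minor)
}
```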