open-nomad

Commit Graph

Author	SHA1	Message	Date
Tim Gross	a51149736d	Rename `nomad.broker.total_blocked` metric (#15835) This changeset fixes a long-standing point of confusion in metrics emitted by the eval broker. The eval broker has a queue of "blocked" evals that are waiting for an in-flight ("unacked") eval of the same job to be completed. But this "blocked" state is not the same as the `blocked` status that we write to raft and expose in the Nomad API to end users. There's a second metric `nomad.blocked_eval.total_blocked` that refers to evaluations in that state. This has caused ongoing confusion in major customer incidents and even in our own documentation! (Fixed in this PR.) There's little functional change in this PR aside from the name of the metric emitted, but there's a bit of refactoring to clean up the names in `eval_broker.go` so that there aren't name collisions and multiple names for the same state. Changes included are: * Everything that was previously called "pending" referred to entities that were associated with the "ready" metric. These are all now called "ready" to match the metric. * Everything named "blocked" in `eval_broker.go` is now named "pending", except for a couple of comments that actually refer to blocked RPCs. * Added a note to the upgrade guide docs for 1.5.0. * Fixed the scheduling performance metrics docs because the description for `nomad.broker.total_blocked` was actually the description for `nomad.blocked_eval.total_blocked`.	2023-01-20 14:23:56 -05:00
Tim Gross	6415fb4284	eval broker: shed all but one blocked eval per job after ack (#14621) When an evaluation is acknowledged by a scheduler, the resulting plan is guaranteed to cover up to the `waitIndex` set by the worker based on the most recent evaluation for that job in the state store. At that point, we no longer need to retain blocked evaluations in the broker that are older than that index. Move all but the highest priority / highest `ModifyIndex` blocked eval into a canceled set. When the `Eval.Ack` RPC returns from the eval broker it will signal a reap of a batch of cancelable evals to write to raft. This paces the cancelations limited by how frequently the schedulers are acknowledging evals; this should reduce the risk of cancelations from overwhelming raft relative to scheduler progress. In order to avoid straggling batches when the cluster is quiet, we also include a periodic sweep through the cancelable list.	2022-11-16 16:10:11 -05:00
Seth Hoenig	2631659551	ci: swap ci parallelization for unconstrained gomaxprocs	2022-03-15 12:58:52 -05:00
Michael Schurter	d7e123d7cd	test: fix fake by increasing time window Test originally only had a 10ms time window tolerance. Increased to 100ms and also improved assertions and docstrings.	2021-09-28 12:22:59 -07:00
Lars Lehtonen	adbab29228	nomad: TestEvalBroker_Dequeue_Empty_Timeout() proper goroutine error handling (#6657)	2019-11-08 14:35:06 -05:00
Lars Lehtonen	39b68e0b88	TestEvalBroker_Dequeue_Blocked() proper goroutine error handling (#6651) TestEvalBroker_Dequeue_Blocked() improve test readability	2019-11-08 08:52:23 -05:00
Lars Lehtonen	6deae70e35	TestEvalBroker_PauseResumeNackTimeout() proper goroutine error handling (#6649) TestEvalBroker_PauseResumeNackTimeout() improve test readability	2019-11-07 16:04:59 -05:00
Lars Lehtonen	2638cbb31d	nomad: TestEvalBroker_EnqueueAll_Dequeue_Fair() proper goroutine error handling (#6636) nomad: TestEvalBroker_EnqueueAll_Dequeue_Fair() improve test readability	2019-11-07 10:39:29 -05:00
Danielle Lancashire	2fb93a6229	evalbroker: test for no enqueue on disabled	2019-05-15 11:02:21 +02:00
Preetha Appan	1ab8f2b57a	Address some code review comments	2018-03-14 16:10:32 -05:00
Preetha Appan	7887f39ff4	Added a delay heap to track evals with WaitUntil set, and use in eval broker	2018-03-14 16:10:32 -05:00
Preetha Appan	9a4fcaaf34	Fix bug with not including namespace in indexing blocked evals	2018-03-13 13:23:11 -05:00
Josh Soref	f1d21bfdfe	spelling: enqueuing	2018-03-11 18:00:07 +00:00
Alex Dadgar	84d06f6abe	Sync namespace changes	2017-09-07 17:04:21 -07:00
Alex Dadgar	06eddf243c	parallel nomad tests	2017-07-25 17:39:36 -07:00
Alex Dadgar	a9c8b09da8	Push to configs	2017-04-14 15:24:55 -07:00
Alex Dadgar	ef875f6dda	Delay Nack re-enqueue Add a delay when an evaluation is nacked that starts off small but compounds to a larger delay for subsequent Nacks. This creates some back pressure.	2017-04-12 13:41:40 -07:00
Alex Dadgar	fd3e469d5e	Remove requeue because it is a subset of EnqueueAll now	2016-06-24 10:14:34 -07:00
Alex Dadgar	b7e3a45fef	fix channel being nil on restore	2016-06-07 15:03:08 -07:00
Alex Dadgar	1f9f015c1b	Fix race condition in which a reblocked evaluation could be dropped	2016-05-27 16:53:10 -07:00
Alex Dadgar	1c6d3e129a	EnqueueAll inserts all evaluations before unblocking dequeue calls	2016-05-18 12:13:59 -07:00
Alex Dadgar	045f7807e0	eval_broker.Enqueue no longer returns an error	2016-05-18 11:35:15 -07:00
Alex Dadgar	74726278b9	core: Pause NackTimeout while in the plan_queue as progress is being made	2016-03-04 12:59:35 -08:00
Alex Dadgar	d2e88f0116	Fix panic when Ack occurs at delivery limit	2016-02-11 11:07:18 -08:00
Alex Dadgar	cbb1992929	Make a zero timeout for eval_broker.Dequeue() block	2015-11-23 11:59:49 -08:00
Alex Dadgar	af7d896c2a	nil-pointer dereference with empty timeout Dequeue	2015-11-20 16:49:55 -08:00
Armon Dadgar	b9bb7bdaaa	nomad: OutstandingReset returns specific errors	2015-10-23 10:22:17 -07:00
Armon Dadgar	16fd84f25a	nomad: Adding OutstandingReset to EvalBroker	2015-10-23 10:14:16 -07:00
Armon Dadgar	fe2e046481	nomad: support time wait for evaluations	2015-09-07 13:00:45 -07:00
Armon Dadgar	79a1471b85	nomad: add delivery limit to eval broker	2015-08-16 10:55:55 -07:00
Armon Dadgar	183a238481	nomad: avoid split-brain eval handling after leader transition	2015-08-12 15:25:31 -07:00
Armon Dadgar	343b1b9c89	nomad: move state and mocks into shared packages	2015-08-11 14:27:14 -07:00
Armon Dadgar	011484ea74	nomad: eval broker serializes by JobID	2015-08-05 17:55:15 -07:00
Armon Dadgar	078d80432a	nomad: deduplicate enqueue of evaluations	2015-08-05 17:06:02 -07:00
Armon Dadgar	2f05a260bb	nomad: add checking if eval broker enabled	2015-08-05 16:41:39 -07:00
Armon Dadgar	df6ab9898b	nomad: fixing tests	2015-08-04 17:11:20 -07:00
Armon Dadgar	7f49f25de1	nomad: ensure FIFO dequeue with same priority	2015-07-23 22:58:12 -07:00
Armon Dadgar	7aad418345	nomad: method to test if outstanding evaluation	2015-07-23 21:58:13 -07:00
Armon Dadgar	fc11808fe9	nomad: add eval broker, configurable nack timeout	2015-07-23 21:44:17 -07:00
Armon Dadgar	5fcc39c399	nomad: testing the eval broker	2015-07-23 21:37:28 -07:00
Armon Dadgar	2a76fd34f0	nomad: first pass at eval broker	2015-07-23 17:31:08 -07:00

41 Commits