open-nomad/scheduler
Tim Gross b0c3b99b03
scheduler: fix quadratic performance with spread blocks (#11712)
When the scheduler picks a node for each allocation in an evaluation,
the `LimitIterator` provides at most 2 eligible nodes for the
`MaxScoreIterator` to choose from. This keeps scheduling fast while
still producing acceptable results, because placements are binpacked.
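A toy sketch of how a limit in front of a max-score selection behaves. It is not Nomad's iterator interface: `Node`, `limitNodes` and `maxScore` are hypothetical names, and scores are assumed to be precomputed.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for an eligible node that has already
// been scored; it is not Nomad's structs.Node.
type Node struct {
	Name  string
	Score float64
}

// limitNodes keeps at most limit candidates, playing the role the commit
// message describes for LimitIterator.
func limitNodes(candidates []Node, limit int) []Node {
	if limit < len(candidates) {
		return candidates[:limit]
	}
	return candidates
}

// maxScore returns the highest-scoring candidate, playing the role of
// MaxScoreIterator. ok is false when there are no candidates.
func maxScore(candidates []Node) (best Node, ok bool) {
	for i, n := range candidates {
		if i == 0 || n.Score > best.Score {
			best = n
		}
	}
	return best, len(candidates) > 0
}

func main() {
	// Candidates arrive in random order from earlier stages of the stack.
	candidates := []Node{{"a", 0.4}, {"b", 0.7}, {"c", 0.9}}

	best, _ := maxScore(limitNodes(candidates, 2))
	fmt.Println(best.Name) // prints "b": node "c" was never considered
}
```

The best-scoring node overall ("c") is skipped, which is the trade-off the limit accepts in exchange for speed.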

Jobs with a `spread` block (or node affinity) remove this limit in
order to produce correct spread scoring. This means that every
allocation within a job with a `spread` block is evaluated against
_all_ eligible nodes. Operators of large clusters have reported that
jobs with `spread` blocks that are eligible on a large number of nodes
can take longer to evaluate than the nack timeout (60s), whereas
typical evaluations are processed in milliseconds.

In practice, it's not necessary to evaluate every eligible node for
every allocation on large clusters, because the `RandomIterator` at
the base of the scheduler stack produces enough variation in each pass
that the likelihood of an uneven spread is negligible. Note that
feasibility is checked before the limit, so this only impacts the
number of _eligible_ nodes available for scoring, not the total number
of nodes.

This changeset sets the iterator limit for "large" jobs with `spread`
blocks or node affinities equal to the number of desired
allocations. This brings the evaluation of one problematic example job
down from ~3min to ~10s. The included tests ensure that we get
acceptable spread results across a variety of large cluster topologies.
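A rough sketch of the limit rule and its effect on the number of node scorings per job. The `large` flag is a hypothetical input, since the commit message does not say where the cutoff for a "large" job lies; `scoringLimit` and `scoringWork` are illustrative names, not Nomad functions.

```go
package main

import "fmt"

// defaultLimit matches the "at most 2 eligible nodes" described in the
// commit message for jobs without spread or affinity.
const defaultLimit = 2

// scoringLimit sketches the rule described above. The large flag is a
// hypothetical input; the real cutoff is not stated in the commit message.
func scoringLimit(desired, eligible int, spreadOrAffinity, large bool) int {
	switch {
	case !spreadOrAffinity:
		return defaultLimit
	case !large:
		return eligible // score every eligible node, as before the fix
	default:
		return desired // new behavior: limit equals the desired count
	}
}

// scoringWork counts node scorings for a whole job: each of the desired
// placements scores up to limit eligible nodes.
func scoringWork(desired, eligible, limit int) int {
	perPlacement := limit
	if eligible < perPlacement {
		perPlacement = eligible
	}
	return desired * perPlacement
}

func main() {
	desired, eligible := 500, 5000 // hypothetical job and cluster sizes

	before := scoringWork(desired, eligible, eligible)
	after := scoringWork(desired, eligible, scoringLimit(desired, eligible, true, true))
	fmt.Println(before, after) // prints "2500000 250000"
}
```

These sizes are illustrative only and are not the measurements behind the ~3min to ~10s figure; they just show how capping each placement at the desired count bounds the work on clusters with many eligible nodes.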
2021-12-21 10:10:01 -05:00
annotate.go scheduler: label loops with nested switch statements for effective break (#8528) 2020-07-24 08:50:41 -04:00
annotate_test.go Deprecate IOPS 2018-12-06 15:09:26 -08:00
context.go scheduler: fix panic when preempting and evicting 2019-12-02 20:22:22 -08:00
context_test.go Events/msgtype cleanup (#9117) 2020-10-19 09:30:15 -04:00
device.go Change types of weights on spread/affinity 2019-01-30 12:20:38 -08:00
device_test.go Change types of weights on spread/affinity 2019-01-30 12:20:38 -08:00
feasible.go add support for host network interpolation 2021-04-13 09:53:05 -04:00
feasible_test.go add support for host network interpolation 2021-04-13 09:53:05 -04:00
generic_sched.go core: allow setting and propagation of eval priority on job de/registration (#11532) 2021-11-23 09:23:31 +01:00
generic_sched_test.go gofmt all the files 2021-10-01 10:14:28 -04:00
preemption.go Only preempt for network when there is a network 2019-06-07 18:55:55 -04:00
preemption_test.go Fix preemption panic (#11346) 2021-10-19 20:22:03 -04:00
propertyset.go server 2018-09-15 16:23:13 -07:00
rank.go chore: fix incorrect docstring formatting. 2021-08-30 11:08:12 +02:00
rank_test.go Allow configuring memory oversubscription (#10466) 2021-04-29 22:09:56 -04:00
reconcile.go core: allow setting and propagation of eval priority on job de/registration (#11532) 2021-11-23 09:23:31 +01:00
reconcile_test.go core: allow setting and propagation of eval priority on job de/registration (#11532) 2021-11-23 09:23:31 +01:00
reconcile_util.go chore: fixup inconsistent method receiver names. (#11704) 2021-12-20 11:44:21 +01:00
reconcile_util_test.go removed deprecated fields from Drain structs and API 2021-03-21 15:30:11 +00:00
scheduler.go core: implement system batch scheduler 2021-08-03 10:30:47 -04:00
scheduler_oss.go gofmt all the files 2021-10-01 10:14:28 -04:00
scheduler_sysbatch_test.go test: use Len instead of Equal on system and sysbatch node constraint tests 2021-09-02 11:36:02 -04:00
scheduler_system.go scheduler: fix panic in system jobs when nodes filtered by class (#11565) 2021-11-24 12:28:47 -05:00
scheduler_system_test.go scheduler: stop allocs in unrelated nodes (#11391) 2021-10-27 07:04:13 -07:00
select.go chore: fix incorrect docstring formatting. 2021-08-30 11:08:12 +02:00
select_test.go Implement affinity support in generic scheduler 2018-09-04 16:10:11 -05:00
spread.go More error->debug for logging in the bin packing iterator 2019-12-12 15:50:16 -06:00
spread_test.go scheduler: fix quadratic performance with spread blocks (#11712) 2021-12-21 10:10:01 -05:00
stack.go scheduler: fix quadratic performance with spread blocks (#11712) 2021-12-21 10:10:01 -05:00
stack_not_ent.go gofmt all the files 2021-10-01 10:14:28 -04:00
stack_test.go core: implement system batch scheduler 2021-08-03 10:30:47 -04:00
testing.go tests: use standard library testing.TB 2021-06-09 16:18:45 -07:00
util.go scheduler: stop allocs in unrelated nodes (#11391) 2021-10-27 07:04:13 -07:00
util_test.go scheduler: stop allocs in unrelated nodes (#11391) 2021-10-27 07:04:13 -07:00