open-nomad/scheduler
Luiz Aoqui 8070882c4b
scheduler: fix reconciliation of reconnecting allocs (#16609)
When a disconnected client reconnects, the `allocReconciler` must find
the allocations that were created to replace the original disconnected
allocations.

This search was only performed over a subset of the non-terminal
untainted allocations, so replacement allocations outside that subset
were never stopped by the reconciler, leaving the job in an
inconsistent state.

This inconsistency was only resolved in a later job evaluation, but by
that point the allocation was already considered reconnected, so the
specific reconnection logic was not applied, leading to unexpected
outcomes.

This commit fixes the problem by running the reconnecting allocation
reconciliation logic earlier in the process, leaving the rest of the
reconciler oblivious to reconnecting allocations.

It also uses the full set of allocations to search for replacements,
stopping them even if they are not in the `untainted` set.
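
The idea of searching the full allocation set can be illustrated with a
minimal sketch. This is not the code in `reconcile.go`: the `Alloc` type,
its fields, and `findReplacements` are hypothetical simplifications made
up for this example, and the real reconciler applies more rules when
choosing which allocation to keep.

```go
package main

import "sort"

// Alloc is a hypothetical, heavily simplified allocation record.
// It is NOT Nomad's structs.Allocation.
type Alloc struct {
	ID          string
	PrevAllocID string // ID of the allocation this one replaced, if any
	Untainted   bool   // assumed flag: member of the `untainted` set
}

// allocSet maps allocation IDs to allocations (assumed shape).
type allocSet map[string]Alloc

// findReplacements returns the sorted IDs of every allocation in `all`
// that was created to replace one of the `reconnecting` allocations.
// It deliberately ignores the Untainted flag, so replacements outside
// the untainted set are found as well.
func findReplacements(reconnecting, all allocSet) []string {
	var stop []string
	for _, a := range all {
		if a.PrevAllocID == "" {
			continue // not a replacement for anything
		}
		if _, ok := reconnecting[a.PrevAllocID]; ok {
			stop = append(stop, a.ID)
		}
	}
	sort.Strings(stop) // map iteration order is random; sort for stable output
	return stop
}
```

Calling `findReplacements` with a reconnecting set holding one original
allocation, and a full set holding two replacements of it (one of them
not untainted), returns both replacement IDs. A search restricted to
untainted allocations would have returned only one.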

The `SystemScheduler` is not affected by this bug because
disconnected clients don't trigger replacements: every eligible client
is already running an allocation.
2023-03-24 19:38:31 -04:00
benchmarks build: run gofmt on all go source files 2022-08-16 11:14:11 -05:00
annotate.go build: run gofmt on all go source files 2022-08-16 11:14:11 -05:00
annotate_test.go ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
context.go scheduler: stopped-yet-running allocs are still running (#10446) 2022-09-13 12:52:47 -07:00
context_test.go build: run gofmt on all go source files 2022-08-16 11:14:11 -05:00
device.go
device_test.go ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
feasible.go Allow per_alloc to be used with host volumes (#15780) 2023-01-26 09:14:47 -05:00
feasible_test.go Allow per_alloc to be used with host volumes (#15780) 2023-01-26 09:14:47 -05:00
generic_sched.go renamed stanza to block for consistency with other projects (#15941) 2023-01-30 15:48:43 +01:00
generic_sched_test.go scheduling: prevent self-collision in dynamic port network offerings (#16401) 2023-03-09 10:09:54 -05:00
preemption.go renamed stanza to block for consistency with other projects (#15941) 2023-01-30 15:48:43 +01:00
preemption_test.go ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
propertyset.go renamed stanza to block for consistency with other projects (#15941) 2023-01-30 15:48:43 +01:00
rank.go scheduling: prevent self-collision in dynamic port network offerings (#16401) 2023-03-09 10:09:54 -05:00
rank_test.go core: merge reserved_ports into host_networks (#13651) 2022-07-12 14:40:25 -07:00
reconcile.go scheduler: fix reconciliation of reconnecting allocs (#16609) 2023-03-24 19:38:31 -04:00
reconcile_test.go scheduler: fix reconciliation of reconnecting allocs (#16609) 2023-03-24 19:38:31 -04:00
reconcile_util.go scheduler: fix reconciliation of reconnecting allocs (#16609) 2023-03-24 19:38:31 -04:00
reconcile_util_test.go Update alloc after reconnect and enforece client heartbeat order (#15068) 2022-11-04 16:25:11 -04:00
scheduler.go make version checks specific to region (1.4.x) (#14912) 2022-10-17 16:23:51 -04:00
scheduler_oss.go
scheduler_sysbatch_test.go System and sysbatch jobs always have zero index (#16030) 2023-02-02 16:18:01 -05:00
scheduler_system.go scheduler: log stack in case of panic (#15303) 2022-11-17 18:59:33 -05:00
scheduler_system_test.go System and sysbatch jobs always have zero index (#16030) 2023-02-02 16:18:01 -05:00
select.go
select_test.go ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
spread.go renamed stanza to block for consistency with other projects (#15941) 2023-01-30 15:48:43 +01:00
spread_test.go main: remove deprecated uses of rand.Seed (#16074) 2023-02-07 09:19:38 -06:00
stack.go scheduler: move utils into files specific to their scheduler type (#16051) 2023-02-03 12:29:39 -05:00
stack_oss.go
stack_test.go ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
system_util.go scheduler: move utils into files specific to their scheduler type (#16051) 2023-02-03 12:29:39 -05:00
system_util_test.go scheduler: refactor system util tests (#16416) 2023-03-13 11:59:31 -04:00
testing.go disconnected clients: ensure servers meet minimum required version (#12202) 2022-04-05 17:12:23 -04:00
util.go scheduler: annotate tasksUpdated with reason and purge DeepEquals (#16421) 2023-03-14 09:46:00 -05:00
util_test.go scheduler: annotate tasksUpdated with reason and purge DeepEquals (#16421) 2023-03-14 09:46:00 -05:00