open-nomad/client/allocrunner
Tim Gross 0a19fe3b60 fix multiple overflow errors in exponential backoff (#18200)
We use capped exponential backoff in several places in the code when handling
failures. The code we've copy-and-pasted all over has a check to see if the
backoff is greater than the limit, but this check happens after the bitshift and
we always increment the number of attempts. This causes an overflow with a
fairly small number of failures (e.g., in one place I tested, it occurs after
only 24 iterations), resulting in a negative backoff which then never recovers. The
backoff becomes a tight loop consuming resources and/or DoS'ing a Nomad RPC
handler or an external API such as Vault. Note this doesn't occur in places
where we cap the number of iterations so the loop breaks (usually to return an
error), so long as the number of iterations is reasonable.

Introduce a helper with a check on the cap before the bitshift to avoid
overflow in all places this can occur.
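
The difference between checking the cap after the shift and before it can be sketched as follows. This is a minimal illustration, not the helper from this commit: the names naiveBackoff and cappedBackoff, their signatures, and the 1s/60s values are assumptions made for the example, and a positive limit is assumed.

```go
package main

import (
	"fmt"
	"time"
)

// naiveBackoff is a hypothetical reproduction of the buggy pattern:
// shift first, compare against the limit afterwards. Once the
// multiplication wraps around, the result is negative, the
// "backoff > limit" check is false, and the negative value is returned.
func naiveBackoff(base, limit time.Duration, attempt uint64) time.Duration {
	backoff := base * (time.Duration(1) << attempt)
	if backoff > limit {
		backoff = limit
	}
	return backoff
}

// cappedBackoff is a hypothetical helper that checks the cap BEFORE
// shifting. limit/base is the largest multiplier that stays within the
// limit, so if 2^attempt exceeds it we return the limit without ever
// computing a product that could overflow.
func cappedBackoff(base, limit time.Duration, attempt uint64) time.Duration {
	if base <= 0 {
		return 0 // a non-positive base makes no sense; never return a negative wait
	}
	// attempt >= 63 would make the int64 shift itself overflow.
	if attempt >= 63 || int64(limit/base) < int64(1)<<attempt {
		return limit
	}
	return base * time.Duration(int64(1)<<attempt)
}

func main() {
	base, limit := time.Second, 60*time.Second
	for _, attempt := range []uint64{0, 5, 6, 34, 100} {
		fmt.Printf("attempt=%d naive=%v capped=%v\n",
			attempt,
			naiveBackoff(base, limit, attempt),
			cappedBackoff(base, limit, attempt))
	}
}
```

With a 1s base, the naive version wraps to a negative duration at attempt 34, while the capped version keeps returning the limit no matter how large the attempt counter grows.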

Fixes: #18199
Co-authored-by: stswidwinski <stan.swidwinski@gmail.com>
2023-08-15 14:39:09 -04:00
interfaces prioritized client updates (#17354) 2023-05-31 15:34:16 -04:00
state CSI: persist previous mounts on client to restore during restart (#17840) 2023-07-10 13:20:15 -04:00
tasklifecycle [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
taskrunner fix multiple overflow errors in exponential backoff (#18200) 2023-08-15 14:39:09 -04:00
alloc_runner.go Backport of Retain task states for post stop tasks at the time of node GC into release/1.6.x (#18033) 2023-07-21 12:55:29 -05:00
alloc_runner_hooks.go full task cleanup when alloc prerun hook fails (#17104) 2023-05-08 13:17:10 -05:00
alloc_runner_test.go prioritized client updates (#17354) 2023-05-31 15:34:16 -04:00
alloc_runner_unix_test.go allocrunner: provide factory function so we can build mock ARs (#17161) 2023-05-12 13:29:44 -04:00
allocdir_hook.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
cgroup_hook.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
checks_hook.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
checks_hook_test.go chore(lint): use Go stdlib variables for HTTP methods and status codes (#17968) (#18074) 2023-07-26 16:38:39 +01:00
consul_grpc_sock_hook.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
consul_grpc_sock_hook_test.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
consul_http_sock_hook.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
consul_http_sock_hook_test.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
csi_hook.go CSI: persist previous mounts on client to restore during restart (#17840) 2023-07-10 13:20:15 -04:00
csi_hook_test.go CSI: persist previous mounts on client to restore during restart (#17840) 2023-07-10 13:20:15 -04:00
fail_hook.go full task cleanup when alloc prerun hook fails (#17104) 2023-05-08 13:17:10 -05:00
group_service_hook.go services: un-mark group services as deregistered if restart hook runs (#16905) 2023-04-24 14:24:51 -05:00
group_service_hook_test.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
health_hook.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
health_hook_test.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
migrate_hook.go allocrunner: provide factory function so we can build mock ARs (#17161) 2023-05-12 13:29:44 -04:00
network_hook.go docker: stop network pause container of lost alloc after node restart (#17455) 2023-06-09 08:46:29 -05:00
network_hook_test.go docker: stop network pause container of lost alloc after node restart (#17455) 2023-06-09 08:46:29 -05:00
network_manager_linux.go docker: stop network pause container of lost alloc after node restart (#17455) 2023-06-09 08:46:29 -05:00
network_manager_linux_test.go allocrunner: prevent panic on network manager (#16921) 2023-04-18 13:39:13 -07:00
network_manager_nonlinux.go client: fix panic on alloc stop in non-Linux environments (#17515) 2023-06-14 10:22:38 -04:00
networking.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
networking_bridge_linux.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
networking_bridge_linux_test.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
networking_cni.go cni: ensure to setup CNI addresses in deterministic order (#17766) 2023-07-06 13:25:29 -07:00
networking_cni_test.go cni: ensure to setup CNI addresses in deterministic order (#17766) 2023-07-06 13:25:29 -07:00
testing.go allocrunner: provide factory function so we can build mock ARs (#17161) 2023-05-12 13:29:44 -04:00
upstream_allocs_hook.go allocrunner: provide factory function so we can build mock ARs (#17161) 2023-05-12 13:29:44 -04:00