open-nomad/client/devicemanager
Tim Gross 0a19fe3b60 fix multiple overflow errors in exponential backoff (#18200)
We use capped exponential backoff in several places in the code when handling
failures. The code we've copy-and-pasted all over has a check to see if the
backoff is greater than the limit, but this check happens after the bitshift and
we always increment the number of attempts. This causes an overflow with a
fairly small number of failures (e.g., in one place I tested, it occurs after
only 24 iterations), resulting in a negative backoff which then never recovers. The
backoff becomes a tight loop consuming resources and/or DoS'ing a Nomad RPC
handler or an external API such as Vault. Note this doesn't occur in places
where we cap the number of iterations so the loop breaks (usually to return an
error), so long as the number of iterations is reasonable.

Introduce a helper that checks the cap before the bitshift, avoiding overflow in
all places this can occur.

Fixes: #18199
Co-authored-by: stswidwinski <stan.swidwinski@gmail.com>
2023-08-15 14:39:09 -04:00
state [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
instance.go fix multiple overflow errors in exponential backoff (#18200) 2023-08-15 14:39:09 -04:00
manager.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
manager_test.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
testing.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
utils.go [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00