open-nomad/helper/backoff.go
Tim Gross 0a19fe3b60 fix multiple overflow errors in exponential backoff (#18200)
We use capped exponential backoff in several places in the code when handling
failures. The code we've copy-and-pasted all over has a check to see if the
backoff is greater than the limit, but this check happens after the bitshift and
we always increment the number of attempts. This causes an overflow with a
fairly small number of failures (ex. at one place I tested it occurs after only
24 iterations), resulting in a negative backoff which then never recovers. The
backoff becomes a tight loop consuming resources and/or DoS'ing a Nomad RPC
handler or an external API such as Vault. Note this doesn't occur in places
where we cap the number of iterations so the loop breaks (usually to return an
error), so long as the number of iterations is reasonable.

Introduce a helper with a check on the cap before the bitshift to avoid overflow in all 
places this can occur.

Fixes: #18199
Co-authored-by: stswidwinski <stan.swidwinski@gmail.com>
2023-08-15 14:39:09 -04:00

32 lines
745 B
Go

// Copyright (c) HashiCorp, Inc.
// SPDX-License-Identifier: BUSL-1.1
package helper
import (
"time"
)
func Backoff(backoffBase time.Duration, backoffLimit time.Duration, attempt uint64) time.Duration {
const MaxUint = ^uint64(0)
const MaxInt = int64(MaxUint >> 1)
// Ensure lack of non-positive backoffs since these make no sense
if backoffBase.Nanoseconds() <= 0 {
return max(backoffBase, 0*time.Second)
}
// Ensure that a large attempt will not cause an overflow
if attempt > 62 || MaxInt/backoffBase.Nanoseconds() < (1<<attempt) {
return backoffLimit
}
// Compute deadline and clamp it to backoffLimit
deadline := 1 << attempt * backoffBase
if deadline > backoffLimit {
deadline = backoffLimit
}
return deadline
}