On a few occasions I've had to read timeout stack traces for tests and
noticed that retry.Run runs the function in a goroutine. This makes
debuging a timeout more difficult because the gourinte of the retryable
function is disconnected from the stack of the actual test. It requires
searching through the entire stack trace to find the other goroutine.
By using panic instead of runtime.Goexit() we remove the need for a
separate goroutine.
Also a few other small improvements:
* add `R.Helper` so that an assertion function can be used with both
testing.T and retry.R.
* Pass t to `Retryer.NextOr`, and call `t.Helper` in a number of places
so that the line number reported by `t.Log` is the line in the test
where `retry.Run` was called, instead of some line in `retry.go` that
is not relevant to the failure.
* improve the implementation of `dedup` by removing the need to iterate
twice. Instad track the lines and skip any duplicate when writing to
the buffer.
Previously canRetry was attempting to retrieve this error from args, however there was never
any callers that would pass an error to args.
With the change to raftApply to move this error to the error return value, it is now possible
to receive this error from the err argument.
This commit updates canRetry to check for ErrChunkingResubmit in err.
Previously we were inconsistently checking the response for errors. This
PR moves the response-is-error check into raftApply, so that all callers
can look at only the error response, instead of having to know that
errors could come from two places.
This should expose a few more errors that were previously hidden because
in some calls to raftApply we were ignoring the response return value.
Also handle errors more consistently. In some cases we would log the
error before returning it. This can be very confusing because it can
result in the same error being logged multiple times. Instead return
a wrapped error.
We noticed that our mock API would sometimes respond with an empty array
of addresses - which resulted in an empty space in the gateway upstream
listing which looked as though it could be broken.
I checked with backend, and as this will never happen, I made the change
here also so the gateway upstream list is always fully populated with
addresses.
The extra argument meant that the blocking query configuration wasn't
being read properly, and therefore the correct ?index wasn't being sent
with the request.