client: avoid acting on stale data after launch (#10907)

When the client launches, use a consistent read to fetch its own allocs,
but allow stale read afterwards as long as reads don't revert into older
state.

This change addresses an edge case affecting restarting client. When a
client restarts, it may fetch a stale data concerning its allocs: allocs
that have completed prior to the client shutdown may still have "run/running"
desired/client status, and have the client attempt to re-run again.

An alternative approach is to track the indices such that the client
set MinQueryIndex on the maximum index the client ever saw, or compare
received allocs against locally restored client state. Garbage
collection complicates this approach (local knowledge is not complete),
and the approach still risks starting "dead" allocations (e.g. the
allocation may have been placed when client just restarted and have
already been reschuled by the time the client started. This approach
here is effective against all kinds of stalness problems with small
overhead.
This commit is contained in:
Mahmood Ali 2021-07-20 15:13:28 -04:00 committed by GitHub
parent efe8ea2c2c
commit 8df9b1fd0f
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 15 additions and 3 deletions

4
.changelog/10907.txt Normal file
View File

@ -0,0 +1,4 @@
```release-note:bug
client: Fixed a bug where a restarted client may start an already completed tasks in rare conditions
```

View File

@ -2003,7 +2003,14 @@ func (c *Client) watchAllocations(updates chan *allocUpdates) {
SecretID: c.secretNodeID(),
QueryOptions: structs.QueryOptions{
Region: c.Region(),
AllowStale: true,
// Make a consistent read query when the client starts
// to avoid acting on stale data that predates this
// client state before a client restart.
//
// After the first request, only require monotonically
// increasing state.
AllowStale: false,
},
}
var resp structs.NodeClientAllocsResponse
@ -2153,7 +2160,8 @@ OUTER:
c.logger.Debug("updated allocations", "index", resp.Index,
"total", len(resp.Allocs), "pulled", len(allocsResp.Allocs), "filtered", len(filtered))
// Update the query index.
// After the first request, only require monotonically increasing state.
req.AllowStale = true
if resp.Index > req.MinQueryIndex {
req.MinQueryIndex = resp.Index
}