Commit graph

19 commits

Author SHA1 Message Date
Mahmood Ali c5f5a1fcb9 client: defensive against getting stale alloc updates
When fetching node alloc assignments, be defensive against a stale read before
killing local nodes allocs.

The bug is when both client and servers are restarting and the client requests
the node allocation for the node, it might get stale data as server hasn't
finished applying all the restored raft transaction to store.

Consequently, client would kill and destroy the alloc locally, just to fetch it
again moments later when server store is up to date.

The bug can be reproduced quite reliably with single node setup (configured with
persistence).  I suspect it's too edge-casey to occur in production cluster with
multiple servers, but we may need to examine leader failover scenarios more closely.

In this commit, we only remove and destroy allocs if the removal index is more
recent than the alloc index. This seems like a cheap resiliency fix we already
use for detecting alloc updates.

A more proper fix would be to ensure that a nomad server only serves
RPC calls when state store is fully restored or up to date in leadership
transition cases.
2019-06-29 04:17:35 -05:00
Mahmood Ali 6bdc9860b7 client: avoid registering node twice right away
I noticed that `watchNodeUpdates()` almost immediately after
`registerAndHeartbeat()` calls `retryRegisterNode()`, well after 5
seconds.

This call is unnecessary and made debugging a bit harder.  So here, we
ensure that we only re-register node for new node events, not for
initial registration.
2019-04-19 09:12:50 -04:00
Michael Schurter 0f7dcfdc9a example redis job "runs" on arv2! see below
Tons left to do and lots of churn:
1. No state saving
2. No shutdown or gc
3. Removed AR factory *for now*
4. Made all "Config" structs local to the package they configure
5. Added allocID to GC to avoid a lookup

Really hating how many things use *structs.Allocation. It's not bad
without state saving, but if AllocRunner starts updating its copy things
get racy fast.
2018-10-16 16:53:29 -07:00
Alex Dadgar f5ff509fa5 Refactor - wip 2018-06-12 10:23:45 -07:00
Alex Dadgar 843bc26e5d Respond to comments 2017-05-09 10:50:24 -07:00
Alex Dadgar e00f9c9413 Restore state + upgrade path 2017-05-02 18:21:49 -07:00
Alex Dadgar bddedd7aba Don't deepcopy job when retrieving copy of Alloc
This PR removes deepcopying of the job attached to the allocation in the
alloc runner. This operation is called very often so removing reflect
from the code path and the potentially large number of mallocs need to
create a job reduced memory and cpu pressure.
2017-05-01 14:50:34 -07:00
Michael Schurter 1c4195b985 Fix string formatting 2016-12-01 11:22:51 -08:00
Michael Schurter e7dd443447 Add sanity check to SaveState
Also just reuse the task states snapshot taken by `Alloc()` instead of
doing a redundant copy.
2016-09-02 16:07:06 -07:00
Cameron Davison d1e7d9c50f
write state to temp file and then rename 2016-06-27 12:29:33 -05:00
Sean Chittenden e36686a17d
Use consul/lib's RandomStagger
Removes four redundant copies of the method in the process.
2016-06-10 15:48:36 -04:00
Alex Dadgar 2d98c0eadd Fix double pull with introduction of AllocModifyIndex 2016-02-01 15:43:59 -08:00
Ryan Uber 1ff724ab25 client: alloc dirs tolerate missing directories 2015-09-11 20:32:55 -07:00
Armon Dadgar ea0795995d Use a single implementation of GenerateUUID 2015-09-07 15:23:03 -07:00
Armon Dadgar 50c677a9bb client: adding state save helpers 2015-08-29 18:03:00 -07:00
Armon Dadgar c71c9bec1a client: working with alloc diffs 2015-08-23 14:54:52 -07:00
Armon Dadgar 1dfa7296c1 client: alloc diffing 2015-08-23 14:47:51 -07:00
Armon Dadgar 2b2e4c2256 client: register on start 2015-08-20 17:49:04 -07:00
Armon Dadgar 7c3e987617 client: skeleton package 2015-08-20 16:07:26 -07:00