open-nomad

Author	SHA1	Message	Date
Mahmood Ali	c5f5a1fcb9	client: defensive against getting stale alloc updates When fetching node alloc assignments, be defensive against a stale read before killing the node's local allocs. The bug occurs when both the client and the servers are restarting: when the client requests the allocations for the node, it might get stale data because the server hasn't finished applying all the restored Raft transactions to its state store. Consequently, the client would kill and destroy the alloc locally, only to fetch it again moments later once the server's state store is up to date. The bug can be reproduced quite reliably with a single-node setup (configured with persistence). I suspect it's too much of an edge case to occur in a production cluster with multiple servers, but we may need to examine leader failover scenarios more closely. In this commit, we only remove and destroy allocs if the removal index is more recent than the alloc index. This seems like a cheap resiliency fix that we already use for detecting alloc updates. A more proper fix would be to ensure that a Nomad server only serves RPC calls when its state store is fully restored or up to date in leadership transition cases.	2019-06-29 04:17:35 -05:00
Mahmood Ali	6bdc9860b7	client: avoid registering node twice right away I noticed that `watchNodeUpdates()` calls `retryRegisterNode()` almost immediately after `registerAndHeartbeat()` (well, after 5 seconds). This call is unnecessary and made debugging a bit harder. So here, we ensure that we only re-register the node for new node events, not for the initial registration.	2019-04-19 09:12:50 -04:00
Michael Schurter	0f7dcfdc9a	example redis job "runs" on arv2! see below Tons left to do and lots of churn: 1. No state saving 2. No shutdown or gc 3. Removed AR factory for now 4. Made all "Config" structs local to the package they configure 5. Added allocID to GC to avoid a lookup Really hating how many things use *structs.Allocation. It's not bad without state saving, but if AllocRunner starts updating its copy things get racy fast.	2018-10-16 16:53:29 -07:00
Alex Dadgar	f5ff509fa5	Refactor - wip	2018-06-12 10:23:45 -07:00
Alex Dadgar	843bc26e5d	Respond to comments	2017-05-09 10:50:24 -07:00
Alex Dadgar	e00f9c9413	Restore state + upgrade path	2017-05-02 18:21:49 -07:00
Alex Dadgar	bddedd7aba	Don't deepcopy job when retrieving copy of Alloc This PR removes deepcopying of the job attached to the allocation in the alloc runner. This operation is called very often, so removing reflect from the code path, along with the potentially large number of mallocs needed to create a job, reduces memory and CPU pressure.	2017-05-01 14:50:34 -07:00
Michael Schurter	1c4195b985	Fix string formatting	2016-12-01 11:22:51 -08:00
Michael Schurter	e7dd443447	Add sanity check to SaveState Also just reuse the task states snapshot taken by `Alloc()` instead of doing a redundant copy.	2016-09-02 16:07:06 -07:00
Cameron Davison	d1e7d9c50f	write state to temp file and then rename	2016-06-27 12:29:33 -05:00
Sean Chittenden	e36686a17d	Use consul/lib's RandomStagger Removes four redundant copies of the method in the process.	2016-06-10 15:48:36 -04:00
Alex Dadgar	2d98c0eadd	Fix double pull with introduction of AllocModifyIndex	2016-02-01 15:43:59 -08:00
Ryan Uber	1ff724ab25	client: alloc dirs tolerate missing directories	2015-09-11 20:32:55 -07:00
Armon Dadgar	ea0795995d	Use a single implementation of GenerateUUID	2015-09-07 15:23:03 -07:00
Armon Dadgar	50c677a9bb	client: adding state save helpers	2015-08-29 18:03:00 -07:00
Armon Dadgar	c71c9bec1a	client: working with alloc diffs	2015-08-23 14:54:52 -07:00
Armon Dadgar	1dfa7296c1	client: alloc diffing	2015-08-23 14:47:51 -07:00
Armon Dadgar	2b2e4c2256	client: register on start	2015-08-20 17:49:04 -07:00
Armon Dadgar	7c3e987617	client: skeleton package	2015-08-20 16:07:26 -07:00

19 commits
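
The guard described in commit c5f5a1fcb9 (only remove an alloc when the removal index is more recent than the alloc index) can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `localAlloc` type, its `ModifyIndex` field, the `staleSafeRemovals` function, and the `responseIndex` parameter are all hypothetical names chosen for this example.

```go
package main

import "fmt"

// localAlloc is a hypothetical, pared-down stand-in for an allocation
// that the client is tracking locally.
type localAlloc struct {
	ID          string
	ModifyIndex uint64 // index at which the client last saw this alloc change
}

// staleSafeRemovals returns the IDs of local allocs that are missing from
// the server response AND whose index is older than the index the response
// was served at. An alloc that is missing from a response with an index
// that is not newer than the alloc's own index is assumed to come from a
// stale read, so it is kept rather than destroyed.
func staleSafeRemovals(local []localAlloc, pulled map[string]struct{}, responseIndex uint64) []string {
	var removed []string
	for _, a := range local {
		if _, ok := pulled[a.ID]; ok {
			continue // the server still assigns this alloc to the node
		}
		if responseIndex <= a.ModifyIndex {
			continue // response is not newer than the alloc: likely stale
		}
		removed = append(removed, a.ID)
	}
	return removed
}

func main() {
	local := []localAlloc{
		{ID: "a", ModifyIndex: 10},
		{ID: "b", ModifyIndex: 20},
		{ID: "c", ModifyIndex: 5},
	}
	// The (assumed) server response was served at index 15 and lists only "a".
	pulled := map[string]struct{}{"a": {}}

	fmt.Println(staleSafeRemovals(local, pulled, 15)) // prints [c]
}
```

In this sketch "b" survives even though the server did not return it, because the response index (15) is older than the alloc's own index (20). Only "c" is treated as genuinely removed.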