open-nomad/client
Mahmood Ali c5f5a1fcb9 client: defensive against getting stale alloc updates
When fetching node alloc assignments, be defensive against a stale read before
killing the node's local allocs.

The bug occurs when both the client and the servers are restarting: when the
client requests the allocations for its node, it may get stale data because the
server hasn't finished applying all the restored Raft transactions to its store.

Consequently, the client would kill and destroy the alloc locally, only to fetch
it again moments later when the server store is up to date.

The bug can be reproduced quite reliably with a single-node setup (configured
with persistence).  I suspect it's too much of an edge case to occur in a
production cluster with multiple servers, but we may need to examine leader
failover scenarios more closely.

In this commit, we only remove and destroy allocs if the removal index is more
recent than the alloc index. This seems like a cheap resiliency fix we already
use for detecting alloc updates.
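
The index comparison described above can be sketched roughly as follows. This
is an illustration only, not the code from this commit: allocsToRemove, its
parameters, and the map shapes are hypothetical names chosen for the example,
and it assumes each local alloc is tracked by the index at which the client
last saw it.

```go
package main

import (
	"fmt"
	"sort"
)

// allocsToRemove is a hypothetical helper (not Nomad's actual API).
// local maps alloc ID -> index at which the client last saw the alloc.
// assigned is the set of alloc IDs present in the server response.
// respIndex is the index of the server response, i.e. the removal index.
// A local alloc is marked for removal only when it is missing from the
// response AND the response is more recent than the alloc's own index.
func allocsToRemove(local map[string]uint64, assigned map[string]struct{}, respIndex uint64) []string {
	var removed []string
	for id, allocIndex := range local {
		if _, ok := assigned[id]; ok {
			continue // still assigned to this node
		}
		if respIndex <= allocIndex {
			continue // response is not newer than the alloc: treat as a stale read
		}
		removed = append(removed, id)
	}
	sort.Strings(removed) // deterministic order for callers and tests
	return removed
}

func main() {
	local := map[string]uint64{"alloc-a": 10, "alloc-b": 20}
	none := map[string]struct{}{}

	// A response older than both allocs removes nothing.
	fmt.Println(allocsToRemove(local, none, 5))

	// A response newer than alloc-a but older than alloc-b removes only alloc-a.
	fmt.Println(allocsToRemove(local, none, 15))
}
```

The trade-off in this sketch is that a genuinely removed alloc stays alive
until the client sees a response with a newer index, which is the cheap side
of the resiliency fix described above.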

A more proper fix would be to ensure that a Nomad server only serves
RPC calls when the state store is fully restored, or up to date in leadership
transition cases.
2019-06-29 04:17:35 -05:00
..
allocdir goimports 2019-01-22 15:44:31 -08:00
allochealth client: fix setting alloc unhealthy at deadline 2019-02-19 07:44:14 -08:00
allocrunner tr: Fetch Wait channel before killTask in restart 2019-06-26 15:20:57 +02:00
allocwatcher goimports 2019-01-22 15:44:31 -08:00
config client config flag to disable remote exec 2019-06-03 15:31:39 -04:00
consul test: add some extra logging 2019-01-14 09:56:53 -08:00
devicemanager test: fix NewMemDB API change 2019-03-04 13:37:20 -08:00
fingerprint Merge pull request #5553 from hashicorp/b-fingerprinter-manual-config 2019-04-26 12:55:34 -04:00
interfaces Populate alloc stats API with device stats 2018-11-16 10:26:32 -05:00
lib tests: fix fifo lib race 2019-05-21 09:49:56 -04:00
logmon comment on use of init() for plugin handlers 2019-06-18 20:54:55 -04:00
pluginmanager implement client endpoint of nomad exec 2019-05-09 16:49:08 -04:00
servers client: drop unused DC field from servers list 2019-05-20 14:19:15 -07:00
state test: fix NewMemDB API change 2019-03-04 13:37:20 -08:00
stats Add Client Device Stats structs in api package 2018-11-14 14:41:19 -05:00
structs Prepare for 0.9.4 dev cycle 2019-06-12 18:47:50 +00:00
taskenv client: handle 0.8 server network resources 2019-05-02 12:08:38 -04:00
testutil tests: expect Docker on AppVeyor 2019-02-20 07:41:47 -05:00
vaultclient vault: fix data races 2019-04-16 11:22:44 -07:00
acl.go aux: helper method that returns token as well as ACL policy 2019-04-30 10:23:56 -04:00
acl_test.go tests: explicitly cleanup after clients 2018-10-17 10:06:59 -07:00
alloc_endpoint.go client config flag to disable remote exec 2019-06-03 15:31:39 -04:00
alloc_endpoint_test.go client config flag to disable remote exec 2019-06-03 15:31:39 -04:00
alloc_watcher_e2e_test.go tests: enable and fix tests requiring mock driver 2019-01-10 10:10:11 -05:00
client.go client: defensive against getting stale alloc updates 2019-06-29 04:17:35 -05:00
client_stats_endpoint.go Server side impl + touch ups 2018-02-15 13:59:02 -08:00
client_stats_endpoint_test.go tests: explicitly cleanup after clients 2018-10-17 10:06:59 -07:00
client_test.go Merge pull request #5664 from hashicorp/f-http-hcl-region 2019-06-13 12:25:01 -07:00
driver_manager_test.go tests: fix data race in client TestDriverManager_Fingerprint_Periodic 2019-05-21 09:49:56 -04:00
fingerprint_manager.go goimports until make check is happy 2019-01-23 06:27:14 -08:00
fingerprint_manager_test.go client/drivermananger: add driver manager 2018-12-18 22:55:18 -05:00
fs_endpoint.go implement client endpoint of nomad exec 2019-05-09 16:49:08 -04:00
fs_endpoint_test.go tests: fix client TestFS_Stream data race 2019-05-21 09:49:56 -04:00
gc.go Plugins use parent loggers 2019-01-11 11:36:37 -08:00
gc_test.go test: copy AR's Alloc before mutating 2018-12-19 15:48:02 -08:00
node_updater.go client: wait for batched driver updated 2019-04-19 09:00:24 -04:00
rpc.go implement client endpoint of nomad exec 2019-05-09 16:49:08 -04:00
rpc_test.go tests: explicitly cleanup after clients 2018-10-17 10:06:59 -07:00
testing.go goimports until make check is happy 2019-01-23 06:27:14 -08:00
util.go client: defensive against getting stale alloc updates 2019-06-29 04:17:35 -05:00
util_test.go Update state with server 2018-10-16 16:53:29 -07:00