c5f5a1fcb9
When fetching node alloc assignments, be defensive against a stale read before killing local node allocs.

The bug occurs when both the client and the servers are restarting: the client requests the allocations assigned to its node and may get stale data because the server hasn't finished applying all of the restored Raft transactions to its state store. Consequently, the client kills and destroys the allocs locally, only to fetch them again moments later once the server store is up to date.

The bug can be reproduced quite reliably with a single-node setup (configured with persistence). I suspect it's too edge-casey to occur in a production cluster with multiple servers, but we may need to examine leader failover scenarios more closely.

In this commit, we only remove and destroy allocs if the removal index is more recent than the alloc index. This is a cheap resiliency fix using a check we already apply when detecting alloc updates. A more proper fix would be to ensure that a Nomad server only serves RPC calls once its state store is fully restored, or up to date in leadership transition cases.
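A minimal sketch of the index comparison described above, under stated assumptions: the struct, field, and function names (Alloc, AllocModifyIndex, filterRemovals, respIndex) are illustrative and not the actual identifiers in the Nomad client. The idea is to trust a removal only when the server's response index is newer than the index at which the local alloc was written; otherwise treat the read as stale and keep the alloc.

```go
package main

import "fmt"

// Alloc stands in for a locally tracked allocation with its Raft write index.
type Alloc struct {
	ID               string
	AllocModifyIndex uint64
}

// filterRemovals returns the IDs of local allocs that are safe to remove:
// those no longer assigned to this node AND whose removal is backed by a
// server response index newer than the alloc's own index. A stale response
// (respIndex <= alloc index) does not trigger removal.
func filterRemovals(local map[string]*Alloc, serverIDs map[string]struct{}, respIndex uint64) []string {
	var remove []string
	for id, alloc := range local {
		if _, ok := serverIDs[id]; ok {
			continue // still assigned to this node
		}
		if respIndex > alloc.AllocModifyIndex {
			remove = append(remove, id)
		}
	}
	return remove
}

func main() {
	local := map[string]*Alloc{
		"a1": {ID: "a1", AllocModifyIndex: 100},
	}
	// Server restarted and hasn't replayed Raft yet: stale response index, alloc kept.
	fmt.Println(filterRemovals(local, map[string]struct{}{}, 50))
	// Server store is up to date and no longer lists the alloc: alloc removed.
	fmt.Println(filterRemovals(local, map[string]struct{}{}, 150))
}
```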