open-nomad

Mahmood Ali 00be4fc63c tests: deflake TestTaskRunner_StatsHook_Periodic (#9734) This PR deflakes the TestTaskRunner_StatsHook_Periodic test and adds backoff when the driver closes the channel. TestTaskRunner_StatsHook_Periodic is currently the flakiest test, failing ~4% of the time (20 out of 486 workflows). A sample failure: https://app.circleci.com/pipelines/github/hashicorp/nomad/14028/workflows/957b674f-cbcc-4228-96d9-1094fdee5b9c/jobs/128563 . This change has two components: First, it updates the StatsHook so that it backs off when the stats channel is closed. In the context of the test, where the mock driver emits a single stats update and closes the channel, the test may make tens of thousands of updates during the period. In a real context, if a driver doesn't implement the stats handler properly, or when a task finishes, we may generate far too many Stats queries in a tight loop. Here, the backoff reduces these queries. I've added a failing test that shows 154,458 stats updates within 500ms in https://app.circleci.com/pipelines/github/hashicorp/nomad/14092/workflows/50672445-392d-4661-b19e-e3561ed32746/jobs/129423 . Second, the test ignores the first stats update after a task exit. Because updates are asynchronous and rely on channels and contexts, it's possible that an update is enqueued while the test marks the task as exited, resulting in a spurious update.		2021-01-06 16:03:00 -05:00
..
allocdir	Add gosimple linter (#9590)	2020-12-09 11:05:18 -08:00
allochealth	allochealth: Fix when check health precedes task health	2020-05-13 07:44:39 -04:00
allocrunner	tests: deflake TestTaskRunner_StatsHook_Periodic (#9734)	2021-01-06 16:03:00 -05:00
allocwatcher	client/allocwatcher: fix dropped test error (#6592)	2019-10-31 08:29:25 -04:00
config	removed backwards-compatible/untagged metrics deprecated in 0.7	2020-10-13 20:18:39 +00:00
consul	consul/connect: dynamically select envoy sidecar at runtime	2020-10-13 09:14:12 -05:00
devicemanager	print the actual fingerprint error instead of an unrelated (and probably nil) error	2021-01-04 08:20:29 -05:00
dynamicplugins	Add gosimple linter (#9590)	2020-12-09 11:05:18 -08:00
fingerprint	Add gosimple linter (#9590)	2020-12-09 11:05:18 -08:00
interfaces	Populate alloc stats API with device stats	2018-11-16 10:26:32 -05:00
lib	ar: plumb client config for networking into the network hook	2019-07-31 01:04:06 -04:00
logmon	Add gosimple linter (#9590)	2020-12-09 11:05:18 -08:00
pluginmanager	pluginmanager: WaitForFirstFingerprint times out (#9597)	2020-12-10 07:27:15 -08:00
servers	client: drop unused DC field from servers list	2019-05-20 14:19:15 -07:00
state	Add gosimple linter (#9590)	2020-12-09 11:05:18 -08:00
stats	Update gopsutil code	2020-03-15 09:37:05 +01:00
structs	Add gosimple linter (#9590)	2020-12-09 11:05:18 -08:00
taskenv	added documenting unit tests for new TaskEnv.ClientPath method	2021-01-04 22:25:38 +00:00
testutil	fixup! vendor: explicit use of hashicorp/go-msgpack	2020-03-31 09:48:07 -04:00
vaultclient	Add gosimple linter (#9590)	2020-12-09 11:05:18 -08:00
acl.go	Audit config, seams for enterprise audit features	2020-03-23 13:47:42 -04:00
acl_test.go	Event Stream: Track ACL changes, unsubscribe on invalidating changes (#9447)	2020-12-01 11:11:34 -05:00
agent_endpoint.go	Add gosimple linter (#9590)	2020-12-09 11:05:18 -08:00
agent_endpoint_test.go	fix params for Agent.Host client RPC (#8795)	2020-08-31 17:14:26 -04:00
alloc_endpoint.go	client: improve alloc GC API error messages (#9488)	2021-01-04 11:34:12 -05:00
alloc_endpoint_test.go	client: improve alloc GC API error messages (#9488)	2021-01-04 11:34:12 -05:00
alloc_watcher_e2e_test.go	tests: swap lib/freeport for tweaked helper/freeport	2019-12-09 08:37:32 -06:00
client.go	consul/connect: fix regression where client connect images ignored	2020-12-14 09:47:55 -06:00
client_stats_endpoint.go	Server side impl + touch ups	2018-02-15 13:59:02 -08:00
client_stats_endpoint_test.go	tests: swap lib/freeport for tweaked helper/freeport	2019-12-09 08:37:32 -06:00
client_test.go	Events/msgtype cleanup (#9117)	2020-10-19 09:30:15 -04:00
csi_endpoint.go	csi: client RPCs should return wrapped errors for checking (#8605)	2020-08-07 11:01:36 -04:00
csi_endpoint_test.go	csi: client RPCs should return wrapped errors for checking (#8605)	2020-08-07 11:01:36 -04:00
driver_manager_test.go	tests: fix data race in client TestDriverManager_Fingerprint_Periodic	2019-05-21 09:49:56 -04:00
enterprise_client_oss.go	Add gosimple linter (#9590)	2020-12-09 11:05:18 -08:00
fingerprint_manager.go	s/0.13/1.0/g	2020-10-14 15:17:47 -07:00
fingerprint_manager_test.go	use allow/deny instead of the colored alternatives (#9019)	2020-10-12 08:47:05 -04:00
fs_endpoint.go	fixup! vendor: explicit use of hashicorp/go-msgpack	2020-03-31 09:48:07 -04:00
fs_endpoint_test.go	client: fix test umask (#8987)	2020-09-30 08:09:41 -04:00
gc.go	Plugins use parent loggers	2019-01-11 11:36:37 -08:00
gc_test.go	Events/msgtype cleanup (#9117)	2020-10-19 09:30:15 -04:00
heartbeatstop.go	Delayed evaluations for `stop_after_client_disconnect` can cause unwanted extra followup evaluations around job garbage collection (#8099)	2020-06-03 09:48:38 -04:00
heartbeatstop_test.go	docs: s/hearbeat/heartbeat and fix link	2020-07-23 11:33:34 -07:00
node_updater.go	client: use NewNodeEvent builder for consistency (#7559)	2020-03-31 10:02:16 -04:00
rpc.go	Add gosimple linter (#9590)	2020-12-09 11:05:18 -08:00
rpc_test.go	Simplify Bootstrap logic in tests	2020-03-02 13:47:43 -05:00
testing.go	consul/connect: dynamically select envoy sidecar at runtime	2020-10-13 09:14:12 -05:00
util.go	Revert "client: defensive against getting stale alloc updates"	2020-06-19 15:39:44 -04:00
util_test.go	Update state with server	2018-10-16 16:53:29 -07:00