open-nomad/client
Mahmood Ali 00be4fc63c
tests: deflake TestTaskRunner_StatsHook_Periodic (#9734)
This PR deflakes TestTaskRunner_StatsHook_Periodic tests and adds backoff when the driver closes the channel.

TestTaskRunner_StatsHook_Periodic is currently the most flaky test - failing ~4% of the time (20 out of 486 workflows). A sample failure: https://app.circleci.com/pipelines/github/hashicorp/nomad/14028/workflows/957b674f-cbcc-4228-96d9-1094fdee5b9c/jobs/128563 .

This change has two components:

First, it updates the StatsHook so that it backs off when the stats channel is closed. In the test, the mock driver emits a single stats update and then closes the channel, so the hook may make tens of thousands of update calls during the period. In a real deployment, if a driver doesn't implement the stats handler properly, or when a task finishes, the hook may issue far too many Stats queries in a tight loop. The backoff reduces these queries. I've added a failing test that shows 154,458 stats updates within 500ms in https://app.circleci.com/pipelines/github/hashicorp/nomad/14092/workflows/50672445-392d-4661-b19e-e3561ed32746/jobs/129423 .

Second, the test ignores the first stats update after a task exit. Because updates are asynchronous and pass through channels and contexts, an update may already be enqueued when the test marks the task as exited, resulting in a spurious update.
2021-01-06 16:03:00 -05:00
allocdir Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
allochealth allochealth: Fix when check health preceeds task health 2020-05-13 07:44:39 -04:00
allocrunner tests: deflake TestTaskRunner_StatsHook_Periodic (#9734) 2021-01-06 16:03:00 -05:00
allocwatcher client/allocwatcher: fix dropped test error (#6592) 2019-10-31 08:29:25 -04:00
config removed backwards-compatible/untagged metrics deprecated in 0.7 2020-10-13 20:18:39 +00:00
consul consul/connect: dynamically select envoy sidecar at runtime 2020-10-13 09:14:12 -05:00
devicemanager print the actual fingerprint error instead of an unrelated (and probably nil) error 2021-01-04 08:20:29 -05:00
dynamicplugins Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
fingerprint Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
interfaces Populate alloc stats API with device stats 2018-11-16 10:26:32 -05:00
lib ar: plumb client config for networking into the network hook 2019-07-31 01:04:06 -04:00
logmon Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
pluginmanager pluginmanager: WaitForFirstFingerprint times out (#9597) 2020-12-10 07:27:15 -08:00
servers client: drop unused DC field from servers list 2019-05-20 14:19:15 -07:00
state Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
stats Update gopsutil code 2020-03-15 09:37:05 +01:00
structs Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
taskenv added documenting unit tests for new TaskEnv.ClientPath method 2021-01-04 22:25:38 +00:00
testutil fixup! vendor: explicit use of hashicorp/go-msgpack 2020-03-31 09:48:07 -04:00
vaultclient Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
acl.go Audit config, seams for enterprise audit features 2020-03-23 13:47:42 -04:00
acl_test.go Event Stream: Track ACL changes, unsubscribe on invalidating changes (#9447) 2020-12-01 11:11:34 -05:00
agent_endpoint.go Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
agent_endpoint_test.go fix params for Agent.Host client RPC (#8795) 2020-08-31 17:14:26 -04:00
alloc_endpoint.go client: improve alloc GC API error messages (#9488) 2021-01-04 11:34:12 -05:00
alloc_endpoint_test.go client: improve alloc GC API error messages (#9488) 2021-01-04 11:34:12 -05:00
alloc_watcher_e2e_test.go tests: swap lib/freeport for tweaked helper/freeport 2019-12-09 08:37:32 -06:00
client.go consul/connect: fix regression where client connect images ignored 2020-12-14 09:47:55 -06:00
client_stats_endpoint.go Server side impl + touch ups 2018-02-15 13:59:02 -08:00
client_stats_endpoint_test.go tests: swap lib/freeport for tweaked helper/freeport 2019-12-09 08:37:32 -06:00
client_test.go Events/msgtype cleanup (#9117) 2020-10-19 09:30:15 -04:00
csi_endpoint.go csi: client RPCs should return wrapped errors for checking (#8605) 2020-08-07 11:01:36 -04:00
csi_endpoint_test.go csi: client RPCs should return wrapped errors for checking (#8605) 2020-08-07 11:01:36 -04:00
driver_manager_test.go tests: fix data race in client TestDriverManager_Fingerprint_Periodic 2019-05-21 09:49:56 -04:00
enterprise_client_oss.go Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
fingerprint_manager.go s/0.13/1.0/g 2020-10-14 15:17:47 -07:00
fingerprint_manager_test.go use allow/deny instead of the colored alternatives (#9019) 2020-10-12 08:47:05 -04:00
fs_endpoint.go fixup! vendor: explicit use of hashicorp/go-msgpack 2020-03-31 09:48:07 -04:00
fs_endpoint_test.go client: fix test umask (#8987) 2020-09-30 08:09:41 -04:00
gc.go Plugins use parent loggers 2019-01-11 11:36:37 -08:00
gc_test.go Events/msgtype cleanup (#9117) 2020-10-19 09:30:15 -04:00
heartbeatstop.go Delayed evaluations for stop_after_client_disconnect can cause unwanted extra followup evaluations around job garbage collection (#8099) 2020-06-03 09:48:38 -04:00
heartbeatstop_test.go docs: s/hearbeat/heartbeat and fix link 2020-07-23 11:33:34 -07:00
node_updater.go client: use NewNodeEvent builder for consistency (#7559) 2020-03-31 10:02:16 -04:00
rpc.go Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
rpc_test.go Simplify Bootstrap logic in tests 2020-03-02 13:47:43 -05:00
testing.go consul/connect: dynamically select envoy sidecar at runtime 2020-10-13 09:14:12 -05:00
util.go Revert "client: defensive against getting stale alloc updates" 2020-06-19 15:39:44 -04:00
util_test.go Update state with server 2018-10-16 16:53:29 -07:00