open-nomad/client/allocrunner
Mahmood Ali 00be4fc63c
tests: deflake TestTaskRunner_StatsHook_Periodic (#9734)
This PR deflakes the TestTaskRunner_StatsHook_Periodic test and adds a backoff when the driver closes the stats channel.

TestTaskRunner_StatsHook_Periodic is currently the most flaky test - failing ~4% of the time (20 out of 486 workflows). A sample failure: https://app.circleci.com/pipelines/github/hashicorp/nomad/14028/workflows/957b674f-cbcc-4228-96d9-1094fdee5b9c/jobs/128563 .

This change has two components:

First, it updates the StatsHook so that it backs off when the stats channel is closed. In the test, where the mock driver emits a single stats update and then closes the channel, the test may trigger tens of thousands of updates during the period. In a real deployment, if a driver doesn't implement the stats handler properly, or when a task finishes, we may generate far too many Stats queries in a tight loop. Here, the backoff reduces these queries. I've added a failing test that shows 154,458 stats updates within 500ms in https://app.circleci.com/pipelines/github/hashicorp/nomad/14092/workflows/50672445-392d-4661-b19e-e3561ed32746/jobs/129423 .
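
To illustrate the backoff idea, here is a minimal, self-contained sketch. It is not the actual StatsHook code: `nextBackoff`, `collect`, `simulate`, the 10ms base and the 80ms cap are all hypothetical names and values chosen for the example, and the fake driver simply emits one update and closes the channel, as the mock driver in the test is described to do.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// nextBackoff returns base doubled once per attempt, capped at limit.
// (Hypothetical helper; the real hook may compute its delay differently.)
func nextBackoff(base, limit time.Duration, attempt int) time.Duration {
	d := base
	for i := 0; i < attempt && d < limit; i++ {
		d *= 2
	}
	if d > limit {
		return limit
	}
	return d
}

// collect reads stats from the channel returned by open, a stand-in for a
// driver's stats call. When the channel is closed it waits before re-opening,
// and the wait grows with each closure. For simplicity the counter is never reset.
func collect(ctx context.Context, open func() <-chan int, handle func(int), sleep func(time.Duration)) {
	attempt := 0
	for ctx.Err() == nil {
		for v := range open() {
			handle(v)
		}
		// The channel was closed: back off instead of retrying immediately.
		sleep(nextBackoff(10*time.Millisecond, 80*time.Millisecond, attempt))
		attempt++
	}
}

// simulate runs collect against a fake driver that emits one update and then
// closes the channel, stopping after maxRetries backoffs. The sleep is faked
// so the run is instant and deterministic.
func simulate(maxRetries int) (updates int, delays []time.Duration) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	open := func() <-chan int {
		ch := make(chan int, 1)
		ch <- 1
		close(ch)
		return ch
	}
	handle := func(int) { updates++ }
	sleep := func(d time.Duration) {
		delays = append(delays, d)
		if len(delays) >= maxRetries {
			cancel()
		}
	}
	collect(ctx, open, handle, sleep)
	return updates, delays
}

func main() {
	updates, delays := simulate(5)
	fmt.Println(updates, delays)
}
```

Without the `sleep` call, the loop would re-open the channel as fast as the CPU allows, which is the tight-loop behaviour described above. With it, five closures produce only five updates, and the delays grow from 10ms up to the 80ms cap.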

Second, the test ignores the first stats update after a task exit. Because updates are asynchronous and pass through channels and contexts, an update may already be enqueued when the test marks the task as exited, resulting in a spurious update.
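
The "ignore one in-flight update" check can be sketched roughly as follows. This is an illustration under assumptions, not the test's actual code: `quietAfterExit`, `ms`, and the grace and window durations are hypothetical, and the channel is assumed to stay open.

```go
package main

import (
	"fmt"
	"time"
)

// ms converts a millisecond count to a time.Duration.
func ms(n int) time.Duration { return time.Duration(n) * time.Millisecond }

// quietAfterExit reports whether updates stop once a task has exited.
// It tolerates at most one update that was already in flight, then
// requires that nothing else arrives within window.
// The channel is assumed to stay open; a closed channel would read as an update.
func quietAfterExit(updates <-chan int, grace, window time.Duration) bool {
	// Discard a single spurious update, if one shows up within grace.
	select {
	case <-updates:
	case <-time.After(grace):
	}
	// Any further update within window means the hook is still emitting.
	select {
	case <-updates:
		return false
	case <-time.After(window):
		return true
	}
}

func main() {
	// One update was enqueued while the task was being marked as exited.
	inFlight := make(chan int, 2)
	inFlight <- 1
	fmt.Println(quietAfterExit(inFlight, ms(20), ms(50)))

	// Two updates after exit: the second one is not tolerated.
	stillEmitting := make(chan int, 2)
	stillEmitting <- 1
	stillEmitting <- 2
	fmt.Println(quietAfterExit(stillEmitting, ms(20), ms(50)))
}
```

Tolerating exactly one update keeps the assertion strict: a hook that keeps emitting after exit still fails, while the race between enqueueing and marking the task as exited no longer does.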
2021-01-06 16:03:00 -05:00
interfaces
state
taskrunner
alloc_runner.go
alloc_runner_hooks.go
alloc_runner_test.go
alloc_runner_unix_test.go
allocdir_hook.go
config.go
consul_grpc_sock_hook.go
consul_grpc_sock_hook_test.go
consul_http_sock_hook.go
consul_http_sock_hook_test.go
csi_hook.go
groupservice_hook.go
groupservice_hook_test.go
health_hook.go
health_hook_test.go
migrate_hook.go
network_hook.go
network_hook_test.go
network_manager_linux.go
network_manager_linux_test.go
network_manager_nonlinux.go
networking.go
networking_bridge_linux.go
networking_cni.go
task_hook_coordinator.go
task_hook_coordinator_test.go
testing.go
upstream_allocs_hook.go
util.go