067fd86a8c
This commit ensures Nomad captures the task exit code more reliably, even when the task is killed. The issue affects the `raw_exec` driver, as noted in https://github.com/hashicorp/nomad/issues/10430. We fix it by ensuring that the TaskRunner calls `driver.WaitTask` only once. The TaskRunner monitors the completion of the task by calling `driver.WaitTask`, which should return the task exit code on completion. However, it can also return a "context canceled" error if the agent/executor is shut down. Previously, when a task was to be stopped, the `killTask` path made two `WaitTask` calls, and the second occasionally returned "context canceled" because of a race during task shutdown that depends on the driver and on how quickly it shuts down after the task completes. By making a single `WaitTask` call and consistently waiting for the task, we ensure that we capture the exit code reliably before the executor is shut down or the contexts expire. I opted to change the TaskRunner implementation to avoid changing the driver interface or requiring third-party drivers to update. Additionally, the PR ensures that attempts to kill the task terminate when the task dies "naturally". Without this change, if the task dies at the right moment, the `killTask` call may keep retrying to kill an already-dead task for up to 5 minutes before giving up. |
||
---|---|---|
interfaces | ||
state | ||
taskrunner | ||
alloc_runner.go | ||
alloc_runner_hooks.go | ||
alloc_runner_test.go | ||
alloc_runner_unix_test.go | ||
allocdir_hook.go | ||
cgroup_hook.go | ||
config.go | ||
consul_grpc_sock_hook.go | ||
consul_grpc_sock_hook_test.go | ||
consul_http_sock_hook.go | ||
consul_http_sock_hook_test.go | ||
csi_hook.go | ||
groupservice_hook.go | ||
groupservice_hook_test.go | ||
health_hook.go | ||
health_hook_test.go | ||
migrate_hook.go | ||
network_hook.go | ||
network_hook_test.go | ||
network_manager_linux.go | ||
network_manager_linux_test.go | ||
network_manager_nonlinux.go | ||
networking.go | ||
networking_bridge_linux.go | ||
networking_cni.go | ||
networking_cni_test.go | ||
task_hook_coordinator.go | ||
task_hook_coordinator_test.go | ||
testing.go | ||
upstream_allocs_hook.go | ||
util.go |