open-nomad/client/allocrunner
Mahmood Ali 067fd86a8c
drivers: Capture exit code when task is killed (#10494)
This commit ensures Nomad captures the task exit code more reliably, even when the task is killed. This issue affects the `raw_exec` driver, as noted in https://github.com/hashicorp/nomad/issues/10430 .

We fix this issue by ensuring that the TaskRunner calls `driver.WaitTask` only once. The TaskRunner monitors the completion of the task by calling `driver.WaitTask`, which should return the task exit code on completion. However, it can also return a "context canceled" error if the agent/executor is shut down.

Previously, when a task was to be stopped, the killTask path made two WaitTask calls, and the second occasionally returned "context canceled" because of a race during task shutdown. Whether the race occurred depended on the driver and on how quickly it shut down after the task completed.

By having a single WaitTask call and consistently waiting for the task, we ensure we capture the exit code reliably before the executor is shut down or the contexts expire.
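
The single-wait idea can be sketched roughly as follows. This is an illustrative sketch only, not the actual TaskRunner code: `runner`, `exitResult`, `start`, `kill`, and `simulate` are hypothetical names, and the `waitTask`/`killTask` function fields merely stand in for the driver calls. The wait is started exactly once, and both the run loop and the kill path observe the same result, so the kill path never issues a second wait that could race with shutdown. The kill loop also stops retrying as soon as the task has exited.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// exitResult is a hypothetical stand-in for a driver's exit result.
type exitResult struct{ ExitCode int }

// runner is a hypothetical, heavily simplified task runner.
type runner struct {
	waitTask func() exitResult // stand-in for the driver wait call (assumption)
	killTask func() error      // stand-in for the driver stop/kill call (assumption)

	once   sync.Once
	done   chan struct{} // closed once the single wait has returned
	result exitResult    // written before done is closed
}

func newRunner(wait func() exitResult, kill func() error) *runner {
	return &runner{waitTask: wait, killTask: kill, done: make(chan struct{})}
}

// start launches the one and only wait; later calls are no-ops.
func (r *runner) start() {
	r.once.Do(func() {
		go func() {
			r.result = r.waitTask()
			close(r.done)
		}()
	})
}

// kill asks the task to stop and reuses the result of the single wait.
// It stops retrying as soon as the task has exited, for any reason.
func (r *runner) kill(retry time.Duration, attempts int) (exitResult, error) {
	r.start()
	for i := 0; i < attempts; i++ {
		select {
		case <-r.done: // task already dead: nothing left to kill
			return r.result, nil
		default:
		}
		_ = r.killTask() // a failed attempt is simply retried below
		select {
		case <-r.done:
			return r.result, nil
		case <-time.After(retry):
		}
	}
	return exitResult{}, errors.New("task did not exit after kill attempts")
}

// simulate runs a fake task that exits with the given code once killed and
// reports the observed exit code and how many times the wait was invoked.
func simulate(code int) (int, int32, error) {
	exited := make(chan struct{})
	var calls int32
	var killOnce sync.Once
	wait := func() exitResult {
		atomic.AddInt32(&calls, 1)
		<-exited
		return exitResult{ExitCode: code}
	}
	kill := func() error {
		killOnce.Do(func() { close(exited) })
		return nil
	}
	r := newRunner(wait, kill)
	r.start() // the run loop begins waiting
	res, err := r.kill(10*time.Millisecond, 5)
	return res.ExitCode, atomic.LoadInt32(&calls), err
}

func main() {
	code, calls, err := simulate(137)
	fmt.Println(code, calls, err)
}
```

Because the result is published through a channel that is closed exactly once, any number of readers can observe it without triggering another wait on the driver.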

I opted to change the TaskRunner implementation to avoid changing the driver interface or requiring 3rd party drivers to update.

Additionally, the PR ensures that attempts to kill the task stop once the task dies "naturally". Without this change, if the task dies at the right moment, the `killTask` call may keep retrying to kill an already-dead task for up to 5 minutes before giving up.
2021-05-04 10:54:00 -04:00
..
interfaces implement alloc runner task restart hook 2021-01-22 10:55:40 -05:00
state client: add NetworkStatus to Allocation (#8657) 2020-10-12 13:43:04 -04:00
taskrunner drivers: Capture exit code when task is killed (#10494) 2021-05-04 10:54:00 -04:00
alloc_runner.go testing fixes 2021-04-14 10:17:28 -04:00
alloc_runner_hooks.go client/ar: thread through cpuset manager 2021-04-13 13:28:36 -04:00
alloc_runner_test.go lifecycle: add poststop hook (#8194) 2020-11-12 08:01:42 -08:00
alloc_runner_unix_test.go tests: restart restartpolicy for all tasks in tests 2020-03-24 21:52:48 -04:00
allocdir_hook.go client: cleanup and document context uses 2019-03-12 15:03:54 -07:00
cgroup_hook.go client/ar: thread through cpuset manager 2021-04-13 13:28:36 -04:00
config.go client/ar: thread through cpuset manager 2021-04-13 13:28:36 -04:00
consul_grpc_sock_hook.go consul/connect: add initial support for ingress gateways 2020-08-21 16:21:54 -05:00
consul_grpc_sock_hook_test.go consul/connect: add support for bridge networks with connect native tasks 2020-07-29 09:26:01 -05:00
consul_http_sock_hook.go consul/connect: fixup some spelling, comments, consts 2020-07-29 09:26:01 -05:00
consul_http_sock_hook_test.go consul/connect: add support for bridge networks with connect native tasks 2020-07-29 09:26:01 -05:00
csi_hook.go CSI: use AccessMode/AttachmentMode from CSIVolumeClaim 2021-04-07 11:24:09 -04:00
groupservice_hook.go consul: plubming for specifying consul namespace in job/group 2021-04-05 10:03:19 -06:00
groupservice_hook_test.go consul: plubming for specifying consul namespace in job/group 2021-04-05 10:03:19 -06:00
health_hook.go Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
health_hook_test.go health: fail health if any task is pending 2020-03-22 11:13:41 -04:00
migrate_hook.go client: cleanup and document context uses 2019-03-12 15:03:54 -07:00
network_hook.go client: add NetworkStatus to Allocation (#8657) 2020-10-12 13:43:04 -04:00
network_hook_test.go client: add NetworkStatus to Allocation (#8657) 2020-10-12 13:43:04 -04:00
network_manager_linux.go ar: isolate network actions performed by client 2021-02-02 23:24:57 -05:00
network_manager_linux_test.go ar: rearrange network hook to support building on windows 2019-07-31 01:03:19 -04:00
network_manager_nonlinux.go ar: refactor network bridge config to use go-cni lib (#6255) 2019-09-04 16:33:25 -04:00
networking.go ar: isolate network actions performed by client 2021-02-02 23:24:57 -05:00
networking_bridge_linux.go networking: Ensure CNI iptables rules are appended to chain and not forced to be first 2021-04-15 10:11:15 -04:00
networking_cni.go ar: refactor go-cni results processing & add test 2021-04-08 09:20:14 -07:00
networking_cni_test.go ar: refactor go-cni results processing & add test 2021-04-08 09:20:14 -07:00
task_hook_coordinator.go lifecycle: add poststop hook (#8194) 2020-11-12 08:01:42 -08:00
task_hook_coordinator_test.go test: add allocrunner test for poststart hooks 2020-08-12 09:54:14 -07:00
testing.go testing fixes 2021-04-14 10:17:28 -04:00
upstream_allocs_hook.go client: cleanup and document context uses 2019-03-12 15:03:54 -07:00
util.go allocrunnerv2 -> allocrunner 2018-10-16 16:56:56 -07:00