open-nomad/client/allocrunner/taskrunner
Mahmood Ali 067fd86a8c
drivers: Capture exit code when task is killed (#10494)
This commit ensures Nomad captures the task exit code more reliably, even when the task is killed. This issue affects the `raw_exec` driver, as noted in https://github.com/hashicorp/nomad/issues/10430.

We fix this issue by ensuring that the TaskRunner calls `driver.WaitTask` only once. The TaskRunner monitors the completion of the task by calling `driver.WaitTask`, which should return the task exit code on completion. However, it can also return a "context canceled" error if the agent or executor is shut down.

Previously, when a task was being stopped, the `killTask` path made two `WaitTask` calls, and the second one occasionally returned "context canceled" because of a race with the task shutting down. Whether the race occurs depends on the driver and how quickly it shuts down after the task completes.

By making a single `WaitTask` call and consistently waiting on it, we capture the exit code reliably before the executor is shut down or the contexts expire.

I opted to change the TaskRunner implementation to avoid changing the driver interface or requiring third-party drivers to be updated.

Additionally, the PR ensures that attempts to kill the task stop once the task "naturally" dies. Without this change, if the task dies at the wrong moment, the `killTask` call may keep retrying to kill an already-dead task for up to 5 minutes before giving up.
2021-05-04 10:54:00 -04:00
getter update template and artifact interpolation to use client-relative paths 2021-01-04 22:25:34 +00:00
interfaces template: trigger change_mode for dynamic secrets on restore (#9636) 2020-12-16 13:36:19 -05:00
restarts lifecycle: add poststop hook (#8194) 2020-11-12 08:01:42 -08:00
state client: test logmon cleanup 2019-03-04 13:15:15 -08:00
template consul: plubming for specifying consul namespace in job/group 2021-04-05 10:03:19 -06:00
testdata executor/linux: make chroot binary paths absolute 2019-04-01 15:45:31 -07:00
artifact_hook.go update template and artifact interpolation to use client-relative paths 2021-01-04 22:25:34 +00:00
artifact_hook_test.go update template and artifact interpolation to use client-relative paths 2021-01-04 22:25:34 +00:00
connect_native_hook.go Automatically populate CONSUL_HTTP_ADDR for connect native tasks in host networking mode. Fixes #10239 2021-03-28 14:34:31 +02:00
connect_native_hook_test.go consul: plubming for specifying consul namespace in job/group 2021-04-05 10:03:19 -06:00
device_hook.go Store device envs separately and pass to drivers 2018-12-19 14:23:09 -08:00
device_hook_test.go Device hook and devices affect computed node class 2018-11-27 17:25:33 -08:00
dispatch_hook.go client/state: support upgrading from 0.8->0.9 2018-12-19 10:39:27 -08:00
dispatch_hook_test.go use drivers.FSIsolation 2019-01-08 09:11:47 -05:00
driver_handle.go core: propagate remote task handles 2021-04-27 15:07:03 -07:00
envoy_bootstrap_hook.go connect: use exp backoff when waiting on consul envoy bootstrap 2021-04-27 09:21:50 -06:00
envoy_bootstrap_hook_test.go connect: use exp backoff when waiting on consul envoy bootstrap 2021-04-27 09:21:50 -06:00
envoy_version_hook.go consul/connect: fix regression where client connect images ignored 2020-12-14 09:47:55 -06:00
envoy_version_hook_test.go update template and artifact interpolation to use client-relative paths 2021-01-04 22:25:34 +00:00
errors.go client: artifact errors are retry-able 2019-02-20 07:21:27 -08:00
errors_test.go client: artifact errors are retry-able 2019-02-20 07:21:27 -08:00
lazy_handle.go executor: implement streaming stats API 2019-01-12 12:18:22 -05:00
lifecycle.go drivers: Capture exit code when task is killed (#10494) 2021-05-04 10:54:00 -04:00
logmon_hook.go address review comments 2019-12-13 11:21:00 -05:00
logmon_hook_test.go driver: allow disabling log collection 2019-12-08 14:15:03 -05:00
logmon_hook_unix_test.go deps: bump gopsutil to v3.21.2 2021-03-30 16:02:51 -04:00
plugin_supervisor_hook.go Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
remotetask_hook.go core: propagate remote task handles 2021-04-27 15:07:03 -07:00
script_check_hook.go e2e: consul namespace tests from nomad ent 2021-04-19 15:35:31 -06:00
script_check_hook_test.go consul: plubming for specifying consul namespace in job/group 2021-04-05 10:03:19 -06:00
service_hook.go core: propagate remote task handles 2021-04-27 15:07:03 -07:00
service_hook_test.go consul: plubming for specifying consul namespace in job/group 2021-04-05 10:03:19 -06:00
sids_hook.go client: PR cleanup - improved logging around kill task in SIDS hook 2020-01-31 19:05:23 -06:00
sids_hook_test.go tests: set consul token for nomad client for testing SIDS TR hook 2020-01-31 19:06:15 -06:00
stats_hook.go tests: deflake TestTaskRunner_StatsHook_Periodic (#9734) 2021-01-06 16:03:00 -05:00
stats_hook_test.go tests: deflake TestTaskRunner_StatsHook_Periodic (#9734) 2021-01-06 16:03:00 -05:00
task_dir_hook.go update template and artifact interpolation to use client-relative paths 2021-01-04 22:25:34 +00:00
task_runner.go drivers: Capture exit code when task is killed (#10494) 2021-05-04 10:54:00 -04:00
task_runner_getters.go lifecycle: add poststop hook (#8194) 2020-11-12 08:01:42 -08:00
task_runner_hooks.go core: propagate remote task handles 2021-04-27 15:07:03 -07:00
task_runner_test.go drivers: Capture exit code when task is killed (#10494) 2021-05-04 10:54:00 -04:00
tasklet.go comments: cleanup some leftover debug comments and such 2020-01-31 19:04:35 -06:00
tasklet_test.go support script checks for task group services (#6197) 2019-09-03 15:09:04 -04:00
template_hook.go consul: plubming for specifying consul namespace in job/group 2021-04-05 10:03:19 -06:00
validate_hook.go s/0.13/1.0/g 2020-10-14 15:17:47 -07:00
validate_hook_test.go client: Rename drivers/shared/env => client/taskenv 2018-11-30 12:18:39 +01:00
vault_hook.go emit TaskRestartSignal event on vault restart 2019-02-22 15:56:14 -05:00
vault_hook_test.go client: support graceful shutdowns 2018-11-19 16:39:30 -08:00
volume_hook.go volumes: return better error messages for unsupported task drivers (#8030) 2020-05-21 09:18:02 -04:00
volume_hook_test.go volumes: return better error messages for unsupported task drivers (#8030) 2020-05-21 09:18:02 -04:00