Commit Graph

125 Commits

Author SHA1 Message Date
Alex Dadgar 79cfe26021 vet 2019-01-07 14:49:41 -08:00
Alex Dadgar 8a35d7b1dd Test recovery 2019-01-07 14:49:41 -08:00
Alex Dadgar f40f8ce02e Mock driver has recovery, stats 2019-01-07 14:49:40 -08:00
Alex Dadgar 3f24e4d6ca comments 2019-01-07 14:49:40 -08:00
Alex Dadgar 44dca19012 Fix hooks 2019-01-07 14:49:40 -08:00
Alex Dadgar c9825a9c36 recover 2019-01-07 14:49:40 -08:00
Mahmood Ali cd3c6cf60b taskrunner: emit TaskReceived event
Preserve pre-0.9, where task runner emits `Received: Task received by
client` event on task runner creation.
2019-01-04 14:32:29 -05:00
Danielle Tomlinson 35a4790740
Merge pull request #5142 from hashicorp/dani/cleanup-allocrunner-logs
allocrunner: Standardised discard logs
2019-01-03 18:40:48 +01:00
Danielle Tomlinson 29196ca70e allocrunner: Standardised discard logs
Follow up from https://github.com/hashicorp/nomad/pull/5007#pullrequestreview-186739124
2019-01-03 14:04:31 +01:00
Danielle Tomlinson 28aa34ea78 taskrunner: Persist environment from hooks
https://github.com/hashicorp/nomad/pull/5032 introduced a regression
where the origHookState was used in place of the response from the hook.
2019-01-03 13:13:57 +01:00
Alex Dadgar d7d32c2f61
Merge pull request #5032 from hashicorp/f-driver-env
Store device envs separately and pass to drivers
2018-12-20 13:38:27 -08:00
Nick Ethier 6c43ccf628
client: add proper build flag to allocrunner testing.go 2018-12-19 20:22:07 -05:00
Alex Dadgar 9d34802f7a Store device envs separately and pass to drivers 2018-12-19 14:23:09 -08:00
Michael Schurter c84998e996 tests: implement HasHealth for mock health 2018-12-19 10:39:27 -08:00
Michael Schurter d9ea8252a7 client/state: support upgrading from 0.8->0.9
Also persist and load DeploymentStatus to avoid rechecking health after
client restarts.
2018-12-19 10:39:27 -08:00
Michael Schurter 461599ff20 tr: fix HookState Copy() and Equal() methods
They did not take into account the Env field.
2018-12-19 09:58:06 -08:00
Danielle Tomlinson c580512d32 allocrunner: Close updates routine correctly 2018-12-19 18:32:51 +01:00
Nick Ethier ce1a5cba0e
drivermanager: use allocID and task name to route task events 2018-12-18 23:01:51 -05:00
Nick Ethier d8a0265e68
client: batch initial fingerprinting in plugin manangers
drivermanager: fix pr comments/feedback
2018-12-18 22:56:19 -05:00
Nick Ethier 7d23cbf448
client/drivermananger: fixup issues from rebase and address PR comments 2018-12-18 22:55:38 -05:00
Nick Ethier 1543335710
tr: deregister task handler on cleanup 2018-12-18 22:55:38 -05:00
Nick Ethier 82175d1328
client/drivermananger: add driver manager
The driver manager is modeled after the device manager and is started by the client.
It's responsible for handling driver lifecycle and reattachment state, as well as
processing the incomming fingerprint and task events from each driver. The mananger
exposes a method for registering event handlers for task events that is used by the
task runner to update the server when a task has been updated with an event.

Since driver fingerprinting has been implemented by the driver manager, it is no
longer needed in the fingerprint mananger and has been removed.
2018-12-18 22:55:18 -05:00
Alex Dadgar 9d1403d617
Merge pull request #5002 from hashicorp/b-task-config-resources
Convert driver resource to AllocatedTaskResource
2018-12-18 16:46:34 -08:00
Danielle Tomlinson 0edc65631a
Merge pull request #5007 from hashicorp/dani/f-allocrunner-async
allocrunner: Async api for shutdown/destroy/update
2018-12-19 01:26:41 +01:00
Alex Dadgar 8efac7ec81 Fix unit tests + upgrade pathing resources 2018-12-18 15:50:44 -08:00
Alex Dadgar b8268d9a46 Lint 2018-12-18 15:50:44 -08:00
Alex Dadgar 66cf3156b2 LinuxResources doesn't use task.Resources 2018-12-18 15:50:44 -08:00
Alex Dadgar 327b551b39 Drivers 2018-12-18 15:50:11 -08:00
Alex Dadgar b653ae2af7 utilities 2018-12-18 15:48:52 -08:00
Danielle Tomlinson 95a0c4fb29 taskrunner: Use a random suffix for Task Config
The RestartCount is not really suitable for use as a source of
uniqueness within task invocations as it is not monotonic, and interacts
with the restart stanza in a users config, so conflates restarts due to
task failures, with restarts due to enviromental changes, such as consul
template or vault secrets changing.

Here we instead use a substring from a uuid, which is more random than
we strictly need, but is nicer than rolling our own random string
generator here.
2018-12-19 00:38:54 +01:00
Danielle Tomlinson d6eb084d8a allocrunner: Drop and log updates after closing waitCh 2018-12-18 23:38:34 +01:00
Danielle Tomlinson 0d91285cd6 allocrunner: Documentation for ShutdownCh/DestroyCh 2018-12-18 23:38:34 +01:00
Danielle Tomlinson f2bb13818e fixup: Log when we detect out of order updates 2018-12-18 23:38:33 +01:00
Danielle Tomlinson 986fde0f5a allocrunner: Handle updates asynchronously
This creates a new buffered channel and goroutine on the allocrunner for
serializing updates to allocations. This allows us to take updates off
the routine that is used from processing updates from the server,
without having complicated machinery for tracking update lifetimes, or
other external synchronization.

This results in a nice performance improvement and signficantly better
throughput on batch changes such as preempting a large number of jobs
for a larger placement.
2018-12-18 23:38:33 +01:00
Danielle Tomlinson d1fbac1aad allocrunner: Async shutdown and destroy
This commit reduces the locking required to shutdown or destroy
allocrunners, and allows parallel shutdown and destroy of allocrunners during
shutdown.
2018-12-18 23:38:33 +01:00
Danielle Tomlinson a50ea29da4 taskrunner: Use hook errors for artifacts 2018-12-17 10:39:38 +01:00
Danielle Tomlinson 3647b701a6 taskrunner: Emit task events when a hook fails 2018-12-13 18:20:18 +01:00
Alex Dadgar 20c59df8b9
Merge pull request #4969 from hashicorp/f-alloc-hooks
Make alloc health watcher a postrun hook rather than shutdown hook
2018-12-12 14:34:36 -08:00
Danielle Tomlinson 6fb5ca6ad5 allocrunner: Test alloc runners should include a noop migrator 2018-12-11 13:12:35 +01:00
Danielle Tomlinson 83720575de client: Unify handling of previous and preempted allocs 2018-12-11 13:12:35 +01:00
Danielle Tomlinson dff7093243 client: Wait for preempted allocs to terminate
When starting an allocation that is preempting other allocs, we create a
new group allocation watcher, and then wait for the allocations to
terminate in the allocation PreRun hooks.

If there's no preempted allocations, then we simply provide a
NoopAllocWatcher.
2018-12-11 00:59:18 +01:00
Alex Dadgar c4b5f80918 Make alloc health watcher a postrun hook rather than shutdown hook 2018-12-06 12:30:31 -08:00
Danielle Tomlinson d043532cb0 allocrunner: Basic test alloc runner 2018-12-06 12:28:23 +01:00
Alex Dadgar b39c21d49c Fix various bugs with task events
Fixes the following:
* Emitting events when the task fails to start
* Don't double emit events on task shutdown (nomad stop)
* Don't emit a OOM kill metric unless actually OOM'd
2018-12-05 14:27:07 -08:00
Danielle Tomlinson 2db5ae38d8 client: Rename drivers/shared/env => client/taskenv 2018-11-30 12:18:39 +01:00
Danielle Tomlinson f3a77b8084 client: Merge driver/shared/structs and client/structs 2018-11-30 10:56:45 +01:00
Danielle Tomlinson d259c36844 driver: Flatten SetEnvvars into taskdirhook 2018-11-30 10:46:13 +01:00
Danielle Tomlinson 04c8851b4c client: Migrate DriverStats optout to drivers/shared/structs 2018-11-30 10:46:13 +01:00
Danielle Tomlinson 1a29811169 drivers: Move client/drivers/env to drivers/shared/env
As part of deprecating legacy drivers, we're moving the env package to a
new drivers/shared tree, as it is used by the modern docker and rkt
driver packages, and is useful for 3rd party plugins.
2018-11-30 10:46:13 +01:00
Michael Schurter 3e56ee005a add nil check around task resources in device hook
Looking at NewTaskRunner I'm unsure whether TaskRunner.TaskResources
(from which req.TaskResources is set) is intended to be nil at times or
if the TODO in NewTaskRunner is intended to ensure it is always non-nil.
2018-11-27 17:25:33 -08:00