Commit graph

3528 commits

Author SHA1 Message Date
Michael Schurter a20ac7c1de client: restore Terminated event on every exit
v0.9.0-dev started emitting a Terminated event every time a task process
exited. While this wasn't true in previous versions, it's a useful task
event because it's the only place for job operators to view the task's
exit code.

This behavior is asserted in the e2e/taskevents tests.
2019-01-17 10:02:25 -08:00
Danielle Tomlinson a695b3562c
Merge pull request #5193 from hashicorp/dani/logmon-reattach
logmon: Reattach to existing loggers
2019-01-16 17:34:13 +01:00
Danielle Tomlinson 99da4c780d logmon: Reattach to existing loggers
This commit prevents us from creating duplicate logmon hooks when
restoring allocations by persisting the logmon reattach config using
HookData.
2019-01-16 14:56:10 +01:00
Michael Schurter daa7d029a1 test: porting TestTaskRunner_SimpleRun_Dispatch
Porting test from 0.8 to 0.9.
2019-01-15 15:22:13 -08:00
Michael Schurter 48afda786b
Merge pull request #5187 from hashicorp/test-consul
Port a bunch of pre-0.9 Consul tests to 0.9
2019-01-15 07:41:50 -08:00
Alex Dadgar 471fdb3ccf
Merge pull request #5173 from hashicorp/b-log-levels
Plugins use parent loggers
2019-01-14 16:14:30 -08:00
Nick Ethier c619e70d39
Merge pull request #5018 from hashicorp/f-executor-stats
executor: streaming stats api
2019-01-14 15:02:35 -05:00
Michael Schurter 4e7ea460e8 test: port some pre-0.9 DeploymentHealth tests
Skipping a failing one as I need to move to some other work and don't
want to leave this work orphaned on my machine.
2019-01-14 09:56:53 -08:00
Michael Schurter ff2f23f5f9 test: assert service interpolation behavior
Ported from pre-0.9 tests.
2019-01-14 09:56:53 -08:00
Michael Schurter 5746be5844 test: add some extra logging 2019-01-14 09:56:53 -08:00
Michael Schurter e877bb6370 test: assert shutdown delay deregs first
Restore a pre-0.9 test that asserts Consul services are deregistered
before a task's shutdown delay.
2019-01-14 09:56:53 -08:00
Michael Schurter 1ca858fa92
Update client/allocrunner/taskrunner/stats_hook.go
Co-Authored-By: nickethier <ncethier@gmail.com>
2019-01-14 12:31:27 -05:00
Nick Ethier fbd403df96
tr: stop stats collection on Exited hook 2019-01-14 12:30:14 -05:00
Nick Ethier 597b7b751d
tr: add retry /w backoff to stats_hook failure 2019-01-12 12:18:24 -05:00
Nick Ethier 7e306afde3
executor: fix failing stats related test 2019-01-12 12:18:23 -05:00
Nick Ethier 9fea54e0dc
executor: implement streaming stats API
plugins/driver: update driver interface to support streaming stats

client/tr: use streaming stats api

TODO:
 * how to handle errors and closed channel during stats streaming
 * prevent tight loop if Stats(ctx) returns an error

drivers: update drivers TaskStats RPC to handle streaming results

executor: better error handling in stats rpc

docker: better control and error handling of stats rpc

driver: allow stats to return a recoverable error
2019-01-12 12:18:22 -05:00
Preetha Appan 9e8dbf6a4b
linting fixes 2019-01-12 10:38:20 -06:00
Preetha Appan c94179578d
Make unit test for allocrunner failure much nicer 2019-01-12 10:38:20 -06:00
Preetha Appan da0d083b03
Add unit test to simulate alloc runner creation failure 2019-01-12 10:38:20 -06:00
Preetha Appan e7b59ac08c
Only set deployment health if not already set 2019-01-12 10:38:20 -06:00
Michael Schurter dbf4c3a3c8
Apply suggestions from code review
Co-Authored-By: preetapan <preetha@hashicorp.com>
2019-01-12 10:38:20 -06:00
Preetha Appan 7bd1440710
REfactor statedb factory config to set it directly in client config 2019-01-12 10:38:20 -06:00
Preetha Appan e237f19b38
Remove invalid allocs 2019-01-12 10:38:20 -06:00
Preetha Appan f059ef8a47
Modified destroy failure handling to rely on allocrunner's destroy method
Added a unit test with custom statedb implementation that errors, to
use to verify destroy errors
2019-01-12 10:37:12 -06:00
Preetha Appan 6c95da8f67
Add back code to mark alloc as failed when restore fails
Also modify restore such that any handled errors don't propagate
back to the client
2019-01-12 10:37:12 -06:00
Preetha Appan 5fde0b0f5c
Revert code that made an alloc update when restore fails
Restore currently shuts down the client so the alloc update cant
always make it to the server
2019-01-12 10:37:12 -06:00
Preetha Appan 41bfdd764b
Handle client initialization errors when adding allocs or restoring allocs
We mark the alloc as failed and track failed allocs so that we don't send
updates after the first time
2019-01-12 10:37:12 -06:00
Alex Dadgar 14ed757a56 Plugins use parent loggers
This PR fixes various instances of plugins being launched without using
the parent loggers. This meant that logs would not all go to the same
output, break formatting etc.
2019-01-11 11:36:37 -08:00
Danielle Tomlinson 3e586e93da client: Cleanup allocrunner access 2019-01-11 18:39:18 +01:00
Mahmood Ali c3eaa0f4c8 tests: enable and fix tests requiring mock driver 2019-01-10 10:10:11 -05:00
Alex Dadgar bd12e0b1f7
Merge pull request #5168 from hashicorp/b-kill-race
Improve Kill handling on task runner
2019-01-09 12:05:10 -08:00
Alex Dadgar 069e181e8f add more comments 2019-01-09 12:04:22 -08:00
Michael Schurter e5ddff861c
Spelling fix
Co-Authored-By: dadgar <alex@hashicorp.com>
2019-01-09 11:42:40 -08:00
Mahmood Ali 90f3cea187
Merge pull request #5157 from hashicorp/r-drivers-no-cstructs
drivers: avoid referencing client/structs package
2019-01-09 13:06:46 -05:00
Mahmood Ali ff48dbb8a9
Merge pull request #5163 from hashicorp/r-minor-changes-20180108
Fix a panic on node.Deregister fail
2019-01-09 09:56:00 -05:00
Mahmood Ali 1f2473263e fix more cases of logging arity errors 2019-01-09 09:22:47 -05:00
Mahmood Ali 4952f2a182
Merge pull request #5159 from hashicorp/r-macos-tests
Fix Travis MacOS job
2019-01-09 08:22:30 -05:00
Alex Dadgar 149dec2169 Improve Kill handling on task runner
This PR improves how killing a task is handled. Before the kill function
directly orchestrated the killing and was only valid while the task was
running. The new behavior is to mark the desired state and wait for the
task runner to converge to that state.
2019-01-08 16:42:26 -08:00
Mahmood Ali 9f7eb1bdfa tests: fix a test job constaints failing in macOS
Allow scheduling mock job when running on MacOS (or Windows) hosts.
2019-01-08 12:37:42 -05:00
Michael Schurter c24f4f94c1
Merge pull request #5151 from hashicorp/b-task-events
Emit Killing task events and add e2e tests
2019-01-08 09:33:04 -08:00
Mahmood Ali 6d36b52412 run gofmt 2019-01-08 11:15:38 -05:00
Michael Schurter 92f9cda5f4
Merge pull request #5035 from hashicorp/test-client
test: re-eanble periodic fingerprint test
2019-01-08 07:37:39 -08:00
Michael Schurter 5925424c7c client: emit Killing/Killed task events
We were just emitting Killed/Terminated events before. In v0.8 we
emitted Killing/Killed, but lacked Terminated when explicitly stopping
a task. This change makes it so Terminated is always included, whether
explicitly stopping a task or it exiting on its own.

New output:

2019-01-04T14:58:51-08:00  Killed            Task successfully killed
2019-01-04T14:58:51-08:00  Terminated        Exit Code: 130, Signal: 2
2019-01-04T14:58:51-08:00  Killing           Sent interrupt
2019-01-04T14:58:51-08:00  Leader Task Dead  Leader Task in Group dead
2019-01-04T14:58:49-08:00  Started           Task started by client
2019-01-04T14:58:49-08:00  Task Setup        Building Task Directory
2019-01-04T14:58:49-08:00  Received          Task received by client

Old (v0.8.6) output:

2019-01-04T22:14:54Z  Killed            Task successfully killed
2019-01-04T22:14:54Z  Killing           Sent interrupt. Waiting 5s before force killing
2019-01-04T22:14:54Z  Leader Task Dead  Leader Task in Group dead
2019-01-04T22:14:53Z  Started           Task started by client
2019-01-04T22:14:53Z  Task Setup        Building Task Directory
2019-01-04T22:14:53Z  Received          Task received by client
2019-01-08 07:20:54 -08:00
Michael Schurter 324e989327
Merge pull request #5034 from hashicorp/test-fix-races
Test fix races
2019-01-08 07:04:09 -08:00
Mahmood Ali 916a40bb9e move cstructs.DeviceNetwork to drivers pkg 2019-01-08 09:11:47 -05:00
Mahmood Ali 9369b123de use drivers.FSIsolation 2019-01-08 09:11:47 -05:00
Mahmood Ali c10a8fd7fe remove deprecated allocrunner 2019-01-08 09:11:47 -05:00
Mahmood Ali f475a56087 remove always false parameter
Simplify allocDir.Build() function to avoid depending on client/structs,
and remove a parameter that's always set to `false`.

The motivation here is to avoid a dependency cycle between
drivers/cstructs and alloc_dir.
2019-01-08 09:11:47 -05:00
Danielle Tomlinson 8df20f49f7 drivers: Add internal interface for Shutdown
This allows us to correctly terminate internal state during runs of the
nomad test suite, e.g closing eventer contexts correctly.
2019-01-08 13:48:49 +01:00
Alex Dadgar edf132758d
Merge pull request #5152 from hashicorp/f-recover
Task runner recovers from external plugin exiting
2019-01-07 15:27:33 -08:00