Commit graph

3289 commits

Author SHA1 Message Date
Nick Ethier b0ddc03409
Merge pull request #4765 from jippi/increase-line-scan-limit
fix: increase log rotator line scan limit
2018-10-29 18:46:30 -07:00
Nick Ethier 3fcf8ba7e6
Merge pull request #4795 from hashicorp/f-plugin-config
Pass client configuration to plugins through loader
2018-10-29 18:42:27 -07:00
Nick Ethier bda3b1d3b3
rename NomadConfig to ClientAgentConfig 2018-10-29 21:34:34 -04:00
Michael Schurter 6f2cffb196
Merge pull request #4803 from hashicorp/b-leader-fixes
AR Fixes: task leader handling, restoring, state updating, AR.Destroy deadlocks
2018-10-29 17:38:59 -05:00
Michael Schurter d71a1b4547 tests: more fixes due to api changes 2018-10-29 15:25:22 -07:00
Preetha Appan b85cc38f3d
Stat path to binary to handle raw exec driver interpolated binary path 2018-10-26 17:24:05 -05:00
Preetha Appan 55ac8d3d12
Fix test linting 2018-10-26 10:30:12 -05:00
Michael Schurter b7a9d61a38 ar: initialize allocwatcher on restore
Fixes a panic. Left a comment on how the behavior could be improved, but
this is what releases <0.9.0 did.
2018-10-19 09:45:45 -07:00
Michael Schurter e060174130 ar: fix leader handling, state restoring, and destroying unrun ARs
* Migrated all of the old leader task tests and got them passing
* Refactor and consolidate task killing code in AR to always kill leader
  tasks first
* Fixed lots of issues with state restoring
* Fixed deadlock in AR.Destroy if AR.Run had never been called
* Added a new in memory statedb for testing
2018-10-19 09:45:45 -07:00
Nick Ethier 58b430edae
added driver specific client config struct to plugin configuration 2018-10-18 23:31:01 -04:00
Michael Schurter cefbf00bf0 ar: refactor task killing into 1 method
Update comments and address some PR comments from #4775
2018-10-17 10:06:59 -07:00
Michael Schurter 21d78be961 tests: explicitly cleanup after clients 2018-10-17 10:06:59 -07:00
Michael Schurter 222f6b5741 ar: fix task leader, update, and stop handling 2018-10-17 10:06:59 -07:00
Michael Schurter 1badbb2fc4 tr: cleanup hook logs 2018-10-17 09:42:32 -07:00
Nick Ethier 65adb80ebf
plumb NomadConfig into plugins 2018-10-16 22:47:22 -04:00
Nick Ethier d94b631b6b
drivers/exec: add exec implementation 2018-10-16 22:45:28 -04:00
Michael Schurter 0baaba8b09 templates: fix tests 2018-10-16 16:56:57 -07:00
Michael Schurter 838ddf4d4a fix linter errors 2018-10-16 16:56:57 -07:00
Michael Schurter e27c82ea4d client: remove unused handleproxy 2018-10-16 16:56:56 -07:00
Michael Schurter 4ea5217d72 tr: remove unused DriverHandle interface
was causing typed nil interface panics and served no purpose
2018-10-16 16:56:56 -07:00
Michael Schurter 528c426c53 Port client portion of #4392 to new taskrunner
PR #4392 was merged to master *after* allocrunnerv2 was branched, so the
client-specific portions must be ported from master to arv2.
2018-10-16 16:56:56 -07:00
Michael Schurter f12501d4c3 tr: implement dispatch payload hook
Now passing the TaskDir struct to prestart hooks instead of just the
root task dir itself as dispatch needs local/.
2018-10-16 16:56:56 -07:00
Nick Ethier d9f0cbf4a9 client: log retry during driver fingerprint redispense 2018-10-16 16:56:56 -07:00
Nick Ethier c7ac1186c9 client: add test for driverfailure during fingerprinting 2018-10-16 16:56:56 -07:00
Nick Ethier 8cf669b5aa taskrunner: return error on waitCh 2018-10-16 16:56:56 -07:00
Nick Ethier 047fad2953 client: simplify driver plugin logic from review comments 2018-10-16 16:56:56 -07:00
Nick Ethier 9686e1b258 client: fix broked tests from refactoring 2018-10-16 16:56:56 -07:00
Nick Ethier 3183b33d24 client: review comments and fixup/skip tests 2018-10-16 16:56:56 -07:00
Nick Ethier f192c3752a client: refactor post allocrunnerv2 finalization 2018-10-16 16:56:56 -07:00
Nick Ethier 4a4c7dbbfc client: begin driver plugin integration
client: fingerprint driver plugins
2018-10-16 16:56:56 -07:00
Alex Dadgar 7946a14aa8 Fix lints 2018-10-16 16:56:56 -07:00
Alex Dadgar 89dafaaea9 compile on windows 2018-10-16 16:56:56 -07:00
Alex Dadgar ad4fac526c more test fixes 2018-10-16 16:56:56 -07:00
Alex Dadgar 45e41cca03 allocrunnerv2 -> allocrunner 2018-10-16 16:56:56 -07:00
Alex Dadgar 9baa7402ef fix test compiling 2018-10-16 16:56:55 -07:00
Alex Dadgar 7d9c069f09 skip building deprecated files 2018-10-16 16:56:55 -07:00
Alex Dadgar 6c9d9d5173 move files around 2018-10-16 16:56:55 -07:00
Michael Schurter 5f696608a6 tests: fix missing logger caused by bad merge 2018-10-16 16:56:55 -07:00
Michael Schurter 048510b13e tr: properly comment handle fields 2018-10-16 16:56:55 -07:00
Michael Schurter 9e49ed3464 ar: AllocState should not mutate ar.state
If ar.state.TaskStates has not been set, set it on the copy of ar.state.
That keeps ar.state manipulations in one location and allows AllocState
to only acquire read-locks.
2018-10-16 16:56:55 -07:00
Michael Schurter f279b1d1b1 tests: test logs endpoint against pending task
Although the really exciting change is making WaitForRunning return the
allocations that it started. This should cut down test boilerplate
significantly.
2018-10-16 16:56:55 -07:00
Michael Schurter dd4227f84a tests: make a test client/config easier to generate
Sadly can't move the fingerprint timeout tweak into the helper due to
circular imports.
2018-10-16 16:56:55 -07:00
Michael Schurter 1d747048ea tests: ensure task state is initialized in NewAR
Also expose NoopDB for use in tests.
2018-10-16 16:56:55 -07:00
Michael Schurter 960f3be76c client: expose task state to client
The interesting decision in this commit was to expose AR's state and not
a fully materialized Allocation struct. AR.clientAlloc builds an Alloc
that contains the task state, so I considered simply memoizing and
exposing that method.

However, that would lead to AR having two awkwardly similar methods:
 - Alloc() - which returns the server-sent alloc
 - ClientAlloc() - which returns the fully materialized client alloc

Since ClientAlloc() could be memoized it would be just as cheap to call
as Alloc(), so why not replace Alloc() entirely?

Replacing Alloc() entirely would require Update() to immediately
materialize the task states on server-sent Allocs as there may have been
local task state changes since the server received an Alloc update.

This quickly becomes difficult to reason about: should Update hooks use
the TaskStates? Are state changes caused by TR Update hooks immediately
reflected in the Alloc? Should AR persist its copy of the Alloc? If so,
are its TaskStates canonical or the TaskStates on TR?

So! Forget that. Let's separate the static Allocation from the dynamic
AR & TR state!

 - AR.Alloc() is for static Allocation access (often for the Job)
 - AR.AllocState() is for the dynamic AR & TR runtime state (deployment
   status, task states, etc).

If code needs to know the status of a task: AllocState()
If code needs to know the names of tasks: Alloc()

It should be very easy for a developer to reason about which method they
should call and what they can do with the return values.
2018-10-16 16:56:55 -07:00
Michael Schurter fb4aa74153 client: add comment 2018-10-16 16:56:55 -07:00
Michael Schurter 9a7e6be2b6 client: fix potentially dropped streaming errors 2018-10-16 16:56:55 -07:00
Michael Schurter 4b44b9039b tr: remove unneeded lock; chan synchronizes access 2018-10-16 16:56:55 -07:00
Michael Schurter 211b96bb5c tr: fix shutdown/destroy/WaitResult handling
Multiple receivers raced for the WaitResult when killing tasks which
could lead to a deadlock if the "wrong" receiver won.

Wrap handlers in an ugly little proxy to avoid this. At first I wanted
to push this into drivers, but the result is tied to the TR's handle
lifecycle -- not the lifecycle of an alloc or task.
2018-10-16 16:56:55 -07:00
Michael Schurter 951ed17436 client: do not inspect task state to follow logs
"Ask forgiveness, not permission."

Instead of peaking at TaskStates (which are no longer updated on the
AR.Alloc() view of the world) to only read logs for running tasks, just
try to read the logs and improve the error handling if they don't exist.

This should make log streaming less dependent on AR/TR behavior.

Also fixed a race where the log streamer could exit before reading an
error. This caused no logs or errors to be displayed sometimes when an
error occurred.
2018-10-16 16:56:55 -07:00
Michael Schurter 2325348053 mock_driver: close waitCh after exiting
mock_driver wasn't behaving like other driver handles.
2018-10-16 16:56:55 -07:00