Commit Graph

12815 Commits

Author SHA1 Message Date
Nick Ethier 5742a6b932 plugin/drivers: rework eventer and change naming stream -> consumer 2018-10-16 16:56:55 -07:00
Michael Schurter 5f696608a6 tests: fix missing logger caused by bad merge 2018-10-16 16:56:55 -07:00
Michael Schurter 048510b13e tr: properly comment handle fields 2018-10-16 16:56:55 -07:00
Michael Schurter 9e49ed3464 ar: AllocState should not mutate ar.state
If ar.state.TaskStates has not been set, set it on the copy of ar.state.
That keeps ar.state manipulations in one location and allows AllocState
to only acquire read-locks.
2018-10-16 16:56:55 -07:00
Michael Schurter f279b1d1b1 tests: test logs endpoint against pending task
Although the really exciting change is making WaitForRunning return the
allocations that it started. This should cut down test boilerplate
significantly.
2018-10-16 16:56:55 -07:00
Michael Schurter dd4227f84a tests: make a test client/config easier to generate
Sadly can't move the fingerprint timeout tweak into the helper due to
circular imports.
2018-10-16 16:56:55 -07:00
Michael Schurter 1d747048ea tests: ensure task state is initialized in NewAR
Also expose NoopDB for use in tests.
2018-10-16 16:56:55 -07:00
Michael Schurter 6bcf772f3c tests: test via ServeMux so http codes are set 2018-10-16 16:56:55 -07:00
Michael Schurter 960f3be76c client: expose task state to client
The interesting decision in this commit was to expose AR's state and not
a fully materialized Allocation struct. AR.clientAlloc builds an Alloc
that contains the task state, so I considered simply memoizing and
exposing that method.

However, that would lead to AR having two awkwardly similar methods:
 - Alloc() - which returns the server-sent alloc
 - ClientAlloc() - which returns the fully materialized client alloc

Since ClientAlloc() could be memoized it would be just as cheap to call
as Alloc(), so why not replace Alloc() entirely?

Replacing Alloc() entirely would require Update() to immediately
materialize the task states on server-sent Allocs as there may have been
local task state changes since the server received an Alloc update.

This quickly becomes difficult to reason about: should Update hooks use
the TaskStates? Are state changes caused by TR Update hooks immediately
reflected in the Alloc? Should AR persist its copy of the Alloc? If so,
are its TaskStates canonical or the TaskStates on TR?

So! Forget that. Let's separate the static Allocation from the dynamic
AR & TR state!

 - AR.Alloc() is for static Allocation access (often for the Job)
 - AR.AllocState() is for the dynamic AR & TR runtime state (deployment
   status, task states, etc).

If code needs to know the status of a task: AllocState()
If code needs to know the names of tasks: Alloc()

It should be very easy for a developer to reason about which method they
should call and what they can do with the return values.
2018-10-16 16:56:55 -07:00
Michael Schurter fb4aa74153 client: add comment 2018-10-16 16:56:55 -07:00
Michael Schurter 9a7e6be2b6 client: fix potentially dropped streaming errors 2018-10-16 16:56:55 -07:00
Michael Schurter 4b44b9039b tr: remove unneeded lock; chan synchronizes access 2018-10-16 16:56:55 -07:00
Michael Schurter 1c9ccdeab5 tests: fix races caused by sharing a buffer
httptest.ResponseRecorder exposes a bytes.Buffer which we were reading
and writing concurrently to test streaming log APIs. This is a race, so
I wrapped the struct in a lock with some helpers.
2018-10-16 16:56:55 -07:00
Michael Schurter 211b96bb5c tr: fix shutdown/destroy/WaitResult handling
Multiple receivers raced for the WaitResult when killing tasks which
could lead to a deadlock if the "wrong" receiver won.

Wrap handlers in an ugly little proxy to avoid this. At first I wanted
to push this into drivers, but the result is tied to the TR's handle
lifecycle -- not the lifecycle of an alloc or task.
2018-10-16 16:56:55 -07:00
Michael Schurter 951ed17436 client: do not inspect task state to follow logs
"Ask forgiveness, not permission."

Instead of peaking at TaskStates (which are no longer updated on the
AR.Alloc() view of the world) to only read logs for running tasks, just
try to read the logs and improve the error handling if they don't exist.

This should make log streaming less dependent on AR/TR behavior.

Also fixed a race where the log streamer could exit before reading an
error. This caused no logs or errors to be displayed sometimes when an
error occurred.
2018-10-16 16:56:55 -07:00
Michael Schurter 2325348053 mock_driver: close waitCh after exiting
mock_driver wasn't behaving like other driver handles.
2018-10-16 16:56:55 -07:00
Michael Schurter 8d1419c62b client: fix accessing alloc runners
* GetClientAlloc() gains nothing from using allAllocs()
* getAllocatedResources was calling getAllocRunners() twice
2018-10-16 16:56:55 -07:00
Michael Schurter 55ab491801 tr: remove wip comments 2018-10-16 16:56:55 -07:00
Michael Schurter 3ccc091a72 ar: lock around accessing tasks
Specify that Alloc() does not return updated task states.
2018-10-16 16:56:55 -07:00
Alex Dadgar 84ce8c3487 extra logging 2018-10-16 16:56:55 -07:00
Alex Dadgar 6f0ed6184b Fix client reloading and pass the plugin loaders to server and client 2018-10-16 16:56:55 -07:00
Alex Dadgar 183561cf82 Plugin loader initialization 2018-10-16 16:54:12 -07:00
Alex Dadgar cc76555814 Internal plugin catalog 2018-10-16 16:53:31 -07:00
Nick Ethier a95cbc38ba drivers/raw_exec: sync access to task state 2018-10-16 16:53:31 -07:00
Nick Ethier 28e8e5852c drivers/raw_exec: added unix specific tests 2018-10-16 16:53:31 -07:00
Nick Ethier 352c05cdf4 plugin/drivers: plumb in stdout/stderr paths 2018-10-16 16:53:31 -07:00
Nick Ethier 1f6873806e raw_exec: move package outside of plugins dir 2018-10-16 16:53:31 -07:00
Nick Ethier 8b876e1cce fix package references after drivers/base subpackage removed 2018-10-16 16:53:31 -07:00
Nick Ethier 0e3f85222a driver/raw_exec: port existing raw_exec tests and add some testing utilities 2018-10-16 16:53:31 -07:00
Nick Ethier 8644e8508c driver/raw_exec: export driver config fields so they are encoded 2018-10-16 16:53:31 -07:00
Nick Ethier 3c17f50b29 lint: remove unused code and fix spelling 2018-10-16 16:53:31 -07:00
Nick Ethier d9628ff394 driver/raw_exec: more tests and bug fixes
added wrapper struct for plugin.ReattachConfig to better handle serialization
2018-10-16 16:53:31 -07:00
Nick Ethier 5617f3615b driver/raw_exec: initial raw_exec implementation 2018-10-16 16:53:31 -07:00
Nick Ethier bcc5c4a8bd clientv2: base driver plugin (#4671)
Driver plugin framework to facilitate development of driver plugins.

Implementing plugins only need to implement the DriverPlugin interface.
The framework proxies this interface to the go-plugin GRPC interface generated
from the driver.proto spec.

A testing harness is provided to allow implementing drivers to test the full
lifecycle of the driver plugin. An example use:

func TestMyDriver(t *testing.T) {
    harness := NewDriverHarness(t, &MyDiverPlugin{})
    // The harness implements the DriverPlugin interface and can be used as such
    taskHandle, err := harness.StartTask(...)
}
2018-10-16 16:53:31 -07:00
Michael Schurter 62c1285afc tr: add comments and cleanup call signature
From review comments on #4649 left post-merge.
2018-10-16 16:53:31 -07:00
Nick Ethier 5dee1141d1 executor v2 (#4656)
* client/executor: refactor client to remove interpolation

* executor: POC libcontainer based executor

* vendor: use hashicorp libcontainer fork

* vendor: add libcontainer/nsenter dep

* executor: updated executor interface to simplify operations

* executor: implement logging pipe

* logmon: new logmon plugin to manage task logs

* driver/executor: use logmon for log management

* executor: fix tests and windows build

* executor: fix logging key names

* executor: fix test failures

* executor: add config field to toggle between using libcontainer and standard executors

* logmon: use discover utility to discover nomad executable

* executor: only call libcontainer-shim on main in linux

* logmon: use seperate path configs for stdout/stderr fifos

* executor: windows fixes

* executor: created reusable pid stats collection utility that can be used in an executor

* executor: update fifo.Open calls

* executor: fix build

* remove executor from docker driver

* executor: Shutdown func to kill and cleanup executor and its children

* executor: move linux specific universal executor funcs to seperate file

* move logmon initialization to a task runner hook

* client: doc fixes and renaming from code review


* taskrunner: use shared config struct for logmon fifo fields

* taskrunner: logmon only needs to be started once per task
2018-10-16 16:53:31 -07:00
Michael Schurter e6e2930a00 tr: implement stats collection hook
Tested except for the net/rpc specific error case which may need
changing in the gRPC world.
2018-10-16 16:53:31 -07:00
Michael Schurter 86bd329539 fix build errors post merges 2018-10-16 16:53:31 -07:00
Michael Schurter a977e22028 test: cleanup mock consul service client
Updated to hclog.

It exposed fields that required an unexported lock to access. Created a
getter methodn instead. Only old allocrunner currently used this
feature.
2018-10-16 16:53:31 -07:00
Michael Schurter 6f92b04226 health_hook: simplify locking; test thoroughly
Use doneCh like @dadgar suggested in the original PR.

Thoroughly test hook as concurrent Update calls make for a tricky
concurrency problem.
2018-10-16 16:53:30 -07:00
Alex Dadgar cebfead6bc add logger back 2018-10-16 16:53:30 -07:00
Nick Ethier 03422aa529 fifo: add new fifo package for named pipes (#4665)
* fifo: add new fifo package for named pipes
2018-10-16 16:53:30 -07:00
Alex Dadgar 8504505c0d client uses passed logger and fix fingerprinters 2018-10-16 16:53:30 -07:00
Nick Ethier 66ff12e5f7 Update runc/libcontainer and friends (#4655)
* vendor: bump libcontainer and docker to remove Sirupsen imports

* vendor: fix bad vendoring of archive package

* vendor: fix api changes to cgroups in executor

* vendor: fix docker api changes

* vendor: update github.com/Azure/go-ansiterm to use non capitalized logrus import
2018-10-16 16:53:30 -07:00
Michael Schurter 195b8127fb health_hook: fix panic and add tests
Still more testing to do, but I want to get this panic fixed ASAP.

All new tests pass with -race
2018-10-16 16:53:30 -07:00
Michael Schurter 64efc3d301 Emit events before long operations
Append when there's nothing blocking between appending and sending an
update to the server.
2018-10-16 16:53:30 -07:00
Michael Schurter a2b696c4cf Use a semaphore to block until watcher exits 2018-10-16 16:53:30 -07:00
Michael Schurter a73162c977 ar: use multierror in update hook loop
Make it match TaskRunner update hook behavior
2018-10-16 16:53:30 -07:00
Michael Schurter a7b427718c tr: refactor EmitEvents into Emit+Append
* UpdateState: set state, append event, persist, update servers
* EmitEvent: append event, persist, update servers
* AppendEvent: append event, persist

AppendEvent may not even have to persist, but for the sake of
correctness I'm going with that for now.
2018-10-16 16:53:30 -07:00
Michael Schurter 93f3ac9ed6 ar: create health setting shim for health watcher 2018-10-16 16:53:30 -07:00