The interesting decision in this commit was to expose AR's state and not
a fully materialized Allocation struct. AR.clientAlloc builds an Alloc
that contains the task state, so I considered simply memoizing and
exposing that method.
However, that would lead to AR having two awkwardly similar methods:
- Alloc() - which returns the server-sent alloc
- ClientAlloc() - which returns the fully materialized client alloc
Since ClientAlloc() could be memoized it would be just as cheap to call
as Alloc(), so why not replace Alloc() entirely?
Replacing Alloc() entirely would require Update() to immediately
materialize the task states on server-sent Allocs as there may have been
local task state changes since the server received an Alloc update.
This quickly becomes difficult to reason about: should Update hooks use
the TaskStates? Are state changes caused by TR Update hooks immediately
reflected in the Alloc? Should AR persist its copy of the Alloc? If so,
are its TaskStates canonical or the TaskStates on TR?
So! Forget that. Let's separate the static Allocation from the dynamic
AR & TR state!
- AR.Alloc() is for static Allocation access (often for the Job)
- AR.AllocState() is for the dynamic AR & TR runtime state (deployment
status, task states, etc).
If code needs to know the status of a task: AllocState()
If code needs to know the names of tasks: Alloc()
It should be very easy for a developer to reason about which method they
should call and what they can do with the return values.
httptest.ResponseRecorder exposes a bytes.Buffer which we were reading
and writing concurrently to test streaming log APIs. This is a race, so
I wrapped the struct in a lock with some helpers.
Multiple receivers raced for the WaitResult when killing tasks which
could lead to a deadlock if the "wrong" receiver won.
Wrap handlers in an ugly little proxy to avoid this. At first I wanted
to push this into drivers, but the result is tied to the TR's handle
lifecycle -- not the lifecycle of an alloc or task.
"Ask forgiveness, not permission."
Instead of peaking at TaskStates (which are no longer updated on the
AR.Alloc() view of the world) to only read logs for running tasks, just
try to read the logs and improve the error handling if they don't exist.
This should make log streaming less dependent on AR/TR behavior.
Also fixed a race where the log streamer could exit before reading an
error. This caused no logs or errors to be displayed sometimes when an
error occurred.
Driver plugin framework to facilitate development of driver plugins.
Implementing plugins only need to implement the DriverPlugin interface.
The framework proxies this interface to the go-plugin GRPC interface generated
from the driver.proto spec.
A testing harness is provided to allow implementing drivers to test the full
lifecycle of the driver plugin. An example use:
func TestMyDriver(t *testing.T) {
harness := NewDriverHarness(t, &MyDiverPlugin{})
// The harness implements the DriverPlugin interface and can be used as such
taskHandle, err := harness.StartTask(...)
}
* client/executor: refactor client to remove interpolation
* executor: POC libcontainer based executor
* vendor: use hashicorp libcontainer fork
* vendor: add libcontainer/nsenter dep
* executor: updated executor interface to simplify operations
* executor: implement logging pipe
* logmon: new logmon plugin to manage task logs
* driver/executor: use logmon for log management
* executor: fix tests and windows build
* executor: fix logging key names
* executor: fix test failures
* executor: add config field to toggle between using libcontainer and standard executors
* logmon: use discover utility to discover nomad executable
* executor: only call libcontainer-shim on main in linux
* logmon: use seperate path configs for stdout/stderr fifos
* executor: windows fixes
* executor: created reusable pid stats collection utility that can be used in an executor
* executor: update fifo.Open calls
* executor: fix build
* remove executor from docker driver
* executor: Shutdown func to kill and cleanup executor and its children
* executor: move linux specific universal executor funcs to seperate file
* move logmon initialization to a task runner hook
* client: doc fixes and renaming from code review
* taskrunner: use shared config struct for logmon fifo fields
* taskrunner: logmon only needs to be started once per task
Updated to hclog.
It exposed fields that required an unexported lock to access. Created a
getter methodn instead. Only old allocrunner currently used this
feature.
* vendor: bump libcontainer and docker to remove Sirupsen imports
* vendor: fix bad vendoring of archive package
* vendor: fix api changes to cgroups in executor
* vendor: fix docker api changes
* vendor: update github.com/Azure/go-ansiterm to use non capitalized logrus import
* UpdateState: set state, append event, persist, update servers
* EmitEvent: append event, persist, update servers
* AppendEvent: append event, persist
AppendEvent may not even have to persist, but for the sake of
correctness I'm going with that for now.
* Stopping an alloc is implemented via Updates but update hooks are
*not* run.
* Destroying an alloc is a best effort cleanup.
* AllocRunner destroy hooks implemented.
* Disk migration and blocking on a previous allocation exiting moved to
its own package to avoid cycles. Now only depends on alloc broadcaster
instead of also using a waitch.
* AllocBroadcaster now only drops stale allocations and always keeps the
latest version.
* Made AllocDir safe for concurrent use
Lots of internal contexts that are currently unused. Unsure if they
should be used or removed.