This PR switches to using plain fifo files instead of golang structs
managed by containerd/fifo library.
The library main benefit is management of opening fifo files. In Linux,
a reader `open()` request would block until a writer opens the file (and
vice-versa). The library uses goroutines so that it's the first IO
operation that blocks.
This benefit isn't really useful for us: Given that logmon simply
streams output in a separate process, blocking of opening or first read
is effectively the same.
The library additionally makes further complications for managing state
and tracking read/write permission that seems overhead for our use,
compared to using a file directly.
Looking here, I made the following incidental changes:
* document that we do handle if fifo files are already created, as we
rely on that behavior for logmon restarts
* use type system to lock read vs write: currently, fifo library returns
`io.ReadWriteCloser` even if fifo is opened for writing only!
I chose to make them more of integration tests since there's a lot more
plumbing involved. The internal implementation details of how we craft
task envs can now change and these tests will still properly assert the
task runtime environment is setup properly.
Noticed that the protobuf files are out of sync with ones generated by 1.2.0 protoc go plugin.
The cause for these files seem to be related to release processes, e.g. [0.9.0-beta1 preperation](ecec3d38de (diff-da4da188ee496377d456025c2eab4e87)), and [0.9.0-beta3 preperation](b849d84f2f).
This restores the changes to that of the pinned protoc version and fails build if protobuf files are out of sync. Sample failing Travis job is that of the first commit change: https://travis-ci.org/hashicorp/nomad/jobs/506285085
Some of the context uses in TR hooks are useless (Killed during Stop
never seems meaningful).
None of the hooks are interruptable for graceful shutdown which is
unfortunate and probably needs fixing.
Builds upon earlier commit that cleans up restored handles of terminal
allocs by also emitting terminated events and calling exited hooks when
appropriate.
The test is sadly quite complicated and peeks into things (logmon's
reattach config) AR doesn't normally have access to.
However, I couldn't find another way of asserting logmon got cleaned up
without resorting to smaller unit tests. Smaller unit tests risk
re-implementing dependencies in an unrealistic way, so I opted for an
ugly integration test.
This commit is a significant change. TR.Run is now always executed, even
for terminal allocations. This was changed to allow TR.Run to cleanup
(run stop hooks) if a handle was recovered.
This is intended to handle the case of Nomad receiving a
DesiredStatus=Stop allocation update, persisting it, but crashing before
stopping AR/TR.
The commit also renames task runner hook data as it was very easy to
accidently set state on Requests instead of Responses using the old
field names.
Port some integration tests of driver fingerprinting.
Some tests (e.g. `TestFingerprintManager_Run_DriversInBlacklist`) have
been subsituted by more isolated tests in
`client/pluginmanager/drivermanager/manager_test.go`
This code chooses to be conservative as opposed to optimal: when failing
to reattach to logmon simply return a recoverable error instead of
immediately trying to restart logmon.
The recoverable error will cause the task's restart policy to be
applied and a new logmon will be launched upon restart.
Trying to do the optimal approach of simply starting a new logmon
requires error string comparison and should be tested against a task
actively logging to assert the behavior (are writes blocked? dropped?).