Remove runLaunched tracking as Run is *always* called for killable
TaskRunners. TaskRunners which fail before Run can be called (during
NewTaskRunner or Restore) are not killable as they're never added to the
client's alloc map.
This is a follow up work to https://github.com/hashicorp/nomad/pull/4813 to fix#4809 and fix a regression introduced in 0.9 in marking files in libcontainer executable.
#4809 bug is that `lookupBin` uses `exec.LookPath` when not inspecting task dir files. `exec.LookPath` only returns a file if it's already marked as an executable path in https://github.com/golang/go/blob/go1.12.1/src/os/exec/lp_unix.go#L24-L27 . This affects raw exec as if passed an absolute path to file, `lookupBin` returns an error if file isn't already an executable. This explains why the error manifests when an absolute interpolated path is used (e.g. `${NOMAD_TASK_DIR}/hellov1`) but not when using a task rel dir (e.g. `local/hellov1`) in the above examples used in ticket.
PR #4813 remedied this problem for raw exec but inadvertably broke libcontainer executor, as it made `lookupBin` returns the paths to host files rather than ones found inside chroot.
This PR reorders the evaluation, so we go back to 0.8 behavior of looking up task directories first, but then check for host paths before using `exec.LookPath`.
This PR is broken into three commits to illustrate evolution and confirming hypothesis:
1. 9adab75ac84b5c2a7b84702fae02c2484abb50ad : Adding a test illustrating how libcontainer executor fails at marking processes as executable in https://travis-ci.org/hashicorp/nomad/jobs/514942694 - note that the test doesn't depend on artifacts or interpolated paths
2. d441cdd52f68912e96dc7ee5baf2dcddb0ac8caa: reverting PR #4809 and showing the test fail now with raw_exec case (as expected) in https://travis-ci.org/hashicorp/nomad/jobs/514944065
2. 244544b735a8dbac0e14e7ee85e12a39780cbbb9: in where we add the check in appropriate place next to `exec.LookPath(...)` for absolute paths and have a green job in https://travis-ci.org/hashicorp/nomad/jobs/514945024
## Future work
Inspecting `lookupBin` in 0.8 and 0.9 case, we have a bug in using `exec.LookPath` for the libcontainer executor case. We should be looking up paths based on the container chroot and container PATH rather than the host's. However, this is not a 0.9.0 regression and was present in 0.8; so punting to fix it post 0.9.
destCh was being written to by one goroutine and closed by another
goroutine. This panic occurred in Travis:
```
=== FAIL: drivers/docker TestDockerCoordinator_ConcurrentPulls (117.66s)
=== PAUSE TestDockerCoordinator_ConcurrentPulls
=== CONT TestDockerCoordinator_ConcurrentPulls
panic: send on closed channel
goroutine 5358 [running]:
github.com/hashicorp/nomad/drivers/docker.dockerStatsCollector(0xc0003a4a20, 0xc0003a49c0, 0x3b9aca00)
/home/travis/gopath/src/github.com/hashicorp/nomad/drivers/docker/stats.go:108 +0x167
created by
github.com/hashicorp/nomad/drivers/docker.TestDriver_DockerStatsCollector
/home/travis/gopath/src/github.com/hashicorp/nomad/drivers/docker/stats_test.go:33 +0x1ab
```
The 2 ways to fix this kind of error are to either (1) add extra
coordination around multiple goroutines writing to a chan or (2) make it
so only one goroutines writes to a chan.
I implemented (2) first as it's simpler, but @notnoop pointed out since
the same destCh in reused in the stats loop there's now a double close
panic possible!
So this implements (1) by adding a *usageSender struct for handling
concurrent senders and closing.
This PR switches to using plain fifo files instead of golang structs
managed by containerd/fifo library.
The library main benefit is management of opening fifo files. In Linux,
a reader `open()` request would block until a writer opens the file (and
vice-versa). The library uses goroutines so that it's the first IO
operation that blocks.
This benefit isn't really useful for us: Given that logmon simply
streams output in a separate process, blocking of opening or first read
is effectively the same.
The library additionally makes further complications for managing state
and tracking read/write permission that seems overhead for our use,
compared to using a file directly.
Looking here, I made the following incidental changes:
* document that we do handle if fifo files are already created, as we
rely on that behavior for logmon restarts
* use type system to lock read vs write: currently, fifo library returns
`io.ReadWriteCloser` even if fifo is opened for writing only!
This capability will gate access to features that allow interacting with
a running allocation, for example, signalling, stopping, and rescheduling
specific allocations.