* navigation and initial steps of guide
* generate certs with appropriate token
* configure Nomad to use TLS
* add cli keys and certs
* add server gossip encryption section
* fix mislabeled steps
* vault paths formatting
* remove bit about cert revocation
* add clarification in challenge that we will be securing an existing Nomad cluster
* add some comments to consul-template.hcl to help user walk through it
* clarifying comments for CLI certs templates
* reorganize steps, change permissions on certs, and sub pkill command with systemctl reload nomad
* correct step reference
* add rpc upgrade mode instructions
* correct typo
This is a follow up work to https://github.com/hashicorp/nomad/pull/4813 to fix#4809 and fix a regression introduced in 0.9 in marking files in libcontainer executable.
#4809 bug is that `lookupBin` uses `exec.LookPath` when not inspecting task dir files. `exec.LookPath` only returns a file if it's already marked as an executable path in https://github.com/golang/go/blob/go1.12.1/src/os/exec/lp_unix.go#L24-L27 . This affects raw exec as if passed an absolute path to file, `lookupBin` returns an error if file isn't already an executable. This explains why the error manifests when an absolute interpolated path is used (e.g. `${NOMAD_TASK_DIR}/hellov1`) but not when using a task rel dir (e.g. `local/hellov1`) in the above examples used in ticket.
PR #4813 remedied this problem for raw exec but inadvertably broke libcontainer executor, as it made `lookupBin` returns the paths to host files rather than ones found inside chroot.
This PR reorders the evaluation, so we go back to 0.8 behavior of looking up task directories first, but then check for host paths before using `exec.LookPath`.
This PR is broken into three commits to illustrate evolution and confirming hypothesis:
1. 9adab75ac84b5c2a7b84702fae02c2484abb50ad : Adding a test illustrating how libcontainer executor fails at marking processes as executable in https://travis-ci.org/hashicorp/nomad/jobs/514942694 - note that the test doesn't depend on artifacts or interpolated paths
2. d441cdd52f68912e96dc7ee5baf2dcddb0ac8caa: reverting PR #4809 and showing the test fail now with raw_exec case (as expected) in https://travis-ci.org/hashicorp/nomad/jobs/514944065
2. 244544b735a8dbac0e14e7ee85e12a39780cbbb9: in where we add the check in appropriate place next to `exec.LookPath(...)` for absolute paths and have a green job in https://travis-ci.org/hashicorp/nomad/jobs/514945024
## Future work
Inspecting `lookupBin` in 0.8 and 0.9 case, we have a bug in using `exec.LookPath` for the libcontainer executor case. We should be looking up paths based on the container chroot and container PATH rather than the host's. However, this is not a 0.9.0 regression and was present in 0.8; so punting to fix it post 0.9.
destCh was being written to by one goroutine and closed by another
goroutine. This panic occurred in Travis:
```
=== FAIL: drivers/docker TestDockerCoordinator_ConcurrentPulls (117.66s)
=== PAUSE TestDockerCoordinator_ConcurrentPulls
=== CONT TestDockerCoordinator_ConcurrentPulls
panic: send on closed channel
goroutine 5358 [running]:
github.com/hashicorp/nomad/drivers/docker.dockerStatsCollector(0xc0003a4a20, 0xc0003a49c0, 0x3b9aca00)
/home/travis/gopath/src/github.com/hashicorp/nomad/drivers/docker/stats.go:108 +0x167
created by
github.com/hashicorp/nomad/drivers/docker.TestDriver_DockerStatsCollector
/home/travis/gopath/src/github.com/hashicorp/nomad/drivers/docker/stats_test.go:33 +0x1ab
```
The 2 ways to fix this kind of error are to either (1) add extra
coordination around multiple goroutines writing to a chan or (2) make it
so only one goroutines writes to a chan.
I implemented (2) first as it's simpler, but @notnoop pointed out since
the same destCh in reused in the stats loop there's now a double close
panic possible!
So this implements (1) by adding a *usageSender struct for handling
concurrent senders and closing.
This PR switches to using plain fifo files instead of golang structs
managed by containerd/fifo library.
The library main benefit is management of opening fifo files. In Linux,
a reader `open()` request would block until a writer opens the file (and
vice-versa). The library uses goroutines so that it's the first IO
operation that blocks.
This benefit isn't really useful for us: Given that logmon simply
streams output in a separate process, blocking of opening or first read
is effectively the same.
The library additionally makes further complications for managing state
and tracking read/write permission that seems overhead for our use,
compared to using a file directly.
Looking here, I made the following incidental changes:
* document that we do handle if fifo files are already created, as we
rely on that behavior for logmon restarts
* use type system to lock read vs write: currently, fifo library returns
`io.ReadWriteCloser` even if fifo is opened for writing only!