plugins/driver: update driver interface to support streaming stats
client/tr: use streaming stats api
TODO:
* how to handle errors and closed channel during stats streaming
* prevent tight loop if Stats(ctx) returns an error
drivers: update drivers TaskStats RPC to handle streaming results
executor: better error handling in stats rpc
docker: better control and error handling of stats rpc
driver: allow stats to return a recoverable error
Use `/tmp` as temporary directory for docker driver tests, so tests can
run out of the box without any intervention.
macOS sets tempdir as `/var`, which Docker does not whitelist as a path
that can be bind-mounted.
Re-export the ResourceUsage structs in drivers package to avoid drivers
directly depending on the internal client/structs package directly.
I attempted moving the structs to drivers, but that caused some import
cycles that was a bit hard to disentagle. Alternatively, I added an
alias here that's sufficient for our purposes of avoiding external
drivers depend on internal packages, while allowing us to restructure
packages in future without breaking source compatibility.
This implements the InternalPluginDriver interface in each driver, and
calls the cancellation fn for their respective eventers.
This fixes a per task goroutine leak during test suite execution.
We ultimately decided to provide a limited set of devices in exec/java
drivers instead of all of host ones. Pre-0.9, we made all host devices
available to exec tasks accidentally, yet most applications only use a
small subset, and this choice limits our ability to restrict/isolate GPU
and other devices.
Starting with 0.9, by default, we only provide the same subset of
devices Docker provides, and allow users to provide more devices as
needed on case-by-case basis.
This reverts commit 5805c64a9f1c3b409693493dfa30e7136b9f547b.
This reverts commit ff9a4a17e59388dcab067949e0664f645b2f5bcf.
Use a dedicated /dev mount so we can inject more devices if necessary,
and avoid allowing a container to contaminate host /dev.
Follow up to https://github.com/hashicorp/nomad/pull/5143 - and fixes master.
Restores pre-0.9 behavior, where Nomad makes /dev available to exec
task. Switching to libcontainer, we accidentally made only a small
subset available.
Here, we err on the side of preserving behavior of 0.8, instead of going
for the sensible route, where only a reasonable subset of devices is
mounted by default and user can opt to request more.
Currently the docker driver does not remove tasks from its state map
when destroying the task, which leads to issues when restarting tasks in
place, and leaks expired handles over time.
The environment variables needed for envoking `rkt` command line
should include host PATH (to access `iptables`).
Given that the command runs outside the VM, untrusted task environment
variables should NOT be honored here.
We do this already with `rkt`, but the change is quite subtle to miss
when refactoring.
We already have two other Kill tests (e.g.
TestDockerDriver_Start_Kill_Wait and
TestDockerDriver_Start_KillTimeout), so don't need yet another flaky
test.