Commit Graph

336 Commits

Author SHA1 Message Date
Alex Dadgar 72a5691897 Driver tests do not use hcl2/hcl, hclspec, or hclutils 2019-01-22 15:43:34 -08:00
Alex Dadgar b2c7268843 move reattach config 2019-01-22 15:11:58 -08:00
Alex Dadgar cdcd3c929c loader and singleton 2019-01-22 15:11:57 -08:00
Michael Schurter 9edff19625 test: port SignalFailure test from 0.8
Also fix signal error handling in mock_driver.
2019-01-22 08:08:08 -08:00
Nick Ethier b840a2eb7b
drivers: fix func naming 2019-01-18 18:31:02 -05:00
Nick Ethier e3c6f89b9a
drivers: use consts for task handle version 2019-01-18 18:31:01 -05:00
Nick Ethier 9dd4eb3581
drivers: add upgrade path for rawexec, java, rkt and qemu 2019-01-18 18:31:01 -05:00
Nick Ethier 6804450c69
cleanup code comments and small fixes from refactor 2019-01-18 18:31:01 -05:00
Nick Ethier 05bd369d1f
driver: add pre09 migration logic 2019-01-18 18:31:01 -05:00
Nick Ethier e5a6fc9271
executor: add pre 0.9 client and wrapper 2019-01-18 18:30:58 -05:00
Mahmood Ali 5df63fda7c
Merge pull request #5190 from hashicorp/f-memory-usage
Track Basic Memory Usage as reported by cgroups
2019-01-18 16:46:02 -05:00
Danielle Tomlinson b65bf78513 docker: Fix missing import 2019-01-17 18:44:27 +01:00
Danielle Tomlinson 7fca934509 chore: General Cleanup 2019-01-17 18:43:14 +01:00
Danielle Tomlinson e73962d8d6 docker: Only run Cleanup test on unix os' 2019-01-17 18:43:14 +01:00
Danielle Tomlinson af202f347f chore: goimports exec driver 2019-01-17 18:43:14 +01:00
Danielle Tomlinson 30a5e25d94 fixup: Typo in docker test 2019-01-17 18:43:14 +01:00
Danielle Tomlinson 82018cd030 chore: Fix docklog linting 2019-01-17 18:43:14 +01:00
Danielle Tomlinson 3b2ff2005b chore: Fix docker test linting
Due to https://github.com/tsenart/deadcode/issues/3 we can't specify
these consts on their own. This moves them into the _platform_test.go
files to avoid creating a package that only exposes a couple of values.
2019-01-17 18:43:14 +01:00
Danielle Tomlinson 15b1571882 drivers/exec: SIGINT unavailable on windows 2019-01-17 18:43:14 +01:00
Danielle Tomlinson d78120f890 rawexec: Fix Exec test on windows 2019-01-17 18:43:14 +01:00
Danielle Tomlinson 65457dd2f2 rawexec: SIGINT is not available on Windows 2019-01-17 18:43:14 +01:00
Danielle Tomlinson de86435cf8 docker: Test cleanup for windows
* Docker for Windows does not support ulimits
* Use filepath.ToSlash to test workdir
* Convert expected mount paths to system style
* Skip security-opt test on windows
  - Windows does not support seccomp, and it's unclear which options are
    available.
* Skip StartN due to lack of sigint
* docker: Use api to get image info on windows
* No bridge on windows
* Stop hardcoding /bin/
2019-01-17 18:43:14 +01:00
Danielle Tomlinson 3d2f0bb7da docker: ExpandPath tests validate slashpath 2019-01-17 18:43:13 +01:00
Danielle Tomlinson 37190a5595 dockerlogger: Fix tests on windows
Uses the home directory and windows path expansion, as c:\tmp doesn't
necessarily exist, and mktemp would involve unnecessarily complicating
the commands.
2019-01-17 18:43:13 +01:00
Danielle Tomlinson e6c0738b65 Expand unix build definition 2019-01-17 18:43:13 +01:00
Mahmood Ali 677e11da86 fix imports 2019-01-16 19:53:48 -05:00
Michael Schurter bea4d297b9
Merge pull request #5197 from hashicorp/b-rkt-cpu
rkt: revert to pre-0.9 --cpu flag
2019-01-16 15:05:21 -08:00
Preetha Appan 55319b05d1
clean up read access 2019-01-16 11:04:11 -06:00
Preetha Appan 469a286b1b
Refactor logging in drivers to use a tri-state boolean
Changes logging warnings/errors only if the state changes
from healthy to unhealthy
2019-01-16 10:19:31 -06:00
Preetha Appan 0c2c0a2d43
Make docker driver logging less redundant 2019-01-16 10:16:57 -06:00
Michael Schurter 1879a7f788 rkt: revert to pre-0.9 --cpu flag
See
https://github.com/hashicorp/nomad/issues/3394#issuecomment-453296121
for details. During 0.9 development we switched to shares, but we'd
prefer to maintain backward compat.
2019-01-15 13:15:28 -08:00
oleksii.shyman e41fbf7577 Add support for docker runtimes
- docker fingerprint issues a docker api system info call to get the
  list of supported OCI runtimes.
  - OCI runtimes are reported as comma separated list of names
  - docker driver is aware of GPU runtime presence
  - docker driver throws an error when user tries to run container with
  GPU, when GPU runtime is not present
  - docker GPU runtime name is configurable
2019-01-15 11:34:47 -08:00
Danielle Tomlinson b918d25e62
Merge pull request #5192 from hashicorp/dani/executor-close
executor: Always close stdout/stderr fifos
2019-01-15 17:49:04 +01:00
Danielle Tomlinson f120c8f8f6
Merge pull request #5184 from hashicorp/dani/b-logmon-reattach
docker: Terminate dockerlogger
2019-01-15 16:48:40 +01:00
Danielle Tomlinson 7f1ff3fab6 executor: Always close stdout/stderr fifos 2019-01-15 16:47:27 +01:00
Danielle Tomlinson 272a8726d7 docker: Terminate dockerlogger
Previously, we did not attempt to stop Docker Logger processes until
DestroyTask, which means that under many circumstances, we will never
successfully close the plugin client.

This commit terminates the plugin process when `run` terminates, or when
`DestroyTask` is called.

Steps to repro:

```
$ nomad agent -dev
$ nomad init
$ nomad run example.nomad
$ nomad stop example
$ ps aux | grep nomad # See docker logger process running
$ signal the dev agent
$ ps aux | grep nomad # See docker logger process running
```
2019-01-15 14:58:05 +01:00
Mahmood Ali 5649f72d27 propogate logs to executor plugin 2019-01-15 08:25:03 -05:00
Alex Dadgar 471fdb3ccf
Merge pull request #5173 from hashicorp/b-log-levels
Plugins use parent loggers
2019-01-14 16:14:30 -08:00
Mahmood Ali 9909d98bee Track Basic Memory Usage as reported by cgroups
Track current memory usage, `memory.usage_in_bytes`, in addition to
`memory.max_memory_usage_in_bytes` and friends.  This number is closer
what Docker reports.

Related to https://github.com/hashicorp/nomad/issues/5165 .
2019-01-14 18:47:52 -05:00
Nick Ethier c619e70d39
Merge pull request #5018 from hashicorp/f-executor-stats
executor: streaming stats api
2019-01-14 15:02:35 -05:00
Nick Ethier a4534779d3
qemu: missing gofmt 2019-01-13 16:06:56 -05:00
Michael Schurter ff034ffbc9
Update drivers/qemu/driver_test.go
use t.Logf instead of fmt.Printf

Co-Authored-By: nickethier <ncethier@gmail.com>
2019-01-12 21:33:55 -05:00
Nick Ethier 3b395d7100
drivers: plumb grpc client logger 2019-01-12 12:18:23 -05:00
Nick Ethier 7e306afde3
executor: fix failing stats related test 2019-01-12 12:18:23 -05:00
Nick Ethier b0d9440474
docker: add test for stats collection 2019-01-12 12:18:22 -05:00
Nick Ethier 9fea54e0dc
executor: implement streaming stats API
plugins/driver: update driver interface to support streaming stats

client/tr: use streaming stats api

TODO:
 * how to handle errors and closed channel during stats streaming
 * prevent tight loop if Stats(ctx) returns an error

drivers: update drivers TaskStats RPC to handle streaming results

executor: better error handling in stats rpc

docker: better control and error handling of stats rpc

driver: allow stats to return a recoverable error
2019-01-12 12:18:22 -05:00
Alex Dadgar bb6ea30f58 fix rkt use of executor 2019-01-11 11:36:37 -08:00
Alex Dadgar 14ed757a56 Plugins use parent loggers
This PR fixes various instances of plugins being launched without using
the parent loggers. This meant that logs would not all go to the same
output, break formatting etc.
2019-01-11 11:36:37 -08:00
Mahmood Ali 614f63f40d drivers/java: use libcontainer executor on java linux 2019-01-10 10:10:40 -05:00
Mahmood Ali 9740443703 tests: ignore _JAVA_OPTIONS line
ignore _JAVA_OPTIONS line in `java -version`, as it's relevant.
2019-01-10 10:10:40 -05:00
Mahmood Ali 5389ebae41
Merge pull request #5166 from hashicorp/b-docker-tests-mac
tests: run docker tests in macOS out of box
2019-01-09 13:07:37 -05:00
Mahmood Ali b08f59cdda
Merge pull request #5162 from hashicorp/f-extract-lxc
Extract LXC from nomad
2019-01-09 13:07:05 -05:00
Mahmood Ali 90f3cea187
Merge pull request #5157 from hashicorp/r-drivers-no-cstructs
drivers: avoid referencing client/structs package
2019-01-09 13:06:46 -05:00
Mahmood Ali ff48dbb8a9
Merge pull request #5163 from hashicorp/r-minor-changes-20180108
Fix a panic on node.Deregister fail
2019-01-09 09:56:00 -05:00
Mahmood Ali 1f2473263e fix more cases of logging arity errors 2019-01-09 09:22:47 -05:00
Mahmood Ali 4952f2a182
Merge pull request #5159 from hashicorp/r-macos-tests
Fix Travis MacOS job
2019-01-09 08:22:30 -05:00
Mahmood Ali c78ed7246f tests: run docker tests in macOS out of box
Use `/tmp` as temporary directory for docker driver tests, so tests can
run out of the box without any intervention.

macOS sets tempdir as `/var`, which Docker does not whitelist as a path
that can be bind-mounted.
2019-01-08 14:35:40 -05:00
Mahmood Ali 8f20bc8ce2
Merge pull request #5154 from hashicorp/f-revert-exec-devs
drivers/exec: restrict devices exposed to tasks
2019-01-08 12:43:06 -05:00
Mahmood Ali d19b92edec executor: add a comment detailing isolation 2019-01-08 12:10:26 -05:00
Mahmood Ali 62a7f951c0 remove lxc references 2019-01-08 09:28:20 -05:00
Mahmood Ali 426c981c34 Remove some dead code 2019-01-08 09:11:48 -05:00
Mahmood Ali 64f80343fc drivers: re-export ResourceUsage structs
Re-export the ResourceUsage structs in drivers package to avoid drivers
directly depending on the internal client/structs package directly.

I attempted moving the structs to drivers, but that caused some import
cycles that was a bit hard to disentagle.  Alternatively, I added an
alias here that's sufficient for our purposes of avoiding external
drivers depend on internal packages, while allowing us to restructure
packages in future without breaking source compatibility.
2019-01-08 09:11:47 -05:00
Mahmood Ali 916a40bb9e move cstructs.DeviceNetwork to drivers pkg 2019-01-08 09:11:47 -05:00
Mahmood Ali 9369b123de use drivers.FSIsolation 2019-01-08 09:11:47 -05:00
Danielle Tomlinson a9b9ad34dc drivers: Implement InternalPluginDriver interface
This implements the InternalPluginDriver interface in each driver, and
calls the cancellation fn for their respective eventers.

This fixes a per task goroutine leak during test suite execution.
2019-01-08 13:49:31 +01:00
Alex Dadgar 0106f23aaa Review comments 2019-01-07 14:50:28 -08:00
Alex Dadgar 8a35d7b1dd Test recovery 2019-01-07 14:49:41 -08:00
Alex Dadgar f40f8ce02e Mock driver has recovery, stats 2019-01-07 14:49:40 -08:00
Alex Dadgar c9825a9c36 recover 2019-01-07 14:49:40 -08:00
Alex Dadgar 6c6e035dba add docker logger to separate main 2019-01-07 14:49:40 -08:00
Alex Dadgar 39542b4cf0 rkt fingerprint logs once 2019-01-07 14:49:40 -08:00
Alex Dadgar a6b36df4de remove nil logger 2019-01-07 14:48:01 -08:00
Mahmood Ali 58fb6812db tests: busybox only depends on arch
Busybox is compiled for linux only.  Making the file used in executor
tests even for non-linux targets, as having the file present has no
side-effects.
2019-01-07 08:36:32 -05:00
Mahmood Ali 0ba7b0c132 tests: helper function for checking docker presense 2019-01-07 08:27:06 -05:00
Mahmood Ali 796d625ab6 Skip tests requiring Docker deamon if not found. 2019-01-07 07:59:13 -05:00
Preetha Appan 2fb2de3cef
Standardize driver health description messages for all drivers 2019-01-06 22:06:38 -06:00
Preetha Appan 76c09c7cbf
remove unnecessary logging in rkt driver fingerprint method 2019-01-06 20:59:20 -06:00
Mahmood Ali 8797a4f0ea drivers/exec: restrict devices exposed to tasks
We ultimately decided to provide a limited set of devices in exec/java
drivers instead of all of host ones.  Pre-0.9, we made all host devices
available to exec tasks accidentally, yet most applications only use a
small subset, and this choice limits our ability to restrict/isolate GPU
and other devices.

Starting with 0.9, by default, we only provide the same subset of
devices Docker provides, and allow users to provide more devices as
needed on case-by-case basis.

This reverts commit 5805c64a9f1c3b409693493dfa30e7136b9f547b.
This reverts commit ff9a4a17e59388dcab067949e0664f645b2f5bcf.
2019-01-06 17:03:19 -05:00
Mahmood Ali 56e3171310
driver/exec: use dedicated /dev mount (#5147)
Use a dedicated /dev mount so we can inject more devices if necessary,
and avoid allowing a container to contaminate host /dev.

Follow up to https://github.com/hashicorp/nomad/pull/5143 - and fixes master.
2019-01-04 10:36:05 -05:00
Mahmood Ali 5b0702c9eb drivers/exec: bind mount /dev into rootfs
Restores pre-0.9 behavior, where Nomad makes /dev available to exec
task.  Switching to libcontainer, we accidentally made only a small
subset available.

Here, we err on the side of preserving behavior of 0.8, instead of going
for the sensible route, where only a reasonable subset of devices is
mounted by default and user can opt to request more.
2019-01-03 14:29:18 -05:00
Mahmood Ali d23f47736c drivers/exec: run as `nobody` by default
libcontainer based drivers (e.g. exec, java) should default to running
processes as `nobody` [1]; but libcontainer treats empty user as `root`
in our case (either because of default or due to `root` being current
user).

[1] 94c28a4c6c/website/source/docs/job-specification/task.html.md (task-parameters)
2019-01-03 14:29:18 -05:00
Danielle Tomlinson 6c9b9dc9f1 rkt: Return consistent error when not root 2018-12-20 13:02:46 +01:00
Danielle Tomlinson 6709de199b java: Return undetected when not running as root
This is an unrecoverable error, so we should only do this check once,
rather than returning unhealthy constantly.
2018-12-20 12:55:07 +01:00
Danielle Tomlinson 7b31027ea3 exec: Return undetected when not running as root
This is an unrecoverable error, so we should only do this check once,
rather than returning unhealthy constantly.
2018-12-20 12:54:19 +01:00
Nick Ethier ce1a5cba0e
drivermanager: use allocID and task name to route task events 2018-12-18 23:01:51 -05:00
Alex Dadgar bc55ec81b5 fix docker launching plugins 2018-12-18 16:48:01 -08:00
Alex Dadgar 730a6f5b9a lint 2018-12-18 16:48:00 -08:00
Alex Dadgar 4c57d2ec4d Add plugin API versioning to plugin loader and plugins 2018-12-18 16:48:00 -08:00
Alex Dadgar b8268d9a46 Lint 2018-12-18 15:50:44 -08:00
Alex Dadgar 327b551b39 Drivers 2018-12-18 15:50:11 -08:00
Alex Dadgar b9ee03b2c1 protos 2018-12-18 15:48:52 -08:00
Danielle Tomlinson b61da13c20 docker: Delete Task on Destroy
Currently the docker driver does not remove tasks from its state map
when destroying the task, which leads to issues when restarting tasks in
place, and leaks expired handles over time.
2018-12-18 15:53:31 +01:00
Mahmood Ali 56dfdd0874
tests: fix rkt command environment (#5011)
The environment variables needed for envoking `rkt` command line
should include host PATH (to access `iptables`).

Given that the command runs outside the VM, untrusted task environment
variables should NOT be honored here.

We do this already with `rkt`, but the change is quite subtle to miss
when refactoring.
2018-12-15 20:25:36 -05:00
Mahmood Ali 168749ffd1
Merge pull request #5008 from hashicorp/b-docker-test-20181214
Fix flakiness in docker tests
2018-12-15 16:03:00 -05:00
Mahmood Ali e4f44b9be5 testes: remove TestDockerDriver_Kill
We already have two other Kill tests (e.g.
TestDockerDriver_Start_Kill_Wait and
TestDockerDriver_Start_KillTimeout), so don't need yet another flaky
test.
2018-12-15 15:03:56 -05:00
Mahmood Ali 990a7d6776 driver/docker: stopping a dead container not error 2018-12-15 15:03:56 -05:00
Mahmood Ali eaaaaf5c69 tests: assert docker containers start 2018-12-15 15:03:56 -05:00
Mahmood Ali 6631d42bfa tests: try deflake TestDockerDriver_OOMKilled
Noticed an issue in Docker daemon failing to handle the OOM test case
failure in build https://travis-ci.org/hashicorp/nomad/jobs/468027848 ,
and I suspect it's related to the process dying so quickly, and
potentially the way we are starting the task, so added a start up delay
and made it more consistent with other tests that don't seem as flaky.

The following is the log line showing Docker returning 500 error condition; while we can probably handle it gracefully without retrying, the retry is very cheap in this case and it's more of an optimization that we can handle in follow up PR.

```
    testlog.go:32: 2018-12-14T14:57:52.626Z [DEBUG] docker/driver.go:852: docker: setting container startup command: task_name=nc-demo command="/bin/nc -l 127.0.0.1 -p 0"
    testlog.go:32: 2018-12-14T14:57:52.626Z [DEBUG] docker/driver.go:866: docker: setting container name: task_name=nc-demo container_name=724a3e77-8b15-e657-f6aa-84c2d3243b18
    testlog.go:32: 2018-12-14T14:57:52.694Z [INFO ] docker/driver.go:196: docker: created container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be
    testlog.go:32: 2018-12-14T14:57:53.523Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=1 error="API error (500): {"message":"cannot start a stopped process: unknown"}
        "
    testlog.go:32: 2018-12-14T14:57:55.394Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=2 error="API error (500): {"message":"cannot start a stopped process: unknown"}
        "
    testlog.go:32: 2018-12-14T14:57:57.243Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=3 error="API error (500): {"message":"cannot start a stopped process: unknown"}
        "
```
2018-12-15 15:03:56 -05:00
Mahmood Ali 6b216a6015 tests: pin busybox image to a specific point tag
Using `:latest` tag is typically a cause of pain, as underlying image
changes behavior.  Here, I'm switching to using a point release, and
re-updating the stored tarballs with it.

Sadly, when saving/loading images, the repo digeset is not supported:
https://github.com/moby/moby/issues/22011 ; but using point releases
should mitigate the problem.

The motivation here is that docker tests have some flakiness due to
accidental importing of `busybox:latest` which has `/bin/nc` that no
longer supports `-p 0`:

```
$ docker run -it --rm busybox /bin/nc -l 127.0.0.1 -p 0
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
Digest: sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812
Status: Downloaded newer image for busybox:latest
nc: bad local port '0'
```

Looks like older busybox versions (e.g. `busybox:1.24` do honor `-p 0`
as the test expect, but I would rather update busybox to fix.
2018-12-15 15:03:56 -05:00
Nick Ethier 0c50a51c19
executor: encode mounts and devices correctly when using grpc 2018-12-15 00:08:23 -05:00