Commit graph

81 commits

Author SHA1 Message Date
Nick Ethier 7e306afde3
executor: fix failing stats related test 2019-01-12 12:18:23 -05:00
Nick Ethier b0d9440474
docker: add test for stats collection 2019-01-12 12:18:22 -05:00
Nick Ethier 9fea54e0dc
executor: implement streaming stats API
plugins/driver: update driver interface to support streaming stats

client/tr: use streaming stats api

TODO:
 * how to handle errors and closed channel during stats streaming
 * prevent tight loop if Stats(ctx) returns an error

drivers: update drivers TaskStats RPC to handle streaming results

executor: better error handling in stats rpc

docker: better control and error handling of stats rpc

driver: allow stats to return a recoverable error
2019-01-12 12:18:22 -05:00
Mahmood Ali 5389ebae41
Merge pull request #5166 from hashicorp/b-docker-tests-mac
tests: run docker tests in macOS out of box
2019-01-09 13:07:37 -05:00
Mahmood Ali 90f3cea187
Merge pull request #5157 from hashicorp/r-drivers-no-cstructs
drivers: avoid referencing client/structs package
2019-01-09 13:06:46 -05:00
Mahmood Ali 4952f2a182
Merge pull request #5159 from hashicorp/r-macos-tests
Fix Travis MacOS job
2019-01-09 08:22:30 -05:00
Mahmood Ali c78ed7246f tests: run docker tests in macOS out of box
Use `/tmp` as temporary directory for docker driver tests, so tests can
run out of the box without any intervention.

macOS sets tempdir as `/var`, which Docker does not whitelist as a path
that can be bind-mounted.
2019-01-08 14:35:40 -05:00
Mahmood Ali 64f80343fc drivers: re-export ResourceUsage structs
Re-export the ResourceUsage structs in drivers package to avoid drivers
directly depending on the internal client/structs package directly.

I attempted moving the structs to drivers, but that caused some import
cycles that was a bit hard to disentagle.  Alternatively, I added an
alias here that's sufficient for our purposes of avoiding external
drivers depend on internal packages, while allowing us to restructure
packages in future without breaking source compatibility.
2019-01-08 09:11:47 -05:00
Mahmood Ali 916a40bb9e move cstructs.DeviceNetwork to drivers pkg 2019-01-08 09:11:47 -05:00
Mahmood Ali 9369b123de use drivers.FSIsolation 2019-01-08 09:11:47 -05:00
Danielle Tomlinson a9b9ad34dc drivers: Implement InternalPluginDriver interface
This implements the InternalPluginDriver interface in each driver, and
calls the cancellation fn for their respective eventers.

This fixes a per task goroutine leak during test suite execution.
2019-01-08 13:49:31 +01:00
Alex Dadgar 0106f23aaa Review comments 2019-01-07 14:50:28 -08:00
Alex Dadgar 6c6e035dba add docker logger to separate main 2019-01-07 14:49:40 -08:00
Alex Dadgar a6b36df4de remove nil logger 2019-01-07 14:48:01 -08:00
Mahmood Ali 0ba7b0c132 tests: helper function for checking docker presense 2019-01-07 08:27:06 -05:00
Mahmood Ali 796d625ab6 Skip tests requiring Docker deamon if not found. 2019-01-07 07:59:13 -05:00
Preetha Appan 2fb2de3cef
Standardize driver health description messages for all drivers 2019-01-06 22:06:38 -06:00
Nick Ethier ce1a5cba0e
drivermanager: use allocID and task name to route task events 2018-12-18 23:01:51 -05:00
Alex Dadgar bc55ec81b5 fix docker launching plugins 2018-12-18 16:48:01 -08:00
Alex Dadgar 4c57d2ec4d Add plugin API versioning to plugin loader and plugins 2018-12-18 16:48:00 -08:00
Alex Dadgar 327b551b39 Drivers 2018-12-18 15:50:11 -08:00
Danielle Tomlinson b61da13c20 docker: Delete Task on Destroy
Currently the docker driver does not remove tasks from its state map
when destroying the task, which leads to issues when restarting tasks in
place, and leaks expired handles over time.
2018-12-18 15:53:31 +01:00
Mahmood Ali 168749ffd1
Merge pull request #5008 from hashicorp/b-docker-test-20181214
Fix flakiness in docker tests
2018-12-15 16:03:00 -05:00
Mahmood Ali e4f44b9be5 testes: remove TestDockerDriver_Kill
We already have two other Kill tests (e.g.
TestDockerDriver_Start_Kill_Wait and
TestDockerDriver_Start_KillTimeout), so don't need yet another flaky
test.
2018-12-15 15:03:56 -05:00
Mahmood Ali 990a7d6776 driver/docker: stopping a dead container not error 2018-12-15 15:03:56 -05:00
Mahmood Ali eaaaaf5c69 tests: assert docker containers start 2018-12-15 15:03:56 -05:00
Mahmood Ali 6631d42bfa tests: try deflake TestDockerDriver_OOMKilled
Noticed an issue in Docker daemon failing to handle the OOM test case
failure in build https://travis-ci.org/hashicorp/nomad/jobs/468027848 ,
and I suspect it's related to the process dying so quickly, and
potentially the way we are starting the task, so added a start up delay
and made it more consistent with other tests that don't seem as flaky.

The following is the log line showing Docker returning 500 error condition; while we can probably handle it gracefully without retrying, the retry is very cheap in this case and it's more of an optimization that we can handle in follow up PR.

```
    testlog.go:32: 2018-12-14T14:57:52.626Z [DEBUG] docker/driver.go:852: docker: setting container startup command: task_name=nc-demo command="/bin/nc -l 127.0.0.1 -p 0"
    testlog.go:32: 2018-12-14T14:57:52.626Z [DEBUG] docker/driver.go:866: docker: setting container name: task_name=nc-demo container_name=724a3e77-8b15-e657-f6aa-84c2d3243b18
    testlog.go:32: 2018-12-14T14:57:52.694Z [INFO ] docker/driver.go:196: docker: created container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be
    testlog.go:32: 2018-12-14T14:57:53.523Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=1 error="API error (500): {"message":"cannot start a stopped process: unknown"}
        "
    testlog.go:32: 2018-12-14T14:57:55.394Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=2 error="API error (500): {"message":"cannot start a stopped process: unknown"}
        "
    testlog.go:32: 2018-12-14T14:57:57.243Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=3 error="API error (500): {"message":"cannot start a stopped process: unknown"}
        "
```
2018-12-15 15:03:56 -05:00
Mahmood Ali 6b216a6015 tests: pin busybox image to a specific point tag
Using `:latest` tag is typically a cause of pain, as underlying image
changes behavior.  Here, I'm switching to using a point release, and
re-updating the stored tarballs with it.

Sadly, when saving/loading images, the repo digeset is not supported:
https://github.com/moby/moby/issues/22011 ; but using point releases
should mitigate the problem.

The motivation here is that docker tests have some flakiness due to
accidental importing of `busybox:latest` which has `/bin/nc` that no
longer supports `-p 0`:

```
$ docker run -it --rm busybox /bin/nc -l 127.0.0.1 -p 0
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
Digest: sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812
Status: Downloaded newer image for busybox:latest
nc: bad local port '0'
```

Looks like older busybox versions (e.g. `busybox:1.24` do honor `-p 0`
as the test expect, but I would rather update busybox to fix.
2018-12-15 15:03:56 -05:00
Nick Ethier 0c50a51c19
executor: encode mounts and devices correctly when using grpc 2018-12-15 00:08:23 -05:00
Mahmood Ali 69b2355274
Merge pull request #4975 from hashicorp/fix-master-20181209
Some test fixes and remedies
2018-12-11 18:00:21 -05:00
Mahmood Ali 979a65486d tests: tag image explicitly 2018-12-11 17:59:45 -05:00
Mahmood Ali 84ded28c6d
drivers/docker: enforce volumes.enabled (#4983)
When volumes.enable flag is off in Docker driver, disable all mounts of
paths outside alloc dir.
2018-12-11 14:22:50 -05:00
Mahmood Ali 7d5b5bb5f9
Merge pull request #4933 from hashicorp/f-mount-device
Mount Devices in container based drivers
2018-12-07 10:32:03 -05:00
Mahmood Ali 699875eb1c fixup: add missed docker utils test 2018-12-06 15:46:35 -05:00
Mahmood Ali e9557ae596 tests: ensure image is loaded as test setup 2018-12-06 15:36:43 -05:00
Mahmood Ali b55fb642f1 driver/docker: honor plugin devices 2018-12-04 21:31:28 -05:00
Mahmood Ali a580cef986 refactor device manipulation 2018-12-04 20:55:59 -05:00
Mahmood Ali c88e3723eb Fix docker tests
Some tests have containers that die almost immediately, and may die
and cleaned up before `driver.WaitUntilStarted` runs.

The causes for container dying seems special for each test:
* TestDockerDriver_Cleanup: `hello-world` image just emits a message and exits immediately
* TestDockerDriver_ForcePull_RepoDigest: the busybox image in `TestDockerDriver_ForcePull_RepoDigest` test didn't support `-p 0` argument
* TestDockerDriver_Entrypoint: with the entrypoint being `/bin/sh -c`, the command needs to be the entire string; otherwise, it ignores the comments
2018-12-03 23:08:52 -05:00
Danielle Tomlinson 393b76ed7f plugins: Move driver testing support to subpackage
this allows us to drop a cyclical import, but is subobptimal as it
requires BaseDriver tests to move. This falls firmly into the realm of
being a hack. Alternatives welcome.
2018-12-01 17:29:39 +01:00
Danielle Tomlinson 66c521ca17 client: Move fingerprint structs to pkg
This removes a cyclical dependency when importing client/structs from
dependencies of the plugin_loader, specifically, drivers. Due to
client/config also depending on the plugin_loader.

It also better reflects the ownership of fingerprint structs, as they
are fairly internal to the fingerprint manager.
2018-12-01 17:10:39 +01:00
Danielle Tomlinson 51a9f7369e
Merge pull request #4936 from hashicorp/f-legacy-refactor
Refactor and repackage client/driver
2018-11-30 13:38:06 +01:00
Mahmood Ali 84e04cfa40
Merge pull request #4926 from hashicorp/f-docker-image-ref
Use user provided image name to launch container
2018-11-30 07:27:39 -05:00
Mahmood Ali 94d43b8003
Merge pull request #4924 from hashicorp/f-docker-mounts
Support bind and tmpfs docker mounts
2018-11-30 07:27:17 -05:00
Danielle Tomlinson 2db5ae38d8 client: Rename drivers/shared/env => client/taskenv 2018-11-30 12:18:39 +01:00
Danielle Tomlinson f3a77b8084 client: Merge driver/shared/structs and client/structs 2018-11-30 10:56:45 +01:00
Danielle Tomlinson d582ea1d8b drivers: Create drivers/shared/structs
This creates a drivers/shared/structs package and moves the buffer size
checks into it.
2018-11-30 10:46:13 +01:00
Danielle Tomlinson 1a29811169 drivers: Move client/drivers/env to drivers/shared/env
As part of deprecating legacy drivers, we're moving the env package to a
new drivers/shared tree, as it is used by the modern docker and rkt
driver packages, and is useful for 3rd party plugins.
2018-11-30 10:46:13 +01:00
Preetha Appan 9f4439243b
Fix docker driver to use new fingerprint typed attributes 2018-11-28 10:01:03 -06:00
Preetha Appan f89dbcd9cc
modify fingerprint interface to use typed attribute struct 2018-11-28 10:01:03 -06:00
Mahmood Ali 844fd47acc Use user provided image name to launch container
This allows the container to be tagged with a user friendly image name
(e.g. `redis:3.2`) rather than the image ID (e.g.
`sha256:87856cc39862cec77541d68382e4867d7ccb29a85a17221446c857ddaebca916`).

Useful for human debugging, as well as some debugging and image scanning
tools.

This risks two bad changes:
1. Discrepancy in image resolution between docker and Nomad's image
loader.
  * I checked the image creation paths in Nomad, and noticed that we
either pulled the image or inspect the image with the user provided
name.

2. A race in image tagging where the tag is modified between image
loading and container creation.
  * I, personally, don't think this case is cause for concern, as it is
analogous to the task running a bit later.  As long as the image is
still present, creating the container should be good.
2018-11-27 16:12:15 -05:00