Commit graph

241 commits

Author SHA1 Message Date
Mahmood Ali 990a7d6776 driver/docker: stopping a dead container not error 2018-12-15 15:03:56 -05:00
Mahmood Ali eaaaaf5c69 tests: assert docker containers start 2018-12-15 15:03:56 -05:00
Mahmood Ali 6631d42bfa tests: try deflake TestDockerDriver_OOMKilled
Noticed an issue in Docker daemon failing to handle the OOM test case
failure in build https://travis-ci.org/hashicorp/nomad/jobs/468027848 ,
and I suspect it's related to the process dying so quickly, and
potentially the way we are starting the task, so added a start up delay
and made it more consistent with other tests that don't seem as flaky.

The following is the log line showing Docker returning 500 error condition; while we can probably handle it gracefully without retrying, the retry is very cheap in this case and it's more of an optimization that we can handle in follow up PR.

```
    testlog.go:32: 2018-12-14T14:57:52.626Z [DEBUG] docker/driver.go:852: docker: setting container startup command: task_name=nc-demo command="/bin/nc -l 127.0.0.1 -p 0"
    testlog.go:32: 2018-12-14T14:57:52.626Z [DEBUG] docker/driver.go:866: docker: setting container name: task_name=nc-demo container_name=724a3e77-8b15-e657-f6aa-84c2d3243b18
    testlog.go:32: 2018-12-14T14:57:52.694Z [INFO ] docker/driver.go:196: docker: created container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be
    testlog.go:32: 2018-12-14T14:57:53.523Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=1 error="API error (500): {"message":"cannot start a stopped process: unknown"}
        "
    testlog.go:32: 2018-12-14T14:57:55.394Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=2 error="API error (500): {"message":"cannot start a stopped process: unknown"}
        "
    testlog.go:32: 2018-12-14T14:57:57.243Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=3 error="API error (500): {"message":"cannot start a stopped process: unknown"}
        "
```
2018-12-15 15:03:56 -05:00
Mahmood Ali 6b216a6015 tests: pin busybox image to a specific point tag
Using `:latest` tag is typically a cause of pain, as underlying image
changes behavior.  Here, I'm switching to using a point release, and
re-updating the stored tarballs with it.

Sadly, when saving/loading images, the repo digeset is not supported:
https://github.com/moby/moby/issues/22011 ; but using point releases
should mitigate the problem.

The motivation here is that docker tests have some flakiness due to
accidental importing of `busybox:latest` which has `/bin/nc` that no
longer supports `-p 0`:

```
$ docker run -it --rm busybox /bin/nc -l 127.0.0.1 -p 0
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
Digest: sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812
Status: Downloaded newer image for busybox:latest
nc: bad local port '0'
```

Looks like older busybox versions (e.g. `busybox:1.24` do honor `-p 0`
as the test expect, but I would rather update busybox to fix.
2018-12-15 15:03:56 -05:00
Nick Ethier 0c50a51c19
executor: encode mounts and devices correctly when using grpc 2018-12-15 00:08:23 -05:00
Nick Ethier a771ee59aa
rawexec: fix misleading log 2018-12-14 23:40:37 -05:00
Nick Ethier 49e03542cc
executor: use int when encoding signal in RPC 2018-12-14 22:20:01 -05:00
Nick Ethier 09dadf0a23
Merge branch 'master' into f-grpc-executor
* master: (71 commits)
  Fix output of 'nomad deployment fail' with no arg
  Always create a running allocation when testing task state
  tests: ensure exec tests pass valid task resources (#4992)
  some changes for more idiomatic code
  fix iops related tests
  fixed bug in loop delay
  gofmt
  improved code for readability
  client: updateAlloc release lock after read
  fixup! device attributes in `nomad node status -verbose`
  drivers/exec: support device binds and mounts
  fix iops bug and increase test matrix coverage
  tests: tag image explicitly
  changelog
  ci: install lxc-templates explicitly
  tests: skip checking rdma cgroup
  ci: use Ubuntu 16.04 (Xenial) in TravisCI
  client: update driver info on new fingerprint
  drivers/docker: enforce volumes.enabled (#4983)
  client: Style: use fluent style for building loggers
  ...
2018-12-13 14:41:09 -05:00
Mahmood Ali f0ec27da3c
tests: ensure exec tests pass valid task resources (#4992)
Prior to 97f33bb1537d04905cb84199672bcdf46ebb4e65, executor cgroup validation errors were
silently ignored.  Enforcing them reveals test cases that missed them.

This doesn't change customer facing contract, as resource struct is
is either configured or we default to 100 (much higher than 2).
2018-12-12 20:40:38 -05:00
Mahmood Ali 74bd0be6ea drivers/exec: support device binds and mounts 2018-12-11 18:35:21 -05:00
Mahmood Ali 8726ab3b9e
Merge pull request #4985 from hashicorp/test-with-xenial
ci: Test with Ubuntu 16.04 in TravisCI
2018-12-11 18:00:39 -05:00
Mahmood Ali 69b2355274
Merge pull request #4975 from hashicorp/fix-master-20181209
Some test fixes and remedies
2018-12-11 18:00:21 -05:00
Mahmood Ali 979a65486d tests: tag image explicitly 2018-12-11 17:59:45 -05:00
Alex Dadgar 1531b6d534
Merge pull request #4970 from hashicorp/f-no-iops
Deprecate IOPS
2018-12-11 12:51:22 -08:00
Mahmood Ali e6e71fb47a tests: skip checking rdma cgroup
rdma was added in most recent kernels and libcontainer/docker
don't isolate them by default.
2018-12-11 15:49:11 -05:00
Mahmood Ali 84ded28c6d
drivers/docker: enforce volumes.enabled (#4983)
When volumes.enable flag is off in Docker driver, disable all mounts of
paths outside alloc dir.
2018-12-11 14:22:50 -05:00
Mahmood Ali f6f39f1314 add a note about busybox license 2018-12-11 09:35:26 -05:00
Mahmood Ali 5a487ac884 tests: prevent indefinite blocking in some tests
Noticed few places where tests seem to block indefinitely and panic
after the test run reaches the test package timeout.

I intend to follow up with the proper fix later, but timing out is much
better than indefinitely blocking.
2018-12-11 09:35:26 -05:00
Mahmood Ali 23c07b9afe tests: update stop/kill tests with new pattern
Update rawexec and rkt stop/kill tests with the patterns introduced in
7a49e9b68e519050a0c2ef0b67c33503bfbc51be.  This implementation should be
more resilient to discrepancy between task stopping and task being marked as exited.
2018-12-11 09:35:26 -05:00
Mahmood Ali 8453ce7d56 tests: setup libcontainer rootfs
Using statically linked busybox binary to setup a basic rootfs for
testing, by symlinking it to provide the basic commands used in tests.

I considered using a proper rootfs tarball, but the overhead of managing
tarfile and expanding it seems significant enough that I went with this
implementation.
2018-12-11 09:35:26 -05:00
Mahmood Ali 97829a3f02 fix dtestutil.NewDriverHarness ref 2018-12-08 09:58:23 -05:00
Mahmood Ali 021d3720b5
Merge pull request #4950 from hashicorp/b-exc-libcontainer-kill
executor: kill all container processes
2018-12-08 09:52:42 -05:00
Nick Ethier 35268fdb54
executor: misspell 2018-12-08 01:52:06 -05:00
Nick Ethier 86e9c11ec2
executor: don't drop errors when configuring libcontainer cfg, add nil check on resources 2018-12-07 14:03:42 -05:00
Mahmood Ali 7d5b5bb5f9
Merge pull request #4933 from hashicorp/f-mount-device
Mount Devices in container based drivers
2018-12-07 10:32:03 -05:00
Nick Ethier 47df1dde10
Merge branch 'master' into f-grpc-executor 2018-12-06 21:42:38 -05:00
Nick Ethier 19a695308f
executor: fix tests 2018-12-06 21:39:53 -05:00
Nick Ethier 913efed9f5
executor: fix broken non-linux build 2018-12-06 21:33:20 -05:00
Nick Ethier 2283cb2c39
executor: use drivers.Resources as resource model 2018-12-06 21:22:02 -05:00
Nick Ethier 29ef54c0ee
executor: merge plugin shim with executor package 2018-12-06 21:13:45 -05:00
Nick Ethier 71353a88d4
executor: remove structs package 2018-12-06 20:54:14 -05:00
Alex Dadgar 1e3c3cb287 Deprecate IOPS
IOPS have been modelled as a resource since Nomad 0.1 but has never
actually been detected and there is no plan in the short term to add
detection. This is because IOPS is a bit simplistic of a unit to define
the performance requirements from the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. These should be considered deprecated and only exist to allow
users to stop using them during the Nomad 0.9.x release. In the future,
there should be no expectation that the field will exist.
2018-12-06 15:09:26 -08:00
Mahmood Ali a7b205daf2
Merge pull request #4955 from hashicorp/fix-docker-tests-20181203
Fix docker driver tests
2018-12-06 16:41:33 -05:00
Mahmood Ali bdc53b1d8e driver/rkt: mount plugin devices 2018-12-06 15:46:35 -05:00
Mahmood Ali 2c0fd2a902 driver/lxc: mount plugin devices
Also, LXC requires target paths to be relative.  Container paths in LXC
binds should never be absolute paths, so we strip any preceeding `/`,
even if a user sets one.
2018-12-06 15:46:35 -05:00
Mahmood Ali 699875eb1c fixup: add missed docker utils test 2018-12-06 15:46:35 -05:00
Mahmood Ali e9557ae596 tests: ensure image is loaded as test setup 2018-12-06 15:36:43 -05:00
Nick Ethier 57ffece7f8
executor: update test references 2018-12-05 11:07:48 -05:00
Nick Ethier 02f4b0fac5
executor: update driver references 2018-12-05 11:04:18 -05:00
Nick Ethier 8b20de4801
executor: use grpc instead of netrpc as plugin protocol
* Added protobuf spec for executor
 * Seperated executor structs into their own package
2018-12-05 11:03:56 -05:00
Mahmood Ali b55fb642f1 driver/docker: honor plugin devices 2018-12-04 21:31:28 -05:00
Mahmood Ali a580cef986 refactor device manipulation 2018-12-04 20:55:59 -05:00
Mahmood Ali 3a18105d06 drivers/exec: refactor stop/kill tests
Simplify the tests to do all assertions within the main goroutine and
account for status propagation delay.
2018-12-04 20:34:43 -05:00
Mahmood Ali 428d35a5a9 executor: Keep 0.8.6 exit code for wait() failures
0.8.6 uses exit code 1 when `proc.Wait()` fails: https://github.com/hashicorp/nomad/blob/v0.8.6/client/driver/executor/executor.go#L442
2018-12-04 19:38:25 -05:00
Mahmood Ali 8df9de6fd5 driver/rkt: use rkt environment
The rkt command itself needs an environment with PATH set to find iptables.
2018-12-04 14:00:45 -05:00
Mahmood Ali 06a5cadf35 drivers/rkt: use image isolation for rkt 2018-12-04 11:40:10 -05:00
Mahmood Ali 178365848e tests: don't assert in WaitForResult
WaitForResult expects body to fail and retries few times before giving
up.  Assertions inside the testfn body causes it to terminate abruptly
without retrying.
2018-12-04 11:40:10 -05:00
Mahmood Ali f8ceeebf11
no t.Parallel() in excutor table driven tests (#4948)
When `t.Parallel()` is used inside a `t.Run()` sub-set, the closure
doesn't behave as expected, and some cases effectively get skipped.
More details can be found in
https://gist.github.com/posener/92a55c4cd441fc5e5e85f27bca008721
2018-12-04 09:04:04 -05:00
Mahmood Ali 216a2566c7
Update LXC with drivers/testutils changes (#4951) 2018-12-04 08:57:54 -05:00
Mahmood Ali c88e3723eb Fix docker tests
Some tests have containers that die almost immediately, and may die
and cleaned up before `driver.WaitUntilStarted` runs.

The causes for container dying seems special for each test:
* TestDockerDriver_Cleanup: `hello-world` image just emits a message and exits immediately
* TestDockerDriver_ForcePull_RepoDigest: the busybox image in `TestDockerDriver_ForcePull_RepoDigest` test didn't support `-p 0` argument
* TestDockerDriver_Entrypoint: with the entrypoint being `/bin/sh -c`, the command needs to be the entire string; otherwise, it ignores the comments
2018-12-03 23:08:52 -05:00
Mahmood Ali 2516cb16b9 Kill all container processes on shutdown
Currently, libcontainer-based executor, upon shutdown, kills the
container initial process.  The children of the killed process remain
running, and the executor is never marked as terminated until they do.

Also, fix a case where we treat processes as successful, when
`proc.Wait()` fails.  In some attempts, I was getting "waitid no child
processes" errors and such error shouldn't get process to be considered
successful.
2018-12-03 20:40:49 -05:00
Mahmood Ali bd8e4f1c15 Test Stopping a multi-process exec
Ensure that exec children processes get killed as well.
2018-12-03 20:40:19 -05:00
Danielle Tomlinson 10b3e68a6d
Merge pull request #4925 from hashicorp/f-driver-plugins-dani
Third Party Driver Plugins Support
2018-12-03 20:48:19 +01:00
Mahmood Ali 88622b97bd
libcontainer to manage /dev and /proc (#4945)
libcontainer already manages `/dev`, overriding task_dir - so let's use it for `/proc` as well and remove deadcode.
2018-12-03 10:41:01 -05:00
Danielle Tomlinson 393b76ed7f plugins: Move driver testing support to subpackage
this allows us to drop a cyclical import, but is subobptimal as it
requires BaseDriver tests to move. This falls firmly into the realm of
being a hack. Alternatives welcome.
2018-12-01 17:29:39 +01:00
Danielle Tomlinson 66c521ca17 client: Move fingerprint structs to pkg
This removes a cyclical dependency when importing client/structs from
dependencies of the plugin_loader, specifically, drivers. Due to
client/config also depending on the plugin_loader.

It also better reflects the ownership of fingerprint structs, as they
are fairly internal to the fingerprint manager.
2018-12-01 17:10:39 +01:00
Danielle Tomlinson 51a9f7369e
Merge pull request #4936 from hashicorp/f-legacy-refactor
Refactor and repackage client/driver
2018-11-30 13:38:06 +01:00
Mahmood Ali 84e04cfa40
Merge pull request #4926 from hashicorp/f-docker-image-ref
Use user provided image name to launch container
2018-11-30 07:27:39 -05:00
Mahmood Ali 94d43b8003
Merge pull request #4924 from hashicorp/f-docker-mounts
Support bind and tmpfs docker mounts
2018-11-30 07:27:17 -05:00
Danielle Tomlinson 2db5ae38d8 client: Rename drivers/shared/env => client/taskenv 2018-11-30 12:18:39 +01:00
Danielle Tomlinson f3a77b8084 client: Merge driver/shared/structs and client/structs 2018-11-30 10:56:45 +01:00
Danielle Tomlinson fdfe93aa25 fixup: executorplugin: fix rkt build 2018-11-30 10:47:08 +01:00
Danielle Tomlinson 04c8851b4c client: Migrate DriverStats optout to drivers/shared/structs 2018-11-30 10:46:13 +01:00
Danielle Tomlinson d582ea1d8b drivers: Create drivers/shared/structs
This creates a drivers/shared/structs package and moves the buffer size
checks into it.
2018-11-30 10:46:13 +01:00
Danielle Tomlinson 0544a57abe drivers: Move client/drivers/executor to drivers/shared/executor 2018-11-30 10:46:13 +01:00
Danielle Tomlinson 1a29811169 drivers: Move client/drivers/env to drivers/shared/env
As part of deprecating legacy drivers, we're moving the env package to a
new drivers/shared tree, as it is used by the modern docker and rkt
driver packages, and is useful for 3rd party plugins.
2018-11-30 10:46:13 +01:00
Preetha Appan 0d90ba392e
Fix lxc test panic 2018-11-28 13:56:17 -06:00
Preetha Appan 924f1b69e9
Fix failing lxc test 2018-11-28 11:05:35 -06:00
Preetha Appan bf58c65ef7
Fix LXC driver fingerprint to use typedattributes 2018-11-28 10:09:10 -06:00
Preetha Appan 9f4439243b
Fix docker driver to use new fingerprint typed attributes 2018-11-28 10:01:03 -06:00
Preetha Appan f89dbcd9cc
modify fingerprint interface to use typed attribute struct 2018-11-28 10:01:03 -06:00
Mahmood Ali 9af8deabbf address review comments 2018-11-27 21:40:43 -05:00
Mahmood Ali 6d34d2fade Add Driver Plugin for LXC 2018-11-27 21:40:43 -05:00
Michael Schurter e565c63eed gofmt -s -w drivers/rkt/driver_test.go 2018-11-27 17:24:23 -08:00
Mahmood Ali 844fd47acc Use user provided image name to launch container
This allows the container to be tagged with a user friendly image name
(e.g. `redis:3.2`) rather than the image ID (e.g.
`sha256:87856cc39862cec77541d68382e4867d7ccb29a85a17221446c857ddaebca916`).

Useful for human debugging, as well as some debugging and image scanning
tools.

This risks two bad changes:
1. Discrepancy in image resolution between docker and Nomad's image
loader.
  * I checked the image creation paths in Nomad, and noticed that we
either pulled the image or inspect the image with the user provided
name.

2. A race in image tagging where the tag is modified between image
loading and container creation.
  * I, personally, don't think this case is cause for concern, as it is
analogous to the task running a bit later.  As long as the image is
still present, creating the container should be good.
2018-11-27 16:12:15 -05:00
Mahmood Ali f6d6a50c39 add support for tmpfs 2018-11-27 07:20:17 -05:00
Mahmood Ali 0a09f5521d Support docker bind mounts 2018-11-27 07:20:17 -05:00
Chris Baker 3dd6ba514a drivers/rkt: updated test to include new AllocID field in TaskConfig 2018-11-26 21:37:58 +00:00
Chris Baker 9bd4317139 modified TaskConfig to include AllocID
use this for volume names in drivers/rkt to address #1150
2018-11-26 18:54:26 +00:00
Mahmood Ali 141092e46d Formatting and typo fixes 2018-11-25 11:53:21 -05:00
Mahmood Ali c61d99b525
Merge pull request #4908 from hashicorp/f-docker-opts-storageopt
Add support for docker storage options
2018-11-20 21:08:27 -05:00
Nick Ethier 1f3fe02e62
docker: sync access to exit result within a handle 2018-11-20 20:41:32 -05:00
Michael Schurter 2275153875
Apply suggestions from code review
Co-Authored-By: nickethier <ncethier@gmail.com>
2018-11-20 20:33:31 -05:00
Mahmood Ali e9e415f186 Add support for storage opt 2018-11-20 16:11:02 -05:00
Nick Ethier 3ccd359735
docker: unexport new coordinator func 2018-11-19 23:07:07 -05:00
Nick Ethier 8b9b2b476e
docker: add default blocks for driver plugin config schema 2018-11-19 22:59:18 -05:00
Nick Ethier 2667f48a5d
docker: move config RPCs to config.go 2018-11-19 22:59:18 -05:00
Nick Ethier aa9f45ae47
docker: fix tests 2018-11-19 22:59:18 -05:00
Nick Ethier 0f03e8f520
docker: remove container pointer from task handle 2018-11-19 22:59:18 -05:00
Nick Ethier ce4b867d21
docker: move volume driver options to seperate block 2018-11-19 22:59:18 -05:00
Nick Ethier fca2df3c79
docker: group common config into blocks 2018-11-19 22:59:17 -05:00
Michael Schurter 813341dd59
Apply suggestions from code review
Co-Authored-By: nickethier <ncethier@gmail.com>
2018-11-19 22:59:17 -05:00
Nick Ethier b7bd36db30
docker: remove global pull coordinator 2018-11-19 22:59:17 -05:00
Nick Ethier f0a86859a0
docker: remove call to global metrics instance 2018-11-19 22:59:17 -05:00
Nick Ethier 8ef73e63ce
docker: moved fingerprint code to it's own file 2018-11-19 22:59:17 -05:00
Nick Ethier 4be8a86ef9
plugins/driver: remove NodeResources from task Resources and use PercentTicks field for docker driver 2018-11-19 22:59:17 -05:00
Nick Ethier ced5d5c445
docker: move recoverable error proto to shared structs 2018-11-19 22:59:16 -05:00
Nick Ethier 585e468085
docker: implement recover task logic 2018-11-19 22:59:16 -05:00
Nick Ethier ee51cb6a93
docker: finished porting tests 2018-11-19 22:59:16 -05:00
Nick Ethier 3d7cdea19e
drivers/docker: more work porting tests from old driver plugin 2018-11-19 22:59:16 -05:00