Commit Graph

578 Commits

Author SHA1 Message Date
Seth Hoenig a8869bd304 docs: document docker signal fix, add tests
This PR adds a version specific upgrade note about the docker stop
signal behavior. Also adds test for the signal logic in docker driver.

Closes #8932 which was fixed in #8933
2020-10-02 10:06:43 -05:00
Mahmood Ali f4450db775 tests: use system path
On host with systemd-resolved, we copy /run/systemd/resolve/resolv.conf
actually.
2020-10-01 10:23:19 -04:00
Mahmood Ali f4b0aa0c1c tests: copy permissions when copying files
On the failover path, copy the permission bits (a.k.a. file mode),
specially the execution bit.
2020-10-01 10:23:14 -04:00
Mahmood Ali cd060db42a tests: ignore empty cgroup
My latest Vagrant box contains an empty cgroup name that isn't used for
isolation:

```
$ cat /proc/self/cgroup  | grep ::
0::/user.slice/user-1000.slice/session-17.scope
```
2020-10-01 10:23:13 -04:00
Mahmood Ali 91376cccf2 tests: failover to copying when symlinking fails
Symlinking busybox may fail when the test code and the test temporary
directory live on different volumes/partitions; so we should copy
instead.  This situation arises in the Vagrant setup, where the code
repository live on special file sharing volume.

Somewhat unrelated, remove `f.Sync()` invocation from a test copyFile
helper function.  Sync is useful only for crash recovery, and isn't
necessary in our test setup.  The sync invocation is a significant
overhead as it requires the OS to flush any cached writes to disk.
2020-09-30 09:58:22 -04:00
Seth Hoenig 6d9a6786e5
Merge pull request #8933 from jf/fix_docker_stopsignal
drivers/docker/driver.go: change default signal for docker driver to SIGTERM?
2020-09-29 10:51:04 -05:00
Seth Hoenig fd2a31a331 drivers/docker: detect arch for default infra_image
The 'docker.config.infra_image' would default to an amd64 container.
It is possible to reference the correct image for a platform using
the `runtime.GOARCH` variable, eliminating the need to explicitly set
the `infra_image` on non-amd64 platforms.

Also upgrade to Google's pause container version 3.1 from 3.0, which
includes some enhancements around process management.

Fixes #8926
2020-09-23 13:54:30 -05:00
Jeffrey 'jf' Lim b84d63c4ba drivers/docker/driver.go: change default signal for docker driver to SIGTERM? 2020-09-20 03:09:07 +08:00
Mahmood Ali d4f385d6e1
Upgrade to golang 1.15 (#8858)
Upgrade to golang 1.15

Starting with golang 1.5, setting Ctty value result in `Setctty set but Ctty not valid in child` error, as part of https://github.com/golang/go/issues/29458 .
This commit lifts the fix in https://github.com/creack/pty/pull/97 .
2020-09-09 15:59:29 -04:00
Shengjing Zhu 7a4f48795d Adjust cgroup change in libcontainer 2020-08-20 00:31:07 +08:00
Nick Ethier 1849a20b66
docker: use Nomad managed resolv.conf when DNS options are set (#8600) 2020-08-17 10:22:08 -04:00
James Rasell dab8282be5
Merge pull request #8589 from hashicorp/f-gh-5718
driver/docker: allow configurable pull context timeout setting.
2020-08-14 16:07:59 +02:00
James Rasell bc42cd2e5e
driver/docker: allow configurable pull context timeout setting.
Pulling large docker containers can take longer than the default
context timeout. Without a way to change this it is very hard for
users to utilise Nomad properly without hacky work arounds.

This change adds an optional pull_timeout config parameter which
gives operators the possibility to account for increase pull times
where needed. The infra docker image also has the option to set a
custom timeout to keep consistency.
2020-08-12 08:58:07 +01:00
Nick Ethier e39574be59
docker: support group allocated ports and host_networks (#8623)
* docker: support group allocated ports

* docker: add new ports driver config to specify which group ports are mapped

* docker: update port mapping docs
2020-08-11 18:30:22 -04:00
Drew Bailey 27b8cadcc4
removes nvidia import from docker test (#8312) 2020-06-30 09:34:59 -04:00
Shishir Mahajan 182e68ca7a
Add notes. 2020-06-25 13:46:45 -07:00
Shishir Mahajan 0bc2c835fe
Remove dead tests. 2020-06-25 13:22:46 -07:00
Mahmood Ali 998f80d4cb add a allowlist for qemu image paths 2020-06-24 08:03:19 -04:00
Mahmood Ali 5796719124 docker: disable host volume binding by default 2020-06-23 13:43:37 -04:00
Nick Ethier 1e4ea699ad fix test failures from rebase 2020-06-18 11:05:32 -07:00
Nick Ethier 0bc0403cc3 Task DNS Options (#7661)
Co-Authored-By: Tim Gross <tgross@hashicorp.com>
Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>
2020-06-18 11:01:31 -07:00
Niam Jen Wei d2de515f0c
Fix docker driver MemorySwap value
Fixes an incorrect value being assigned to MemorySwap when `memory_hard_limit` flag is being used.

Issue raised in https://github.com/hashicorp/nomad/issues/8153
2020-06-12 20:11:28 +01:00
Seth Hoenig 4bfa0548d9
Merge pull request #8087 from hashicorp/f-docker-mem-config
driver/docker: enable setting hard/soft memory limits
2020-06-01 12:16:55 -05:00
Seth Hoenig a792c64f57 driver/docker: add integration test around setting memory_hard_limit 2020-06-01 12:00:47 -05:00
Seth Hoenig 675f50b502 driver/docker: use pointer parameter on driver because locks 2020-06-01 09:35:17 -05:00
Seth Hoenig ad91ba865c driver/docker: enable setting hard/soft memory limits
Fixes #2093

Enable configuring `memory_hard_limit` in the docker config stanza for tasks.
If set, this field will be passed to the container runtime as `--memory`, and
the `memory` configuration from the task resource configuration will be passed
as `--memory_reservation`, creating hard and soft memory limits for tasks using
the docker task driver.
2020-06-01 09:22:45 -05:00
Mahmood Ali 1fcc7970e4 tests: ensure that test is long enough to configure cgroups 2020-05-31 10:42:06 -04:00
Mahmood Ali 8ef1b85ce9 don't GC images in tests by default 2020-05-26 21:24:55 -04:00
Mahmood Ali d9543a1a80 tests: don't delete images after tests complete
Fix some docker test flakiness where image cleanup process may
contaminate other tests. A clean up process may attempt to delete an
image while it's used by another test.
2020-05-26 18:53:24 -04:00
Mahmood Ali 2588b3bc98 cleanup driver eventor goroutines
This fixes few cases where driver eventor goroutines are leaked during
normal operations, but especially so in tests.

This change makes few modifications:

First, it switches drivers to use `Context`s to manage shutdown events.
Previously, it relied on callers invoking `.Shutdown()` function that is
specific to internal drivers only and require casting.  Using `Contexts`
provide a consistent idiomatic way to manage lifecycle for both internal
and external drivers.

Also, I discovered few places where we don't clean up a temporary driver
instance in the plugin catalog code, where we dispense a driver to
inspect and validate the schema config without properly cleaning it up.
2020-05-26 11:04:04 -04:00
Tim Gross aa8927abb4
volumes: return better error messages for unsupported task drivers (#8030)
When an allocation runs for a task driver that can't support volume mounts,
the mounting will fail in a way that can be hard to understand. With host
volumes this usually means failing silently, whereas with CSI the operator
gets inscrutable internals exposed in the `nomad alloc status`.

This changeset adds a MountConfig field to the task driver Capabilities
response. We validate this when the `csi_hook` or `volume_hook` fires and
return a user-friendly error.

Note that we don't currently have a way to get driver capabilities up to the
server, except through attributes. Validating this when the user initially
submits the jobspec would be even better than what we're doing here (and could
be useful for all our other capabilities), but that's out of scope for this
changeset.

Also note that the MountConfig enum starts with "supports all" in order to
support community plugins in a backwards compatible way, rather than cutting
them off from volume mounting unexpectedly.
2020-05-21 09:18:02 -04:00
Mahmood Ali 34b22047b7 Use an image managed by nomad account
This is a retag of stefanscherer/busybox-windows@sha256:af396324c4c62e369a388ebb38d4efd44211dc7c95a438e6feb62b4ae4194c5b
2020-05-15 12:55:22 -04:00
Mahmood Ali 766104c7a7 Use a pinned tag of stefanscherer/busybox-windows 2020-05-15 12:20:37 -04:00
Michele 0150fc4c54 Move appveyor tests to circle 2020-05-15 10:15:37 -04:00
Mahmood Ali 9721fd22f9 docker: Fix docker image gc tracking
This fixes a bug where docker images may not be GCed.  The cause of the
bug is that we track the task using `task.ID+task.Name` on task start
but remove on plain `task.ID`.

This haromize the two paths by using `task.ID`, as it's unique enough
and it's also used in the `loadImage` path (path when loading an image
from a local tarball instead of dockerhub).
2020-05-13 12:33:17 -04:00
Mahmood Ali 04a3cfbeff
Merge pull request #7932 from hashicorp/f-docker-custom-runtimes
Docker runtimes
2020-05-12 11:59:36 -04:00
Mahmood Ali 9f95a50129 update tests 2020-05-12 11:39:09 -04:00
Mahmood Ali 182b95f7b1 use allow_runtimes for consistency
Other allow lists use allow_ prefix (e.g. allow_caps, allow_privileged).
2020-05-12 11:03:08 -04:00
Mahmood Ali 54565e3836
Apply suggestions from code review
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2020-05-12 10:56:47 -04:00
Mahmood Ali 06c672cbf2 more tests 2020-05-12 10:14:54 -04:00
Mahmood Ali 0d692f0931 Add a knob to restrict docker runtimes 2020-05-12 10:14:43 -04:00
Juan Larriba a0df437c62
Run Linux Images (LCOW) and Windows Containers side by side (#7850)
Makes it possible to run Linux Containers On Windows with Nomad alongside Windows Containers. Fingerprint prevents only to run Nomad in Windows 10 with Linux Containers
2020-05-04 13:08:47 -04:00
Mahmood Ali dff071c3b9 driver/docker: protect against nil container
Protect against a panic when we attempt to start a container with a name
that conflicts with an existing one.  If the existing one is being
deleted while nomad first attempts to create the container, the
createContainer will fail with `container already exists`, but we get
nil container reference from the `containerByName` lookup, and cause a
crash.

I'm not certain how we get into the state, except for being very
unlucky.  I suspect that this case may be the result of a concurrent
restart or the docker engine API not being fully consistent (e.g. an
earlier call purged the container, but docker didn't free up resources
yet to create a new container with the same name immediately yet).

If that's the case, then re-attempting creation will hopefully succeed,
or we'd at least fail enough times for the alloc to be rescheduled to
another node.
2020-04-19 15:34:45 -04:00
Ben Buzbee 769a3cd8b3 Rename OCIRuntime to Runtime; allow gpu conflicts is they are the same runtime; add conflict test 2020-04-03 12:15:11 -07:00
Ben Buzbee d4f26d1eee Support custom docker runtimes
This enables customers who want to use gvisor and have it configured on their clients.
2020-04-03 11:07:37 -07:00
Mahmood Ali db4c263180
Merge pull request #7554 from benbuzbee/benbuz/fix-seccomp-file
Parse security_opts before sending them to docker daemon
2020-03-31 11:54:17 -04:00
Ben Buzbee 4f6ea87ec4 Parse security_opts before sending them to docker daemon
Fixes #6720

Copy the parsing function from the docker CLI. Docker daemon expects to see JSON for seccomp file not a path.
2020-03-31 08:34:41 -07:00
Mahmood Ali 7225055e80
Merge pull request #7550 from hashicorp/vendor-fsouza-go-docker-client-20200330
Vendor fsouza/go-docker-client update
2020-03-31 08:46:30 -04:00
Mahmood Ali 452a057a8c driver/docker: fix memory swapping
MemorySwappiness can only be set in non-Windows options: https://ci.appveyor.com/project/hashicorp/nomad/builds/31832149

Also fixes https://github.com/hashicorp/nomad/issues/6085
2020-03-30 16:51:16 -04:00
Mahmood Ali 4b6aee24bd
Merge pull request #7508 from greut/docker-drain-timer
docker: drain fingerprint timer
2020-03-30 16:37:53 -04:00