605 Commits

Author SHA1 Message Date
Seth Hoenig a792c64f57 driver/docker: add integration test around setting memory_hard_limit 2020-06-01 12:00:47 -05:00
Seth Hoenig 675f50b502 driver/docker: use pointer parameter on driver because locks 2020-06-01 09:35:17 -05:00
Seth Hoenig ad91ba865c driver/docker: enable setting hard/soft memory limits
Fixes #2093

Enable configuring `memory_hard_limit` in the docker config stanza for tasks.
If set, this field will be passed to the container runtime as `--memory`, and
the `memory` configuration from the task resource configuration will be passed
as `--memory-reservation`, creating hard and soft memory limits for tasks using
the docker task driver.
2020-06-01 09:22:45 -05:00
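
The hard/soft split described in the `memory_hard_limit` commit above can be sketched roughly as follows. This is a minimal illustration, not the driver's code: `memoryLimits` is a hypothetical helper, and the behaviour when no hard limit is set (task memory stays the only limit) is an assumption.

```go
package main

import "fmt"

// memoryLimits is a hypothetical helper (not taken from the driver). It returns
// the byte values that would back docker's --memory (hard) and
// --memory-reservation (soft) settings, given the task's resource memory and
// an optional memory_hard_limit, both in MiB.
func memoryLimits(taskMemoryMB, hardLimitMB int64) (hard, soft int64) {
	const mib = 1024 * 1024
	if hardLimitMB > 0 {
		// hard limit configured: task memory becomes the soft reservation
		return hardLimitMB * mib, taskMemoryMB * mib
	}
	// assumption: without memory_hard_limit the task memory is the only limit
	return taskMemoryMB * mib, 0
}

func main() {
	hard, soft := memoryLimits(256, 512)
	fmt.Println("hard:", hard, "soft:", soft)
}
```

With `memory = 256` and `memory_hard_limit = 512`, the container may burst up to 512 MiB while the 256 MiB value acts as a reservation.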
Mahmood Ali 1fcc7970e4 tests: ensure that test is long enough to configure cgroups 2020-05-31 10:42:06 -04:00
Mahmood Ali 8ef1b85ce9 don't GC images in tests by default 2020-05-26 21:24:55 -04:00
Mahmood Ali d9543a1a80 tests: don't delete images after tests complete
Fix some docker test flakiness where image cleanup process may
contaminate other tests. A clean up process may attempt to delete an
image while it's used by another test.
2020-05-26 18:53:24 -04:00
Mahmood Ali 2588b3bc98 cleanup driver eventor goroutines
This fixes a few cases where driver eventor goroutines are leaked during
normal operations, but especially so in tests.

This change makes a few modifications:

First, it switches drivers to use `Context`s to manage shutdown events.
Previously, it relied on callers invoking a `.Shutdown()` function that is
specific to internal drivers only and requires casting.  Using `Context`s
provides a consistent, idiomatic way to manage lifecycle for both internal
and external drivers.

Also, I discovered a few places where we don't clean up a temporary driver
instance in the plugin catalog code, where we dispense a driver to
inspect and validate the schema config without properly cleaning it up.
2020-05-26 11:04:04 -04:00
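
The context-based shutdown described in the eventor cleanup commit above follows a common Go pattern, sketched below. `eventLoop` and `runEventor` are hypothetical names for illustration; the real eventor has a different shape.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// eventLoop is a hypothetical stand-in for a driver eventor goroutine. It
// exits as soon as ctx is cancelled, so no driver-specific Shutdown() call
// (or type cast) is needed to stop it.
func eventLoop(ctx context.Context, events <-chan string, handled *int) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-events:
			*handled++
		}
	}
}

// runEventor starts the loop, delivers the given events, cancels the context,
// and waits for the goroutine to exit. It returns how many events were handled.
func runEventor(toSend []string) int {
	ctx, cancel := context.WithCancel(context.Background())
	events := make(chan string)
	handled := 0

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		eventLoop(ctx, events, &handled)
	}()

	for _, ev := range toSend {
		events <- ev // unbuffered: returns once the loop has received the event
	}
	cancel()
	wg.Wait() // returns only after the goroutine has exited, so nothing leaks
	return handled
}

func main() {
	fmt.Println("events handled before shutdown:", runEventor([]string{"started", "exited"}))
}
```

Because cancellation travels through the `context.Context`, the same mechanism works for any caller that holds the context, which is the consistency argument the commit makes.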
Tim Gross aa8927abb4
volumes: return better error messages for unsupported task drivers (#8030)
When an allocation runs for a task driver that can't support volume mounts,
the mounting will fail in a way that can be hard to understand. With host
volumes this usually means failing silently, whereas with CSI the operator
gets inscrutable internals exposed in the `nomad alloc status`.

This changeset adds a MountConfig field to the task driver Capabilities
response. We validate this when the `csi_hook` or `volume_hook` fires and
return a user-friendly error.

Note that we don't currently have a way to get driver capabilities up to the
server, except through attributes. Validating this when the user initially
submits the jobspec would be even better than what we're doing here (and could
be useful for all our other capabilities), but that's out of scope for this
changeset.

Also note that the MountConfig enum starts with "supports all" in order to
support community plugins in a backwards compatible way, rather than cutting
them off from volume mounting unexpectedly.
2020-05-21 09:18:02 -04:00
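
The backwards-compatibility note in the volumes commit above relies on the enum's zero value meaning "supports all". A minimal sketch, with hypothetical type and constant names rather than the actual plugin API:

```go
package main

import "fmt"

// MountSupport is a hypothetical enum illustrating the idea; the names are
// assumptions, not the real plugin API.
type MountSupport int

const (
	// MountSupportAll is the zero value, so plugins that never set the
	// field keep their volume mounting behaviour.
	MountSupportAll MountSupport = iota
	MountSupportNone
)

// Capabilities is a simplified stand-in for a driver capabilities response.
type Capabilities struct {
	MountConfig MountSupport
}

// validateVolumes returns a user-friendly error when a task asks for a volume
// but the driver reports that it cannot mount any.
func validateVolumes(driver string, caps Capabilities, wantsVolume bool) error {
	if wantsVolume && caps.MountConfig == MountSupportNone {
		return fmt.Errorf("task driver %q does not support volume mounts", driver)
	}
	return nil
}

func main() {
	fmt.Println(validateVolumes("community-plugin", Capabilities{}, true))
	fmt.Println(validateVolumes("no-mounts", Capabilities{MountConfig: MountSupportNone}, true))
}
```

An older community plugin that returns an empty `Capabilities` value passes validation, while a driver that explicitly opts out produces a readable error.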
Mahmood Ali 34b22047b7 Use an image managed by nomad account
This is a retag of stefanscherer/busybox-windows@sha256:af396324c4c62e369a388ebb38d4efd44211dc7c95a438e6feb62b4ae4194c5b
2020-05-15 12:55:22 -04:00
Mahmood Ali 766104c7a7 Use a pinned tag of stefanscherer/busybox-windows 2020-05-15 12:20:37 -04:00
Michele 0150fc4c54 Move appveyor tests to circle 2020-05-15 10:15:37 -04:00
Mahmood Ali 9721fd22f9 docker: Fix docker image gc tracking
This fixes a bug where docker images may not be GCed.  The cause of the
bug is that we track the task using `task.ID+task.Name` on task start
but remove on plain `task.ID`.

This harmonizes the two paths by using `task.ID`, as it's unique enough
and it's also used in the `loadImage` path (path when loading an image
from a local tarball instead of dockerhub).
2020-05-13 12:33:17 -04:00
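
The key mismatch described in the image GC commit above is easy to reproduce with a small reference tracker. The `imageTracker` type below is a hypothetical simplification, not the driver's coordinator:

```go
package main

import "fmt"

// imageTracker is a hypothetical, simplified reference tracker: an image may
// be garbage collected once no caller references it any more.
type imageTracker struct {
	refs map[string]map[string]struct{} // image ID -> set of caller IDs
}

func newImageTracker() *imageTracker {
	return &imageTracker{refs: make(map[string]map[string]struct{})}
}

// add records that callerID uses imageID.
func (t *imageTracker) add(imageID, callerID string) {
	if t.refs[imageID] == nil {
		t.refs[imageID] = make(map[string]struct{})
	}
	t.refs[imageID][callerID] = struct{}{}
}

// remove drops callerID's reference and reports whether the image is now
// unused and therefore eligible for GC.
func (t *imageTracker) remove(imageID, callerID string) bool {
	callers, ok := t.refs[imageID]
	if !ok {
		return false
	}
	delete(callers, callerID)
	if len(callers) == 0 {
		delete(t.refs, imageID)
		return true
	}
	return false
}

func main() {
	taskID, taskName := "alloc1/web", "web"

	buggy := newImageTracker()
	buggy.add("busybox", taskID+taskName) // tracked under one key...
	fmt.Println("buggy GC eligible:", buggy.remove("busybox", taskID)) // ...removed under another

	fixed := newImageTracker()
	fixed.add("busybox", taskID)
	fmt.Println("fixed GC eligible:", fixed.remove("busybox", taskID))
}
```

With mismatched keys the reference is never released, so the image is never considered unused; using the same key on both paths fixes that.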
Mahmood Ali 04a3cfbeff
Merge pull request #7932 from hashicorp/f-docker-custom-runtimes
Docker runtimes
2020-05-12 11:59:36 -04:00
Mahmood Ali 9f95a50129 update tests 2020-05-12 11:39:09 -04:00
Mahmood Ali 182b95f7b1 use allow_runtimes for consistency
Other allow lists use allow_ prefix (e.g. allow_caps, allow_privileged).
2020-05-12 11:03:08 -04:00
Mahmood Ali 54565e3836
Apply suggestions from code review
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2020-05-12 10:56:47 -04:00
Mahmood Ali 06c672cbf2 more tests 2020-05-12 10:14:54 -04:00
Mahmood Ali 0d692f0931 Add a knob to restrict docker runtimes 2020-05-12 10:14:43 -04:00
Juan Larriba a0df437c62
Run Linux Images (LCOW) and Windows Containers side by side (#7850)
Makes it possible to run Linux Containers On Windows with Nomad alongside Windows Containers. The fingerprint now only prevents running Nomad on Windows 10 with Linux Containers.
2020-05-04 13:08:47 -04:00
Mahmood Ali dff071c3b9 driver/docker: protect against nil container
Protect against a panic when we attempt to start a container with a name
that conflicts with an existing one.  If the existing one is being
deleted while nomad first attempts to create the container, the
createContainer will fail with `container already exists`, but we get
nil container reference from the `containerByName` lookup, and cause a
crash.

I'm not certain how we get into the state, except for being very
unlucky.  I suspect that this case may be the result of a concurrent
restart or the docker engine API not being fully consistent (e.g. an
earlier call purged the container, but docker didn't free up resources
to create a new container with the same name immediately).

If that's the case, then re-attempting creation will hopefully succeed,
or we'd at least fail enough times for the alloc to be rescheduled to
another node.
2020-04-19 15:34:45 -04:00
Ben Buzbee 769a3cd8b3 Rename OCIRuntime to Runtime; allow gpu conflicts if they are the same runtime; add conflict test 2020-04-03 12:15:11 -07:00
Ben Buzbee d4f26d1eee Support custom docker runtimes
This enables customers who want to use gvisor and have it configured on their clients.
2020-04-03 11:07:37 -07:00
Mahmood Ali db4c263180
Merge pull request #7554 from benbuzbee/benbuz/fix-seccomp-file
Parse security_opts before sending them to docker daemon
2020-03-31 11:54:17 -04:00
Ben Buzbee 4f6ea87ec4 Parse security_opts before sending them to docker daemon
Fixes #6720

Copy the parsing function from the docker CLI. Docker daemon expects to see JSON for seccomp file not a path.
2020-03-31 08:34:41 -07:00
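
The parsing step in the `security_opts` commit above (turn a seccomp profile path into the JSON the daemon expects) can be sketched like this. It is a simplified, hypothetical version rather than the function copied from the docker CLI; `readFile` is injected so the example runs without touching the filesystem.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// parseSecurityOpts is a simplified, hypothetical version of the parsing
// step: for a "seccomp=<path>" option it loads the profile and replaces the
// path with the compacted JSON content. Real code would pass os.ReadFile as
// readFile.
func parseSecurityOpts(opts []string, readFile func(string) ([]byte, error)) ([]string, error) {
	out := make([]string, 0, len(opts))
	for _, opt := range opts {
		kv := strings.SplitN(opt, "=", 2)
		if len(kv) != 2 || kv[0] != "seccomp" || kv[1] == "unconfined" {
			out = append(out, opt) // pass other options through untouched
			continue
		}
		raw, err := readFile(kv[1])
		if err != nil {
			return nil, fmt.Errorf("opening seccomp profile %q: %w", kv[1], err)
		}
		var compact bytes.Buffer
		if err := json.Compact(&compact, raw); err != nil {
			return nil, fmt.Errorf("compacting seccomp profile %q: %w", kv[1], err)
		}
		out = append(out, "seccomp="+compact.String())
	}
	return out, nil
}

func main() {
	// fakeRead stands in for reading a profile from disk.
	fakeRead := func(string) ([]byte, error) {
		return []byte("{\n  \"defaultAction\": \"SCMP_ACT_ALLOW\"\n}"), nil
	}
	opts, err := parseSecurityOpts([]string{"no-new-privileges", "seccomp=/etc/profile.json"}, fakeRead)
	fmt.Println(opts, err)
}
```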
Mahmood Ali 7225055e80
Merge pull request #7550 from hashicorp/vendor-fsouza-go-docker-client-20200330
Vendor fsouza/go-docker-client update
2020-03-31 08:46:30 -04:00
Mahmood Ali 452a057a8c driver/docker: fix memory swapping
MemorySwappiness can only be set in non-Windows options: https://ci.appveyor.com/project/hashicorp/nomad/builds/31832149

Also fixes https://github.com/hashicorp/nomad/issues/6085
2020-03-30 16:51:16 -04:00
Mahmood Ali 4b6aee24bd
Merge pull request #7508 from greut/docker-drain-timer
docker: drain fingerprint timer
2020-03-30 16:37:53 -04:00
Yoan Blanc c9f6cf385a
Update drivers/docker/fingerprint.go
Co-Authored-By: Mahmood Ali <mahmood@notnoop.com>
2020-03-30 22:11:42 +02:00
Mahmood Ali 8f57f78087 vendors: update fsouza/go-docker-client to v.1.6.3 2020-03-30 15:10:53 -04:00
Mahmood Ali 65d2fb5e32
Merge pull request #7531 from greut/docker-v19.03.8
Docker v19.03.8
2020-03-30 14:45:10 -04:00
Mahmood Ali 254fcd6c06 tests: attempt to deflake TestDockerDriver_PidsLimit
This is an attempt to deflake TestDockerDriver_PidsLimit by having one
more process and ensuring they run for longer.
2020-03-30 07:06:52 -04:00
Mahmood Ali 887292d757
Resolve docker types conflict
Looks like the latest `github.com/docker/docker/registry.ResolveAuthConfig` expects
`github.com/docker/docker/api/types.AuthConfig` rather than
`github.com/docker/cli/cli/config/types.AuthConfig`. The two types are
identical but live in different packages.

Here, we embed `registry.ResolveAuthConfig` from upstream repo, but with
the signature we need.
2020-03-28 17:29:06 +01:00
Yoan Blanc 1d92edbbbe
docker: v19.03.8
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-28 17:29:04 +01:00
Mahmood Ali 6283a44870
Merge pull request #7257 from bbckr/avoid-resolving-dot-in-named-pipe
Avoid resolving dotted segments when host path for volume is named pipe
2020-03-26 16:59:29 -04:00
Yoan Blanc 139a0ae451
fixup! docker: drain fingerprint timer
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-26 16:02:20 +01:00
Yoan Blanc 5f0b3234f0
docker: drain fingerprint timer
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-26 16:00:53 +01:00
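
"Draining" a timer, as in the commit above, usually refers to the standard Go idiom of discarding a pending tick before re-arming a `time.Timer`. The sketch below shows that idiom in isolation; whether the fingerprint code uses exactly this shape is an assumption, and `resetTimer` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"time"
)

// resetTimer stops t, drains a tick that may already be pending on t.C, and
// then re-arms the timer. Without the drain, a stale tick could make the next
// receive return immediately.
func resetTimer(t *time.Timer, d time.Duration) {
	if !t.Stop() {
		select {
		case <-t.C: // discard the stale tick
		default: // the tick was already consumed
		}
	}
	t.Reset(d)
}

// staleTickAfterReset reports whether a tick is still pending right after
// resetting an already-expired timer to a long duration.
func staleTickAfterReset() bool {
	t := time.NewTimer(time.Millisecond)
	time.Sleep(20 * time.Millisecond) // let the timer expire without reading t.C
	resetTimer(t, time.Hour)
	defer t.Stop()
	select {
	case <-t.C:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println("stale tick after reset:", staleTickAfterReset())
}
```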
Mahmood Ali fd5d033e32
Revert "vendor: fsouza/go-docker-client v1.6.3" 2020-03-23 10:48:47 -04:00
Yoan Blanc ed8dcccb54
docker: disable swap in Windows only
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-23 08:35:09 +01:00
Yoan Blanc d9ea68e807
fixup! fixup! vendor: fsouza/go-docker-client v1.6.3
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-22 10:04:52 +01:00
Yoan Blanc 8e744d1877
vendor: fsouza/go-docker-client v1.6.3
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-22 09:25:46 +01:00
Mahmood Ali 92712c48eb
Merge pull request #7236 from hashicorp/b-remove-rkt
Remove rkt as a built-in driver
2020-03-17 09:07:35 -04:00
Yoan Blanc c8e69a0427
docker: v18.09.9
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-15 08:23:11 +01:00
bckr 977e7ac8b3 Remove argument passing runtime GOOS 2020-03-03 15:39:43 +01:00
bckr 86a5ff9cb9 Fix too many arguments 2020-03-03 15:38:38 +01:00
Mahmood Ali 88cfe504a0 update grpc
Upgrade grpc to v1.27.1 and protobuf plugins to v1.3.4.
2020-03-03 08:39:54 -05:00
bckr fe6da3df88 Avoid resolving dotted segments when host path for volume is named pipe 2020-03-03 14:00:19 +01:00
Mahmood Ali a8d6950007 Remove rkt as a built-in driver
Rkt has been archived and is no longer an active project:
* https://github.com/rkt/rkt
* https://github.com/rkt/rkt/issues/4024

The rkt driver will continue to live as an external plugin.
2020-02-26 22:16:41 -05:00
Thomas Lefebvre 84baa950ce client: support no_pivot_root in exec driver configuration 2020-02-18 09:27:16 -08:00
Mahmood Ali ac80d62c84 Pass stats collection interval to executor
This fixes a bug where executor based drivers emit stats every second,
regardless of user configuration.

When serializing the Stats request across grpc, the nomad agent dropped
the Interval value, and then executor uses 1s as a default value.
2020-01-31 14:17:15 -05:00
John Schlederer 8b35c75206 Making pull activity timeout configurable in Docker
* Making pull activity timeout configurable in Docker plugin config, first pass

* Fixing broken function call

* Fixing broken tests

* Fixing linter suggestion

* Adding documentation on new parameter in Docker plugin config

* Adding unit test

* Setting min value for pull_activity_timeout, making pull activity duration a private var
2019-12-18 12:58:53 +01:00
Mahmood Ali 4a1cc67f58
Merge pull request #6820 from hashicorp/f-skip-docker-logging-knob
driver: allow disabling log collection
2019-12-13 11:41:20 -05:00
Mahmood Ali 46bc3b57e6 address review comments 2019-12-13 11:21:00 -05:00
Mahmood Ali d80ae6765b simplify cgroup path lookup 2019-12-11 12:43:25 -05:00
Mahmood Ali 94ab62dfb4 executor: stop joining executor to container cgroup
Stop joining libcontainer executor process into the newly created task
container cgroup, to ensure that the cgroups are fully destroyed on
shutdown, and to make it consistent with other plugin processes.

Previously, the executor process was added to the container cgroup so the
executor process resources were aggregated along with user processes in
our metric aggregation.

However, adding the executor process to the container cgroup adds some
complications without much benefit:

First, it complicates cleanup.  We must ensure that the executor is
removed from container cgroup on shutdown.  However, we had a bug where
we missed removing it from the systemd cgroup, because executor uses
`containerState.CgroupPaths` on launch, which includes systemd, but
`cgroups.GetAllSubsystems` on shutdown, which doesn't.

Second, it may have adverse side-effects.  When a user process is cpu
bound or uses too much memory, executor should remain functioning
without risk of being killed (by OOM killer) or throttled.

Third, it is inconsistent with other drivers and plugins.  Logmon and
DockerLogger processes aren't in the task cgroups.  Neither are
containerd processes, though it is equivalent to executor in
responsibility.

Fourth, in my experience when executor process moves cgroup while it's
running, the cgroup aggregation is odd.  The cgroup
`memory.usage_in_bytes` doesn't seem to capture the full memory usage of
the executor process and becomes a red herring when investigating memory
issues.

For all the reasons above, I opted to have executor remain in nomad
agent cgroup and we can revisit this when we have a better story for
plugin process cgroup management.
2019-12-11 11:28:09 -05:00
Mahmood Ali 739e5e8811 drivers/exec: test all cgroups are destroyed 2019-12-11 11:12:29 -05:00
Seth Hoenig f0c3dca49c tests: swap lib/freeport for tweaked helper/freeport
Copy the updated version of freeport (sdk/freeport), and tweak it for use
in Nomad tests. This means staying below port 10000 to avoid conflicts with
the lib/freeport that is still transitively used by the old version of
consul that we vendor. Also provide implementations to find ephemeral ports
of macOS and Windows environments.

Ports acquired through freeport are supposed to be returned to freeport,
which this change now also introduces. Many tests are modified to include
calls to a cleanup function for Server objects.

This should help quite a bit with some flakey tests, but not all of them.
Our port problems will not go away completely until we upgrade our vendor
version of consul. With Go modules, we'll probably do a 'replace' to swap
out other copies of freeport with the one now in 'nomad/helper/freeport'.
2019-12-09 08:37:32 -06:00
Mahmood Ali 0b7085ba3a driver: allow disabling log collection
Operators commonly have docker logs aggregated using various tools and
don't need nomad to manage their docker logs.  Worse, Nomad uses a
somewhat heavy docker api call to collect them and it seems to cause
problems when a client runs hundreds of log collections.

Here we add a knob to disable log aggregation completely for nomad.
When log collection is disabled, we avoid running logmon and
docker_logger for the docker tasks in this implementation.

The downside here is once disabled, `nomad logs ...` commands and API
no longer return logs and operators must correlate alloc-ids with their
aggregated log info.

This is meant as a stop gap measure.  Ideally, we'd follow up with at
least two changes:

First, we should optimize behavior when we can such that operators don't
need to disable docker log collection.  Potentially by reverting to
using pre-0.9 syslog aggregation in linux environments, though with
different trade-offs.

Second, when/if logs are disabled, nomad logs endpoints should lookup
docker logs api on demand.  This ensures that the cost of log collection
is paid sparingly.
2019-12-08 14:15:03 -05:00
Mahmood Ali aa1c83871b drivers: always initialize taskHandle.logger
Looks like the RecoverTask doesn't set taskHandle.logger field causing
a panic when the handle attempts to log (e.g. when Shutdown or Signaling
fails).
2019-11-22 10:44:59 -05:00
Nick Ethier 729dd9018c
docker: set default cpu cfs period (#6737)
* docker: set default cpu cfs period

Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>
2019-11-19 19:05:15 -05:00
Mahmood Ali bc893829bc changelog and comment 2019-11-19 15:51:08 -05:00
Mahmood Ali ea221cfe87 always destroy 2019-11-18 21:31:29 -05:00
Mahmood Ali abd700bf8f Add tests for orphaned processes 2019-11-18 21:31:29 -05:00
Tim Gross b1b20cd479
remove misleading networking log line (#6588)
When a job has a task group network, this log line ends up being
misleading if you're trying to debug networking issues. We really only
care about this when there's no port map set, in which case we get the
error returned anyways.
2019-10-30 13:23:33 -04:00
Mahmood Ali fe14993582 docs: Docker driver supports task user option
Also, add a test case.
2019-10-24 14:00:37 -04:00
Mahmood Ali 977b86f924 driver/docker: ensure that defaults are populated
Looks like we may need to pass the default literal at each layer,
so defaults are set properly.
2019-10-18 18:27:28 -04:00
Mahmood Ali 1bdfcdcab7 add timeouts for docker reconciler docker calls 2019-10-18 15:31:13 -04:00
Mahmood Ali 414e01b6a6 only set a single label for now
Other labels aren't strictly necessary here, and we may follow up with a
better way to customize.
2019-10-18 15:31:13 -04:00
Mahmood Ali 3aec7b56ea Only start reconciler once in main driver
driver.SetConfig is not appropriate for starting up reconciler
goroutine.  Some ephemeral driver instances are created for validating
config and we ought not to start side-effecting goroutines for those.

We currently lack a lifecycle hook to inject these, so I picked the
`Fingerprinter` function for now, and reconciler should only run after
fingerprinter started.

Use `sync.Once` to ensure that we only start reconciler loop once.
2019-10-18 14:43:23 -04:00
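
The `sync.Once` guard mentioned in the reconciler commit above can be sketched as follows. `Driver` here is a hypothetical, stripped-down type, and a counter stands in for launching the reconciler goroutine.

```go
package main

import (
	"fmt"
	"sync"
)

// Driver is a hypothetical, stripped-down driver used only for illustration.
type Driver struct {
	reconcilerOnce sync.Once
	reconcilerRuns int // counts how often the reconciler was started
}

// Fingerprint may be called many times; the reconciler is started only on
// the first call.
func (d *Driver) Fingerprint() {
	d.reconcilerOnce.Do(func() {
		d.reconcilerRuns++ // real code would launch the reconciler goroutine here
	})
}

func main() {
	d := &Driver{}
	for i := 0; i < 3; i++ {
		d.Fingerprint()
	}
	fmt.Println("reconciler started", d.reconcilerRuns, "time(s)")
}
```

Ephemeral driver instances that are only used to validate config never call `Fingerprint`, so under this scheme they never start the loop at all.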
Mahmood Ali ac3b555cc8 docker label refactoring and additional tests 2019-10-17 10:45:13 -04:00
Mahmood Ali e24c3fac56 add docker labels 2019-10-17 10:45:12 -04:00
Mahmood Ali 8739cc2a62 refactor reconciler code and address comments 2019-10-17 09:42:23 -04:00
Mahmood Ali c01c6de481 address code review comments 2019-10-17 08:36:02 -04:00
Mahmood Ali 2a63caafba docker: explicit grace period for initial container reconciliation
Ensure we wait for some grace period before killing docker containers
that may have launched in earlier nomad restore.
2019-10-17 08:36:02 -04:00
Mahmood Ali aa59280edc docker: periodically reconcile containers
When running at scale, it's possible that Docker Engine starts
containers successfully but gets wedged in a way where API calls fail.
The Docker Engine may remain unavailable for an arbitrarily long time.

Here, we introduce a periodic reconciliation process that ensures that any
container started by nomad is tracked, and killed if it is running
unexpectedly.

Basically, the periodic job inspects any container that isn't tracked in
its handlers.  A creation grace period is used to prevent killing newly
created containers that aren't registered yet.

Also, we aim to avoid killing unrelated containers started by the host or
through raw_exec drivers.  The logic is to pattern-match against container
environment variables and mounts to infer whether they are an alloc docker
container.

Lastly, the periodic job can be disabled to avoid any interference if
need be.
2019-10-17 08:36:01 -04:00
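
The selection logic described in the reconciliation commit above (skip tracked containers, skip containers that do not look like Nomad's, skip anything younger than the grace period) might look roughly like this. All names and the five-minute grace value are assumptions for illustration; `NomadOwned` stands in for the environment-variable and mount heuristics.

```go
package main

import (
	"fmt"
	"time"
)

// creationGrace is an assumed value for illustration only.
const creationGrace = 5 * time.Minute

// container is a hypothetical summary of a running docker container.
type container struct {
	ID         string
	Age        time.Duration // time since the container was created
	NomadOwned bool          // stands in for the env var / mount heuristics
}

// untrackedToKill returns the IDs of containers that look like Nomad's, are
// not tracked by any task handle, and are older than the grace period.
func untrackedToKill(running []container, tracked map[string]bool) []string {
	var kill []string
	for _, c := range running {
		switch {
		case tracked[c.ID]: // a handle exists, leave it alone
		case !c.NomadOwned: // started by the host or another tool
		case c.Age < creationGrace: // may simply not be registered yet
		default:
			kill = append(kill, c.ID)
		}
	}
	return kill
}

func main() {
	running := []container{
		{ID: "tracked", Age: time.Hour, NomadOwned: true},
		{ID: "just-created", Age: time.Minute, NomadOwned: true},
		{ID: "unrelated", Age: time.Hour, NomadOwned: false},
		{ID: "orphan", Age: time.Hour, NomadOwned: true},
	}
	fmt.Println(untrackedToKill(running, map[string]bool{"tracked": true}))
}
```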
Danielle Lancashire 4fbcc668d0
volumes: Add support for mount propagation
This commit introduces support for configuring mount propagation when
mounting volumes with the `volume_mount` stanza on Linux targets.

Similar to Kubernetes, we expose 3 options for configuring mount
propagation:

- private, which is equivalent to `rprivate` on Linux, which does not allow the
           container to see any new nested mounts after the chroot was created.

- host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts
                that have been created _outside of the container_ to be visible
                inside the container after the chroot is created.

- bidirectional, which is equivalent to `rshared` on Linux, which allows both
                 the container to see new mounts created on the host, but
                 importantly _allows the container to create mounts that are
                 visible in other containers and on the host_

private and host-to-task are safe, but bidirectional mounts can be
dangerous, as if the code inside a container creates a mount, and does
not clean it up before tearing down the container, it can cause bad
things to happen inside the kernel.

To add a layer of safety here, we require that the user has ReadWrite
permissions on the volume before allowing bidirectional mounts, as a
defense in depth / validation case, although creating mounts should also require
a privileged execution environment inside the container.
2019-10-14 14:09:58 +02:00
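
The three propagation options and the ReadWrite requirement from the mount propagation commit above map naturally onto a small lookup. `propagationFlag` is a hypothetical helper name; only the option names and their Linux equivalents come from the commit message.

```go
package main

import "fmt"

// propagationFlag is a hypothetical helper that maps the volume_mount
// propagation option to the Linux mount propagation flag named in the commit
// message, rejecting bidirectional mounts on read-only volumes.
func propagationFlag(mode string, readOnly bool) (string, error) {
	switch mode {
	case "private":
		return "rprivate", nil
	case "host-to-task":
		return "rslave", nil
	case "bidirectional":
		if readOnly {
			return "", fmt.Errorf("bidirectional propagation requires read-write access to the volume")
		}
		return "rshared", nil
	default:
		return "", fmt.Errorf("unknown propagation mode %q", mode)
	}
}

func main() {
	for _, mode := range []string{"private", "host-to-task", "bidirectional"} {
		flag, err := propagationFlag(mode, false)
		fmt.Println(mode, "->", flag, err)
	}
	_, err := propagationFlag("bidirectional", true)
	fmt.Println("read-only bidirectional:", err)
}
```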
Nick Ethier 0c19bf6f04
executor: run exec commands in netns if set (#6405)
executor: run exec commands in netns if set
2019-10-01 14:45:43 -04:00
Nick Ethier 8b881d83d5
executor: rename wrapNetns to withNetworkIsolation 2019-09-30 21:38:31 -04:00
Nick Ethier 5127caef11
comment wrapNetns 2019-09-30 12:06:52 -04:00
Nick Ethier 67ac161565
executor: removed unused field from exec_utils.go 2019-09-30 11:57:34 -04:00
Nick Ethier 6fd773eb88
executor: run exec commands in netns if set 2019-09-30 11:50:22 -04:00
Tim Gross 9efca131be driver/java: pass task network isolation to executor
Without passing the network isolation configuration to the executor,
java tasks are not placed in the same network namespace as the other
processes in their task group, which breaks Consul Connect.
2019-09-27 08:26:54 -04:00
Tim Gross d965a15490 driver/networking: don't recreate existing network namespaces 2019-09-25 14:58:17 -04:00
Nick Ethier 53d3ea8ebd
driver: set correct network isolation caps for exec and java dr… (#6368) 2019-09-25 11:48:14 -04:00
rpramodd 0d09b564fa utils: add missing error info in case of cmd failure (#6355) 2019-09-24 09:33:27 -04:00
Mahmood Ali 1d945994d0 docker: remove containers on creation failures
The docker creation API calls may fail with http errors (e.g. timeout)
even if container was successfully created.

Here, we force remove container if we got unexpected failure.  We
already do this in some error handlers, and this commit updates all
paths.

I stopped short of a more aggressive refactoring, as the code is ripe
for refactoring and would rather do that in another PR.
2019-09-18 08:45:59 -04:00
Mahmood Ali 75ede5a685 add exponential backoff for docker api calls 2019-09-18 08:12:54 -04:00
Mahmood Ali ac329a5e07 retry transient docker errors within function 2019-09-13 15:25:31 -04:00
Mahmood Ali e8d73e3d72 docker: defensive against failed starts
This handles a bug where we may start a container successfully, yet we
fail due to retries and startContainer not being idempotent call.

Here, we ensure that when starting a container fails with 500 error,
the retry succeeds if container was started successfully.
2019-09-13 13:02:35 -04:00
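
The defensive start described in the commit above amounts to checking the container's actual state before treating a failed start call as a failure. A minimal sketch, assuming injected `start` and `isRunning` callbacks instead of real docker client calls (all names here are hypothetical):

```go
package main

import (
	"errors"
	"fmt"
)

// errServer is a hypothetical stand-in for a 500 response from the docker API.
var errServer = errors.New("API error (500)")

// startWithRetry calls start up to attempts times. After each failure it asks
// isRunning whether the container came up anyway, because the start call is
// not idempotent and an earlier attempt may have succeeded on the daemon side.
func startWithRetry(start func() error, isRunning func() (bool, error), attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = start(); err == nil {
			return nil
		}
		if running, inspectErr := isRunning(); inspectErr == nil && running {
			return nil // the container is up, so the earlier error is moot
		}
	}
	return err
}

func main() {
	alwaysFail := func() error { return errServer }
	running := func() (bool, error) { return true, nil }
	stopped := func() (bool, error) { return false, nil }

	fmt.Println("container actually running:", startWithRetry(alwaysFail, running, 3))
	fmt.Println("container never started:", startWithRetry(alwaysFail, stopped, 3))
}
```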
Mahmood Ali 87f0457973 fix qemu and update docker with tests 2019-09-04 11:27:51 -04:00
Jasmine Dahilig 5b6e39b37c fix portmap envvars in docker driver 2019-09-04 11:26:13 -04:00
Michael Schurter 8fe42fccb0
Merge pull request #6000 from Iqoqo/docker-convert-host-paths-to-host-native
driver/docker: convert host bind path to os native
2019-09-03 09:34:56 -07:00
Danielle Lancashire 724586ba1d
docker: Fix driver spec
hclspec.NewLiteral does not quote its values, which caused `3m` to be
parsed as a nonsensical literal which broke the plugin loader during
initialization. By quoting the value here, it starts correctly.
2019-09-03 08:53:37 +02:00
Zhiguang Wang 832df1091b Add default value "3m" to image_delay, making it consistent with docs. 2019-09-02 16:40:00 +08:00
Mahmood Ali f98d4ee3f1 tests: enable raw_exec driver 2019-08-29 20:26:50 -04:00
Mahmood Ali 28e473aaff raw_exec: be defensive when disabled
Ensure that no raw_exec task can run on a client where it's disabled,
even if a flaw led to the client being assigned a raw_exec task
unexpectedly.
2019-08-29 09:09:40 -04:00
Danielle Lancashire fb63259921
docker: Fix issue where an exec may never timeout 2019-08-16 15:40:03 +02:00
Michael Schurter 83dbac65b2 docker: reword FromSlash(hostPath) comment 2019-08-12 14:38:31 -07:00
ilya guterman 92ce8a0a49 Update utils.go 2019-08-12 19:31:34 +03:00
Ilya Guterman c4b4d7fa43 add comment 2019-08-12 19:31:33 +03:00
Ilya Guterman 52aab40fb3 driver/docker: convert host bind path to os native
relative mounting can be specified using backslashes or forward slashes.
so no prior knowledge of host OS is needed for relative volumes mounting
2019-08-12 19:31:33 +03:00