open-nomad

Author	SHA1	Message	Date
Seth Hoenig	1e75f99839	drivers/docker+exec+java: disable net_raw capability by default The default Linux Capabilities set enabled by the docker, exec, and java task drivers includes CAP_NET_RAW (for making ping just work), which has the side effect of opening an ARP DoS/MiTM attack between tasks using bridge networking on the same host network. https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities This PR disables CAP_NET_RAW for the docker, exec, and java task drivers. The previous behavior can be restored for docker using the allow_caps docker plugin configuration option. A future version of nomad will enable similar configurability for the exec and java task drivers.	2021-05-12 13:22:09 -07:00
Isabel Suchanek	ed9e12cdc7	Clean up docker driver test to make it less flaky (#10559 ) Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>	2021-05-10 14:58:19 -07:00
Isabel Suchanek	b5a2f48c78	Fix test panic in docker driver test	2021-05-07 12:12:33 -07:00
Isabel Suchanek	cb4fc53353	drivers/docker: add support for STOPSIGNAL This fixes a bug where Nomad overrides a Dockerfile's STOPSIGNAL with the default kill_signal (SIGTERM). This adds a check for kill_signal. If it's not set, it calls StopContainer instead of Signal, which uses STOPSIGNAL if it's specified. If both kill_signal and STOPSIGNAL are set, Nomad tries to stop the container with kill_signal first, before then calling StopContainer. Fixes #9989	2021-05-05 10:27:58 -07:00
Tim Gross	cf838f49e1	docker: improve error message for auth helper The error returned from the stdlib's `exec` package is always a message with the exit code of the exec'd process, not any error message that process might have given us. This results in opaque failures for the Nomad user. Cast to an `ExitError` so that we can access the output from stderr.	2021-05-03 11:30:12 -04:00
Nick Ethier	9d194bb2d9	driver/docker: ignore error if container exists before cgroup can be written	2021-04-19 23:38:35 -04:00
Nick Ethier	c9216ba7d9	drivers/docker: move cgroups logic to linux build file	2021-04-15 10:39:11 -04:00
Nick Ethier	390c4c5119	docker: add support for cpuset cgroup management	2021-04-15 10:24:31 -04:00
Yoan Blanc	ac0d5d8bd3	chore: bump golangci-lint from v1.24 to v1.39 Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2021-04-03 09:50:23 +02:00
Mahmood Ali	9ff7220588	reuse existing function and typo fix	2021-04-02 11:56:27 -04:00
Mahmood Ali	565496e6ba	drivers/docker: account for cgroup-v2 memory stats If the docker engine is running on a cgroup-v2 host, then RSS and Max Usage don't get reported. Using a heuristic here to avoid adding more API calls to the Docker Engine to infer cgroups version. Also, opted to avoid coordinating stats collection with fingerprinting, which adds concurrency complexities.	2021-04-01 12:23:57 -04:00
Tim Gross	e76eeeb848	drivers/docker: fix flaky image coordinator test The test assertion that we don't have a delete future remaining races with the code its testing, because the removal of the image and the removal of the future are not atomic. Move this assertion into a `WaitForResult` to avoid test flakes which we're seeing on CI on Windows in particular.	2021-03-31 15:59:01 -04:00
Mahmood Ali	275feb5bec	oversubscription: docker to honor MemoryMaxMB values	2021-03-30 16:55:58 -04:00
Florian Apolloner	a0873d5da4	docker: support configuring default log driver in plugin options	2021-03-12 16:04:33 -05:00
Adrian Todorov	47e1cb11df	driver/docker: add extra labels ( job name, task and task group name)	2021-03-08 08:59:52 -05:00
Nick Ethier	d2f192821e	drivers/docker: support mapping multiple host ports to the same container port	2021-02-02 22:54:23 -05:00
Tim Gross	987cdb3a69	prefer TrimPrefix to checking HasPrefix first	2021-01-22 13:41:28 -05:00
Huan Wang	ba8b2297b1	fix the inconsistency handling between infra image and normal task image	2021-01-22 13:41:28 -05:00
Kris Hicks	7694a66414	Don't prepend https to docker cred helper call (#9852 ) Some credential helpers, like the ECR helper, will strip the protocol if given. Others, like the linux "pass" helper, do not.	2021-01-21 11:46:59 -08:00
Mahmood Ali	de954da350	docker: introduce a new hcl2-friendly `mount` syntax (#9635) Introduce a new, more block-friendly syntax for specifying mounts with a new `mount` block type with the target as label: ```hcl config { image = "..." mount { type = "..." target = "target-path" volume_options { ... } } } ``` The main benefit here is that by `mount` being a block, it can nest blocks and avoids the compatibility problems noted in https://github.com/hashicorp/nomad/pull/9634/files#diff-2161d829655a3a36ba2d916023e4eec125b9bd22873493c1c2e5e3f7ba92c691R128-R155 . The intention is for us to promote these `mount` blocks and quietly deprecate the `mounts` type, while still honoring it to preserve compatibility as much as we can. This addresses the issue in https://github.com/hashicorp/nomad/issues/9604 .	2020-12-15 14:13:50 -05:00
Kris Hicks	0cf9cae656	Apply some suggested fixes from staticcheck (#9598 )	2020-12-10 07:29:18 -08:00
Kris Hicks	0a3a748053	Add gosimple linter (#9590 )	2020-12-09 11:05:18 -08:00
Kris Hicks	93155ba3da	Add gocritic to golangci-lint config (#9556 )	2020-12-08 12:47:04 -08:00
Tim Gross	d286d941dc	docker: kill signal API should include timeout context When the Docker driver kills a task, we send a request via the Docker API for dockerd to fire the signal. We send that signal and then block for the `kill_timeout` waiting for the container to exit. But if the Docker API blocks, we will block indefinitely because we haven't configured the API call with the same timeout. This changeset is a minimal intervention to add the timeout to the Docker API call _only_ when we have the `kill_timeout` set. Future work should examine whether we should be threading contexts through other `go-dockerclient` API calls.	2020-12-02 16:51:57 -05:00
Nick Ethier	c9bd7e89ca	command: use correct port mapping syntax in examples	2020-11-23 10:25:30 -06:00
Shishir Mahajan	572c398187	Fix review comments.	2020-11-11 12:30:00 -08:00
Shishir Mahajan	9192100d4e	Fix circleci.	2020-11-11 12:30:00 -08:00
Shishir Mahajan	c30fea5cd3	Add cpuset_cpus to docker driver.	2020-11-11 12:30:00 -08:00
Tim Gross	0ef0b17b82	docker: disallow volume mounts from host by default (#9321 ) The default behavior for `docker.volumes.enabled` is intended to be `false`, but the HCL schema defaults to `true` if the value is unset. Set the default literal value to `true`. Additionally, Docker driver mounts of type "volume" (but not "bind") are not being properly sandboxed with that setting. Disable Docker mounts with type "volume" entirely whenever the `docker.volumes.enabled` flag is set to false. Note this is unrelated to the `volume_mount` feature, which is constrained to preconfigured host volumes or whatever is mounted by a CSI plugin. This changeset includes updates to unit tests that should have been failing under the documented behavior but were not.	2020-11-11 10:03:46 -05:00
Russell Rollins	538aa90d92	Use Dockerhub Mirror. (#9220 ) Dockerhub is going to rate limit unauthenticated pulls. Use our HashiCorp internal mirror for builds run through CircleCI. Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>	2020-11-02 09:28:02 -05:00
Tim Gross	f9e659164f	docker: image_delay default missing without gc stanza (#9101 ) In the Docker driver plugin config for garbage collection, the `image_delay` field was missing from the default we set if the entire `gc` stanza is missing. This results in a default of 0s and immediate GC of Docker images. Expanded docker gc config test fields.	2020-10-15 12:36:01 -04:00
Michael Schurter	9c3972937b	s/0.13/1.0/g 1.0 here we come!	2020-10-14 15:17:47 -07:00
Yoan Blanc	891accb89a	use allow/deny instead of the colored alternatives (#9019 ) Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-10-12 08:47:05 -04:00
Seth Hoenig	a8869bd304	docs: document docker signal fix, add tests This PR adds a version specific upgrade note about the docker stop signal behavior. Also adds test for the signal logic in docker driver. Closes #8932 which was fixed in #8933	2020-10-02 10:06:43 -05:00
Seth Hoenig	6d9a6786e5	Merge pull request #8933 from jf/fix_docker_stopsignal drivers/docker/driver.go: change default signal for docker driver to SIGTERM?	2020-09-29 10:51:04 -05:00
Seth Hoenig	fd2a31a331	drivers/docker: detect arch for default infra_image The 'docker.config.infra_image' would default to an amd64 container. It is possible to reference the correct image for a platform using the `runtime.GOARCH` variable, eliminating the need to explicitly set the `infra_image` on non-amd64 platforms. Also upgrade to Google's pause container version 3.1 from 3.0, which includes some enhancements around process management. Fixes #8926	2020-09-23 13:54:30 -05:00
Jeffrey 'jf' Lim	b84d63c4ba	drivers/docker/driver.go: change default signal for docker driver to SIGTERM?	2020-09-20 03:09:07 +08:00
Nick Ethier	1849a20b66	docker: use Nomad managed resolv.conf when DNS options are set (#8600 )	2020-08-17 10:22:08 -04:00
James Rasell	dab8282be5	Merge pull request #8589 from hashicorp/f-gh-5718 driver/docker: allow configurable pull context timeout setting.	2020-08-14 16:07:59 +02:00
James Rasell	bc42cd2e5e	driver/docker: allow configurable pull context timeout setting. Pulling large docker containers can take longer than the default context timeout. Without a way to change this it is very hard for users to utilise Nomad properly without hacky workarounds. This change adds an optional pull_timeout config parameter which gives operators the possibility to account for increased pull times where needed. The infra docker image also has the option to set a custom timeout to keep consistency.	2020-08-12 08:58:07 +01:00
Nick Ethier	e39574be59	docker: support group allocated ports and host_networks (#8623 ) * docker: support group allocated ports * docker: add new ports driver config to specify which group ports are mapped * docker: update port mapping docs	2020-08-11 18:30:22 -04:00
Drew Bailey	27b8cadcc4	removes nvidia import from docker test (#8312 )	2020-06-30 09:34:59 -04:00
Shishir Mahajan	182e68ca7a	Add notes.	2020-06-25 13:46:45 -07:00
Shishir Mahajan	0bc2c835fe	Remove dead tests.	2020-06-25 13:22:46 -07:00
Mahmood Ali	5796719124	docker: disable host volume binding by default	2020-06-23 13:43:37 -04:00
Nick Ethier	0bc0403cc3	Task DNS Options (#7661 ) Co-Authored-By: Tim Gross <tgross@hashicorp.com> Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>	2020-06-18 11:01:31 -07:00
Niam Jen Wei	d2de515f0c	Fix docker driver MemorySwap value Fixes an incorrect value being assigned to MemorySwap when `memory_hard_limit` flag is being used. Issue raised in https://github.com/hashicorp/nomad/issues/8153	2020-06-12 20:11:28 +01:00
Seth Hoenig	a792c64f57	driver/docker: add integration test around setting memory_hard_limit	2020-06-01 12:00:47 -05:00
Seth Hoenig	675f50b502	driver/docker: use pointer parameter on driver because locks	2020-06-01 09:35:17 -05:00
Seth Hoenig	ad91ba865c	driver/docker: enable setting hard/soft memory limits Fixes #2093 Enable configuring `memory_hard_limit` in the docker config stanza for tasks. If set, this field will be passed to the container runtime as `--memory`, and the `memory` configuration from the task resource configuration will be passed as `--memory-reservation`, creating hard and soft memory limits for tasks using the docker task driver.	2020-06-01 09:22:45 -05:00
Mahmood Ali	8ef1b85ce9	don't GC images in tests by default	2020-05-26 21:24:55 -04:00
Mahmood Ali	d9543a1a80	tests: don't delete images after tests complete Fix some docker test flakiness where image cleanup process may contaminate other tests. A clean up process may attempt to delete an image while it's used by another test.	2020-05-26 18:53:24 -04:00
Mahmood Ali	2588b3bc98	cleanup driver eventor goroutines This fixes a few cases where driver eventor goroutines are leaked during normal operations, but especially so in tests. This change makes a few modifications: First, it switches drivers to use `Context`s to manage shutdown events. Previously, it relied on callers invoking a `.Shutdown()` function that is specific to internal drivers only and requires casting. Using `Context`s provides a consistent, idiomatic way to manage lifecycle for both internal and external drivers. Also, I discovered a few places where we don't clean up a temporary driver instance in the plugin catalog code, where we dispense a driver to inspect and validate the schema config without properly cleaning it up.	2020-05-26 11:04:04 -04:00
Tim Gross	aa8927abb4	volumes: return better error messages for unsupported task drivers (#8030 ) When an allocation runs for a task driver that can't support volume mounts, the mounting will fail in a way that can be hard to understand. With host volumes this usually means failing silently, whereas with CSI the operator gets inscrutable internals exposed in the `nomad alloc status`. This changeset adds a MountConfig field to the task driver Capabilities response. We validate this when the `csi_hook` or `volume_hook` fires and return a user-friendly error. Note that we don't currently have a way to get driver capabilities up to the server, except through attributes. Validating this when the user initially submits the jobspec would be even better than what we're doing here (and could be useful for all our other capabilities), but that's out of scope for this changeset. Also note that the MountConfig enum starts with "supports all" in order to support community plugins in a backwards compatible way, rather than cutting them off from volume mounting unexpectedly.	2020-05-21 09:18:02 -04:00
Mahmood Ali	34b22047b7	Use an image managed by nomad account This is a retag of stefanscherer/busybox-windows@sha256:af396324c4c62e369a388ebb38d4efd44211dc7c95a438e6feb62b4ae4194c5b	2020-05-15 12:55:22 -04:00
Mahmood Ali	766104c7a7	Use a pinned tag of stefanscherer/busybox-windows	2020-05-15 12:20:37 -04:00
Michele	0150fc4c54	Move appveyor tests to circle	2020-05-15 10:15:37 -04:00
Mahmood Ali	9721fd22f9	docker: Fix docker image gc tracking This fixes a bug where docker images may not be GCed. The cause of the bug is that we track the task using `task.ID+task.Name` on task start but remove on plain `task.ID`. This harmonizes the two paths by using `task.ID`, as it's unique enough and it's also used in the `loadImage` path (path when loading an image from a local tarball instead of dockerhub).	2020-05-13 12:33:17 -04:00
Mahmood Ali	04a3cfbeff	Merge pull request #7932 from hashicorp/f-docker-custom-runtimes Docker runtimes	2020-05-12 11:59:36 -04:00
Mahmood Ali	9f95a50129	update tests	2020-05-12 11:39:09 -04:00
Mahmood Ali	182b95f7b1	use allow_runtimes for consistency Other allow lists use allow_ prefix (e.g. allow_caps, allow_privileged).	2020-05-12 11:03:08 -04:00
Mahmood Ali	54565e3836	Apply suggestions from code review Co-authored-by: Tim Gross <tgross@hashicorp.com>	2020-05-12 10:56:47 -04:00
Mahmood Ali	06c672cbf2	more tests	2020-05-12 10:14:54 -04:00
Mahmood Ali	0d692f0931	Add a knob to restrict docker runtimes	2020-05-12 10:14:43 -04:00
Juan Larriba	a0df437c62	Run Linux Images (LCOW) and Windows Containers side by side (#7850 ) Makes it possible to run Linux Containers On Windows with Nomad alongside Windows Containers. Fingerprint prevents only to run Nomad in Windows 10 with Linux Containers	2020-05-04 13:08:47 -04:00
Mahmood Ali	dff071c3b9	driver/docker: protect against nil container Protect against a panic when we attempt to start a container with a name that conflicts with an existing one. If the existing one is being deleted while nomad first attempts to create the container, the createContainer will fail with `container already exists`, but we get nil container reference from the `containerByName` lookup, and cause a crash. I'm not certain how we get into the state, except for being very unlucky. I suspect that this case may be the result of a concurrent restart or the docker engine API not being fully consistent (e.g. an earlier call purged the container, but docker didn't free up resources yet to create a new container with the same name immediately yet). If that's the case, then re-attempting creation will hopefully succeed, or we'd at least fail enough times for the alloc to be rescheduled to another node.	2020-04-19 15:34:45 -04:00
Ben Buzbee	769a3cd8b3	Rename OCIRuntime to Runtime; allow gpu conflicts if they are the same runtime; add conflict test	2020-04-03 12:15:11 -07:00
Ben Buzbee	d4f26d1eee	Support custom docker runtimes This enables customers who want to use gvisor and have it configured on their clients.	2020-04-03 11:07:37 -07:00
Mahmood Ali	db4c263180	Merge pull request #7554 from benbuzbee/benbuz/fix-seccomp-file Parse security_opts before sending them to docker daemon	2020-03-31 11:54:17 -04:00
Ben Buzbee	4f6ea87ec4	Parse security_opts before sending them to docker daemon Fixes #6720 Copy the parsing function from the docker CLI. Docker daemon expects to see JSON for seccomp file not a path.	2020-03-31 08:34:41 -07:00
Mahmood Ali	7225055e80	Merge pull request #7550 from hashicorp/vendor-fsouza-go-docker-client-20200330 Vendor fsouza/go-docker-client update	2020-03-31 08:46:30 -04:00
Mahmood Ali	452a057a8c	driver/docker: fix memory swapping MemorySwappiness can only be set in non-Windows options: https://ci.appveyor.com/project/hashicorp/nomad/builds/31832149 Also fixes https://github.com/hashicorp/nomad/issues/6085	2020-03-30 16:51:16 -04:00
Mahmood Ali	4b6aee24bd	Merge pull request #7508 from greut/docker-drain-timer docker: drain fingerprint timer	2020-03-30 16:37:53 -04:00
Yoan Blanc	c9f6cf385a	Update drivers/docker/fingerprint.go Co-Authored-By: Mahmood Ali <mahmood@notnoop.com>	2020-03-30 22:11:42 +02:00
Mahmood Ali	8f57f78087	vendors: update fsouza/go-docker-client to v.1.6.3	2020-03-30 15:10:53 -04:00
Mahmood Ali	65d2fb5e32	Merge pull request #7531 from greut/docker-v19.03.8 Docker v19.03.8	2020-03-30 14:45:10 -04:00
Mahmood Ali	254fcd6c06	tests: attempt to deflake TestDockerDriver_PidsLimit This is an attempt to deflake TestDockerDriver_PidsLimit by having one more process and ensuring they run for longer.	2020-03-30 07:06:52 -04:00
Mahmood Ali	887292d757	Resolve docker types conflict Looks like the latest `github.com/docker/docker/registry.ResolveAuthConfig` expects `github.com/docker/docker/api/types.AuthConfig` rather than `github.com/docker/cli/cli/config/types.AuthConfig`. The two types are identical but live in different packages. Here, we embed `registry.ResolveAuthConfig` from upstream repo, but with the signature we need.	2020-03-28 17:29:06 +01:00
Yoan Blanc	1d92edbbbe	docker: v19.03.8 Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-28 17:29:04 +01:00
Mahmood Ali	6283a44870	Merge pull request #7257 from bbckr/avoid-resolving-dot-in-named-pipe Avoid resolving dotted segments when host path for volume is named pipe	2020-03-26 16:59:29 -04:00
Yoan Blanc	139a0ae451	fixup! docker: drain fingerprint timer Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-26 16:02:20 +01:00
Yoan Blanc	5f0b3234f0	docker: drain fingerprint timer Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-26 16:00:53 +01:00
Mahmood Ali	fd5d033e32	Revert "vendor: fsouza/go-docker-client v1.6.3"	2020-03-23 10:48:47 -04:00
Yoan Blanc	ed8dcccb54	docker: disable swap in Windows only Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-23 08:35:09 +01:00
Yoan Blanc	d9ea68e807	fixup! fixup! vendor: fsouza/go-docker-client v1.6.3 Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-22 10:04:52 +01:00
Yoan Blanc	8e744d1877	vendor: fsouza/go-docker-client v1.6.3 Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-22 09:25:46 +01:00
Yoan Blanc	c8e69a0427	docker: v18.09.9 Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-15 08:23:11 +01:00
bckr	977e7ac8b3	Remove argument passing runtime GOOS	2020-03-03 15:39:43 +01:00
bckr	86a5ff9cb9	Fix too many arguments	2020-03-03 15:38:38 +01:00
Mahmood Ali	88cfe504a0	update grpc Upgrade grpc to v1.27.1 and protobuf plugins to v1.3.4.	2020-03-03 08:39:54 -05:00
bckr	fe6da3df88	Avoid resolving dotted segments when host path for volume is named pipe	2020-03-03 14:00:19 +01:00
John Schlederer	8b35c75206	Making pull activity timeout configurable in Docker * Making pull activity timeout configurable in Docker plugin config, first pass * Fixing broken function call * Fixing broken tests * Fixing linter suggestion * Adding documentation on new parameter in Docker plugin config * Adding unit test * Setting min value for pull_activity_timeout, making pull activity duration a private var	2019-12-18 12:58:53 +01:00
Mahmood Ali	4a1cc67f58	Merge pull request #6820 from hashicorp/f-skip-docker-logging-knob driver: allow disabling log collection	2019-12-13 11:41:20 -05:00
Mahmood Ali	46bc3b57e6	address review comments	2019-12-13 11:21:00 -05:00
Seth Hoenig	f0c3dca49c	tests: swap lib/freeport for tweaked helper/freeport Copy the updated version of freeport (sdk/freeport), and tweak it for use in Nomad tests. This means staying below port 10000 to avoid conflicts with the lib/freeport that is still transitively used by the old version of consul that we vendor. Also provide implementations to find ephemeral ports of macOS and Windows environments. Ports acquired through freeport are supposed to be returned to freeport, which this change now also introduces. Many tests are modified to include calls to a cleanup function for Server objects. This should help quite a bit with some flakey tests, but not all of them. Our port problems will not go away completely until we upgrade our vendor version of consul. With Go modules, we'll probably do a 'replace' to swap out other copies of freeport with the one now in 'nomad/helper/freeport'.	2019-12-09 08:37:32 -06:00
Mahmood Ali	0b7085ba3a	driver: allow disabling log collection Operators commonly have docker logs aggregated using various tools and don't need nomad to manage their docker logs. Worse, Nomad uses a somewhat heavy docker api call to collect them and it seems to cause problems when a client runs hundreds of log collections. Here we add a knob to disable log aggregation completely for nomad. When log collection is disabled, we avoid running logmon and docker_logger for the docker tasks in this implementation. The downside here is once disabled, `nomad logs ...` commands and API no longer return logs and operators must correlate alloc-ids with their aggregated log info. This is meant as a stopgap measure. Ideally, we'd follow up with at least two changes: First, we should optimize behavior when we can such that operators don't need to disable docker log collection. Potentially by reverting to using pre-0.9 syslog aggregation in linux environments, though with different trade-offs. Second, when/if logs are disabled, nomad logs endpoints should look up docker logs api on demand. This ensures that the cost of log collection is paid sparingly.	2019-12-08 14:15:03 -05:00
Nick Ethier	729dd9018c	docker: set default cpu cfs period (#6737 ) * docker: set default cpu cfs period Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>	2019-11-19 19:05:15 -05:00
Tim Gross	b1b20cd479	remove misleading networking log line (#6588 ) When a job has a task group network, this log line ends up being misleading if you're trying to debug networking issues. We really only care about this when there's no port map set, in which case we get the error returned anyways.	2019-10-30 13:23:33 -04:00
Mahmood Ali	fe14993582	docs: Docker driver supports task user option Also, add a test case.	2019-10-24 14:00:37 -04:00
Mahmood Ali	977b86f924	driver/docker: ensure that defaults are populated Looks like we may need to pass default literal at each layer to be able, so defaults are set properly.	2019-10-18 18:27:28 -04:00

364 commits