open-nomad

Author	SHA1	Message	Date
Seth Hoenig	2e5c6de820	client: enable support for cgroups v2 This PR introduces support for using Nomad on systems with cgroups v2 [1] enabled as the cgroups controller mounted on /sys/fs/cgroups. Newer Linux distros like Ubuntu 21.10 are shipping with cgroups v2 only, causing problems for Nomad users. Nomad mostly "just works" with cgroups v2 due to the indirection via libcontainer, but not so for managing cpuset cgroups. Before, Nomad has been making use of a feature in v1 where a PID could be a member of more than one cgroup. In v2 this is no longer possible, and so the logic around computing cpuset values must be modified. When Nomad detects v2, it manages cpuset values in-process, rather than making use of cgroup hierarchy inheritance via shared/reserved parents. Nomad will only activate the v2 logic when it detects cgroups2 is mounted at /sys/fs/cgroups. This means on systems running in hybrid mode with cgroups2 mounted at /sys/fs/cgroups/unified (as is typical) Nomad will continue to use the v1 logic, and should operate as before. Systems that do not support cgroups v2 are also not affected. When v2 is activated, Nomad will create a parent called nomad.slice (unless otherwise configured in Client config), and create cgroups for tasks using naming convention <allocID>-<task>.scope. These follow the naming convention set by systemd and also used by Docker when cgroups v2 is detected. Client nodes now export a new fingerprint attribute, unique.cgroups.version which will be set to 'v1' or 'v2' to indicate the cgroups regime in use by Nomad. The new cpuset management strategy fixes #11705, where docker tasks that spawned processes on startup would "leak". In cgroups v2, the PIDs are started in the cgroup they will always live in, and thus the cause of the leak is eliminated. [1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html Closes #11289 Fixes #11705 #11773 #11933	2022-03-23 11:35:27 -05:00
Shishir	65eab35412	Add support for setting pids_limit in docker plugin config. (#11526)	2021-12-21 13:31:34 -05:00
Shishir Mahajan	d4daef7ebf	Add support for --init to docker driver. Signed-off-by: Shishir Mahajan <smahajan@roblox.com>	2021-10-15 12:53:25 -07:00
Seth Hoenig	e365652e81	drivers: fixup linux version dependent test cases The error output being checked depends on the linux caps supported by the particular operating system. Fix these test cases to just check that an error did occur.	2021-05-17 12:37:40 -06:00
Seth Hoenig	5b8a32f23d	drivers/exec: enable setting allow_caps on exec driver This PR enables setting allow_caps on the exec driver plugin configuration, as well as cap_add and cap_drop in exec task configuration. These options replicate the functionality already present in the docker task driver. Important: this change also reduces the default set of capabilities enabled by the exec driver to match the default set enabled by the docker driver. Until v1.0.5 the exec task driver would enable all capabilities supported by the operating system. v1.0.5 removed NET_RAW from that list of default capabilities, but left many others which could potentially also be leveraged by compromised tasks. Important: the "root" user is still special cased when used with the exec driver. Older versions of Nomad enabled all capabilities supported by the operating system for tasks set with the root user. To maintain compatibility with existing clusters we continue supporting this "feature", however we maintain support for the legacy set of capabilities rather than enabling all capabilities now supported on modern operating systems.	2021-05-17 12:37:40 -06:00
Seth Hoenig	1e75f99839	drivers/docker+exec+java: disable net_raw capability by default The default Linux Capabilities set enabled by the docker, exec, and java task drivers includes CAP_NET_RAW (for making ping just work), which has the side effect of opening an ARP DoS/MiTM attack between tasks using bridge networking on the same host network. https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities This PR disables CAP_NET_RAW for the docker, exec, and java task drivers. The previous behavior can be restored for docker using the allow_caps docker plugin configuration option. A future version of nomad will enable similar configurability for the exec and java task drivers.	2021-05-12 13:22:09 -07:00
Florian Apolloner	a0873d5da4	docker: support configuring default log driver in plugin options	2021-03-12 16:04:33 -05:00
Adrian Todorov	47e1cb11df	driver/docker: add extra labels ( job name, task and task group name)	2021-03-08 08:59:52 -05:00
Tim Gross	987cdb3a69	prefer TrimPrefix to checking HasPrefix first	2021-01-22 13:41:28 -05:00
Huan Wang	ba8b2297b1	fix the inconsistency handling between infra image and normal task image	2021-01-22 13:41:28 -05:00
Mahmood Ali	de954da350	docker: introduce a new hcl2-friendly `mount` syntax (#9635) Introduce a new more-block friendly syntax for specifying mounts with a new `mount` block type with the target as label: ```hcl config { image = "..." mount { type = "..." target = "target-path" volume_options { ... } } } ``` The main benefit here is that by `mount` being a block, it can nest blocks and avoids the compatibility problems noted in https://github.com/hashicorp/nomad/pull/9634/files#diff-2161d829655a3a36ba2d916023e4eec125b9bd22873493c1c2e5e3f7ba92c691R128-R155 . The intention is for us to promote these `mount` blocks and quietly deprecate the `mounts` type, while still honoring it to preserve compatibility as much as we can. This addresses the issue in https://github.com/hashicorp/nomad/issues/9604 .	2020-12-15 14:13:50 -05:00
Shishir Mahajan	c30fea5cd3	Add cpuset_cpus to docker driver.	2020-11-11 12:30:00 -08:00
Tim Gross	0ef0b17b82	docker: disallow volume mounts from host by default (#9321 ) The default behavior for `docker.volumes.enabled` is intended to be `false`, but the HCL schema defaults to `true` if the value is unset. Set the default literal value to `true`. Additionally, Docker driver mounts of type "volume" (but not "bind") are not being properly sandboxed with that setting. Disable Docker mounts with type "volume" entirely whenever the `docker.volumes.enabled` flag is set to false. Note this is unrelated to the `volume_mount` feature, which is constrained to preconfigured host volumes or whatever is mounted by a CSI plugin. This changeset includes updates to unit tests that should have been failing under the documented behavior but were not.	2020-11-11 10:03:46 -05:00
Tim Gross	f9e659164f	docker: image_delay default missing without gc stanza (#9101) In the Docker driver plugin config for garbage collection, the `image_delay` field was missing from the default we set if the entire `gc` stanza is missing. This results in a default of 0s and immediate GC of Docker images. Expanded docker gc config test fields.	2020-10-15 12:36:01 -04:00
Michael Schurter	9c3972937b	s/0.13/1.0/g 1.0 here we come!	2020-10-14 15:17:47 -07:00
Yoan Blanc	891accb89a	use allow/deny instead of the colored alternatives (#9019) Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-10-12 08:47:05 -04:00
Seth Hoenig	fd2a31a331	drivers/docker: detect arch for default infra_image The 'docker.config.infra_image' would default to an amd64 container. It is possible to reference the correct image for a platform using the `runtime.GOARCH` variable, eliminating the need to explicitly set the `infra_image` on non-amd64 platforms. Also upgrade to Google's pause container version 3.1 from 3.0, which includes some enhancements around process management. Fixes #8926	2020-09-23 13:54:30 -05:00
James Rasell	dab8282be5	Merge pull request #8589 from hashicorp/f-gh-5718 driver/docker: allow configurable pull context timeout setting.	2020-08-14 16:07:59 +02:00
James Rasell	bc42cd2e5e	driver/docker: allow configurable pull context timeout setting. Pulling large docker containers can take longer than the default context timeout. Without a way to change this it is very hard for users to utilise Nomad properly without hacky workarounds. This change adds an optional pull_timeout config parameter which gives operators the possibility to account for increased pull times where needed. The infra docker image also has the option to set a custom timeout to keep consistency.	2020-08-12 08:58:07 +01:00
Nick Ethier	e39574be59	docker: support group allocated ports and host_networks (#8623) * docker: support group allocated ports * docker: add new ports driver config to specify which group ports are mapped * docker: update port mapping docs	2020-08-11 18:30:22 -04:00
Mahmood Ali	5796719124	docker: disable host volume binding by default	2020-06-23 13:43:37 -04:00
Seth Hoenig	ad91ba865c	driver/docker: enable setting hard/soft memory limits Fixes #2093 Enable configuring `memory_hard_limit` in the docker config stanza for tasks. If set, this field will be passed to the container runtime as `--memory`, and the `memory` configuration from the task resource configuration will be passed as `--memory_reservation`, creating hard and soft memory limits for tasks using the docker task driver.	2020-06-01 09:22:45 -05:00
Mahmood Ali	d9543a1a80	tests: don't delete images after tests complete Fix some docker test flakiness where image cleanup process may contaminate other tests. A clean up process may attempt to delete an image while it's used by another test.	2020-05-26 18:53:24 -04:00
Mahmood Ali	2588b3bc98	cleanup driver eventor goroutines This fixes a few cases where driver eventor goroutines are leaked during normal operations, but especially so in tests. This change makes a few modifications: First, it switches drivers to use `Context`s to manage shutdown events. Previously, it relied on callers invoking a `.Shutdown()` function that is specific to internal drivers only and requires casting. Using `Context`s provides a consistent idiomatic way to manage lifecycle for both internal and external drivers. Also, I discovered a few places where we don't clean up a temporary driver instance in the plugin catalog code, where we dispense a driver to inspect and validate the schema config without properly cleaning it up.	2020-05-26 11:04:04 -04:00
Tim Gross	aa8927abb4	volumes: return better error messages for unsupported task drivers (#8030 ) When an allocation runs for a task driver that can't support volume mounts, the mounting will fail in a way that can be hard to understand. With host volumes this usually means failing silently, whereas with CSI the operator gets inscrutable internals exposed in the `nomad alloc status`. This changeset adds a MountConfig field to the task driver Capabilities response. We validate this when the `csi_hook` or `volume_hook` fires and return a user-friendly error. Note that we don't currently have a way to get driver capabilities up to the server, except through attributes. Validating this when the user initially submits the jobspec would be even better than what we're doing here (and could be useful for all our other capabilities), but that's out of scope for this changeset. Also note that the MountConfig enum starts with "supports all" in order to support community plugins in a backwards compatible way, rather than cutting them off from volume mounting unexpectedly.	2020-05-21 09:18:02 -04:00
Mahmood Ali	182b95f7b1	use allow_runtimes for consistency Other allow lists use allow_ prefix (e.g. allow_caps, allow_privileged).	2020-05-12 11:03:08 -04:00
Mahmood Ali	0d692f0931	Add a knob to restrict docker runtimes	2020-05-12 10:14:43 -04:00
Ben Buzbee	769a3cd8b3	Rename OCIRuntime to Runtime; allow gpu conflicts if they are the same runtime; add conflict test	2020-04-03 12:15:11 -07:00
Ben Buzbee	d4f26d1eee	Support custom docker runtimes This enables customers who want to use gvisor and have it configured on their clients.	2020-04-03 11:07:37 -07:00
John Schlederer	8b35c75206	Making pull activity timeout configurable in Docker * Making pull activity timeout configurable in Docker plugin config, first pass * Fixing broken function call * Fixing broken tests * Fixing linter suggestion * Adding documentation on new parameter in Docker plugin config * Adding unit test * Setting min value for pull_activity_timeout, making pull activity duration a private var	2019-12-18 12:58:53 +01:00
Mahmood Ali	46bc3b57e6	address review comments	2019-12-13 11:21:00 -05:00
Mahmood Ali	0b7085ba3a	driver: allow disabling log collection Operators commonly have docker logs aggregated using various tools and don't need nomad to manage their docker logs. Worse, Nomad uses a somewhat heavy docker api call to collect them and it seems to cause problems when a client runs hundreds of log collections. Here we add a knob to disable log aggregation completely for nomad. When log collection is disabled, we avoid running logmon and docker_logger for the docker tasks in this implementation. The downside here is once disabled, `nomad logs ...` commands and API no longer return logs and operators must correlate alloc-ids with their aggregated log info. This is meant as a stop gap measure. Ideally, we'd follow up with at least two changes: First, we should optimize behavior when we can such that operators don't need to disable docker log collection. Potentially by reverting to using pre-0.9 syslog aggregation in linux environments, though with different trade-offs. Second, when/if logs are disabled, nomad logs endpoints should lookup docker logs api on demand. This ensures that the cost of log collection is paid sparingly.	2019-12-08 14:15:03 -05:00
Nick Ethier	729dd9018c	docker: set default cpu cfs period (#6737) * docker: set default cpu cfs period Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>	2019-11-19 19:05:15 -05:00
Mahmood Ali	977b86f924	driver/docker: ensure that defaults are populated Looks like we may need to pass default literal at each layer to be able, so defaults are set properly.	2019-10-18 18:27:28 -04:00
Mahmood Ali	3aec7b56ea	Only start reconciler once in main driver driver.SetConfig is not appropriate for starting up reconciler goroutine. Some ephemeral driver instances are created for validating config and we ought not to start side-effecting goroutines for those. We currently lack a lifecycle hook to inject these, so I picked the `Fingerprinter` function for now, and reconciler should only run after fingerprinter started. Use `sync.Once` to ensure that we only start reconciler loop once.	2019-10-18 14:43:23 -04:00
Mahmood Ali	8739cc2a62	refactor reconciler code and address comments	2019-10-17 09:42:23 -04:00
Mahmood Ali	aa59280edc	docker: periodically reconcile containers When running at scale, it's possible that Docker Engine starts containers successfully but gets wedged in a way where API call fails. The Docker Engine may remain unavailable for an arbitrarily long time. Here, we introduce a periodic reconciliation process that ensures that any container started by nomad is tracked, and killed if it is running unexpectedly. Basically, the periodic job inspects any container that isn't tracked in its handlers. A creation grace period is used to prevent killing newly created containers that aren't registered yet. Also, we aim to avoid killing unrelated containers started by host or through raw_exec drivers. The logic is to pattern against containers environment variables and mounts to infer if they are an alloc docker container. Lastly, the periodic job can be disabled to avoid any interference if need be.	2019-10-17 08:36:01 -04:00
Danielle Lancashire	724586ba1d	docker: Fix driver spec hclspec.NewLiteral does not quote its values, which caused `3m` to be parsed as a nonsensical literal which broke the plugin loader during initialization. By quoting the value here, it starts correctly.	2019-09-03 08:53:37 +02:00
Zhiguang Wang	832df1091b	Add default value "3m" to image_delay, making it consistent with docs.	2019-09-02 16:40:00 +08:00
Nick Ethier	1dae42ab81	docker: allow configuration of infra image	2019-07-31 01:04:07 -04:00
Nick Ethier	1fc5f86a7c	docker: support shared network namespaces	2019-07-31 01:03:20 -04:00
Mahmood Ali	104869c0e1	drivers/docker: rename logging `type` to `driver` Docker uses the term logging `driver` in its public documentations: in `docker` daemon config[1], `docker run` arguments [2] and in docker compose file[3]. Interestingly, docker used `type` in its API [4] instead of everywhere else. It's unfortunate that Nomad used `type` modeling after the Docker API rather than the user facing documents. Nomad using `type` feels very non-user friendly as it's disconnected from how Docker markets the flag and shows internal representation instead. Here, we rectify the situation by introducing `driver` field and preferring it over `type` in logging. [1] https://docs.docker.com/config/containers/logging/configure/ [2] https://docs.docker.com/engine/reference/run/#logging-drivers---log-driver [3] https://docs.docker.com/compose/compose-file/#logging [4] https://docs.docker.com/engine/api/v1.39/#operation/ContainerCreate	2019-02-28 16:04:03 -05:00
Mahmood Ali	4def8529db	driver/docker: use BlockAttrs for storage_opts storage_opts is a new field in 0.9 cycle and doesn't have backward compatibility constraints.	2019-02-19 20:35:28 -05:00
Mahmood Ali	46cd3c3f55	drivers: restore port_map old json support This ensures that `port_map` along with other block like attribute declarations (e.g. ulimit, labels, etc) can handle various hcl and json syntax that was supported in 0.8. In 0.8.7, the following declarations are effectively equivalent: ``` // hcl block port_map { http = 80 https = 443 } // hcl assignment port_map = { http = 80 https = 443 } // json single element array of map (default in API response) {"port_map": [{"http": 80, "https": 443}]} // json array of individual maps (supported accidentally iiuc) {"port_map": [{"http": 80}, {"https": 443}]} ``` We achieve compatibility by using `NewAttr("...", "list(map(string))", false)` to be serialized to a `map[string]string` wrapper, instead of using `BlockAttrs` declaration. The wrapper merges the list of maps automatically, to ease driver development. This approach is closer to how v0.8.7 implemented the fields [1][2], and despite its verbosity, seems to preserve 0.8.7 behavior in hcl2. This is only required for built-in types that have backward compatibility constraints. External drivers should use `BlockAttrs` instead, as they see fit. [1] https://github.com/hashicorp/nomad/blob/v0.8.7/client/driver/docker.go#L216 [2] https://github.com/hashicorp/nomad/blob/v0.8.7/client/driver/docker.go#L698-L700	2019-02-16 11:37:33 -05:00
Mahmood Ali	f7102cd01d	tests: add hcl task driver config parsing tests (#5314) * drivers: add config parsing tests Add basic tests for parsing and encoding task config. * drivers/docker: fix some config declarations * refactor and document config parse helpers	2019-02-12 14:46:37 -05:00
Michael Schurter	3b84e08fa4	Merge pull request #5297 from hashicorp/b-docker-logging Docker: Fix logging config parsing	2019-02-11 06:57:52 -08:00
Gertjan Roggemans	94ca78354b	docker: Fix volume driver_config options spec (#5309) Fixes #5308	2019-02-11 09:18:44 -05:00
Michael Schurter	e1e4b10884	docker: fix logging config parsing Fixes https://groups.google.com/d/topic/nomad-tool/B3Uo6Kns2BI/discussion	2019-02-04 11:07:57 -08:00
Michael Schurter	32daa7b47b	goimports until make check is happy	2019-01-23 06:27:14 -08:00
Michael Schurter	be0bab7c3f	move pluginutils -> helper/pluginutils I wanted a different color bikeshed, so I get to paint it	2019-01-22 15:50:08 -08:00

67 commits