open-nomad

Commit Graph

Author	SHA1	Message	Date
Tim Gross	eabbcebdd4	exec: allow running commands from host volume (#14851 ) The exec driver and other drivers derived from the shared executor check the path of the command before handing off to libcontainer to ensure that the command doesn't escape the sandbox. But we don't check any host volume mounts, which should be safe to use as a source for executables if we're letting the user mount them to the container in the first place. Check the mount config to verify the executable lives in the mount's host path, but then return an absolute path within the mount's task path so that we can hand that off to libcontainer to run. Includes a good bit of refactoring here because the anchoring of the final task path has different code paths for inside the task dir vs inside a mount. But I've fleshed out the test coverage of this a good bit to ensure we haven't created any regressions in the process.	2022-11-11 09:51:15 -05:00
Seth Hoenig	bc09a2e114	deps: update opencontainers/runc to v1.1.3	2022-08-04 12:56:49 -05:00
Eng Zer Jun	97d1bc735c	test: use `T.TempDir` to create temporary test directory (#12853 ) * test: use `T.TempDir` to create temporary test directory This commit replaces `ioutil.TempDir` with `t.TempDir` in tests. The directory created by `t.TempDir` is automatically removed when the test and all its subtests complete. Prior to this commit, temporary directory created using `ioutil.TempDir` needs to be removed manually by calling `os.RemoveAll`, which is omitted in some tests. The error handling boilerplate e.g. defer func() { if err := os.RemoveAll(dir); err != nil { t.Fatal(err) } } is also tedious, but `t.TempDir` handles this for us nicely. Reference: https://pkg.go.dev/testing#T.TempDir Signed-off-by: Eng Zer Jun <engzerjun@gmail.com> * test: fix TestLogmon_Start_restart on Windows Signed-off-by: Eng Zer Jun <engzerjun@gmail.com> * test: fix failing TestConsul_Integration t.TempDir fails to perform the cleanup properly because the folder is still in use testing.go:967: TempDir RemoveAll cleanup: unlinkat /tmp/TestConsul_Integration2837567823/002/191a6f1a-5371-cf7c-da38-220fe85d10e5/web/secrets: device or resource busy Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	2022-05-12 11:42:40 -04:00
Seth Hoenig	113b7eb727	client: cgroups v2 code review followup	2022-03-24 13:40:42 -05:00
Seth Hoenig	2e5c6de820	client: enable support for cgroups v2 This PR introduces support for using Nomad on systems with cgroups v2 [1] enabled as the cgroups controller mounted on /sys/fs/cgroups. Newer Linux distros like Ubuntu 21.10 are shipping with cgroups v2 only, causing problems for Nomad users. Nomad mostly "just works" with cgroups v2 due to the indirection via libcontainer, but not so for managing cpuset cgroups. Before, Nomad has been making use of a feature in v1 where a PID could be a member of more than one cgroup. In v2 this is no longer possible, and so the logic around computing cpuset values must be modified. When Nomad detects v2, it manages cpuset values in-process, rather than making use of cgroup heirarchy inheritence via shared/reserved parents. Nomad will only activate the v2 logic when it detects cgroups2 is mounted at /sys/fs/cgroups. This means on systems running in hybrid mode with cgroups2 mounted at /sys/fs/cgroups/unified (as is typical) Nomad will continue to use the v1 logic, and should operate as before. Systems that do not support cgroups v2 are also not affected. When v2 is activated, Nomad will create a parent called nomad.slice (unless otherwise configured in Client conifg), and create cgroups for tasks using naming convention <allocID>-<task>.scope. These follow the naming convention set by systemd and also used by Docker when cgroups v2 is detected. Client nodes now export a new fingerprint attribute, unique.cgroups.version which will be set to 'v1' or 'v2' to indicate the cgroups regime in use by Nomad. The new cpuset management strategy fixes #11705, where docker tasks that spawned processes on startup would "leak". In cgroups v2, the PIDs are started in the cgroup they will always live in, and thus the cause of the leak is eliminated. [1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html Closes #11289 Fixes #11705 #11773 #11933	2022-03-23 11:35:27 -05:00
Seth Hoenig	2631659551	ci: swap ci parallelization for unconstrained gomaxprocs	2022-03-15 12:58:52 -05:00
Michael Schurter	fd68bbc342	test: update tests to properly use AllocDir Also use t.TempDir when possible.	2021-10-19 10:49:07 -07:00
Michael Schurter	10c3bad652	client: never embed alloc_dir in chroot Fixes #2522 Skip embedding client.alloc_dir when building chroot. If a user configures a Nomad client agent so that the chroot_env will embed the client.alloc_dir, Nomad will happily infinitely recurse while building the chroot until something horrible happens. The best case scenario is the filesystem's path length limit is hit. The worst case scenario is disk space is exhausted. A bad agent configuration will look something like this: ```hcl data_dir = "/tmp/nomad-badagent" client { enabled = true chroot_env { # Note that the source matches the data_dir "/tmp/nomad-badagent" = "/ohno" # ... } } ``` Note that `/ohno/client` (the state_dir) will still be created but not `/ohno/alloc` (the alloc_dir). While I cannot think of a good reason why someone would want to embed Nomad's client (and possibly server) directories in chroots, there should be no cause for harm. chroots are only built when Nomad runs as root, and Nomad disables running exec jobs as root by default. Therefore even if client state is copied into chroots, it will be inaccessible to tasks. Skipping the `data_dir` and `{client,server}.state_dir` is possible, but this PR attempts to implement the minimum viable solution to reduce risk of unintended side effects or bugs. When running tests as root in a vm without the fix, the following error occurs: ``` === RUN TestAllocDir_SkipAllocDir alloc_dir_test.go:520: Error Trace: alloc_dir_test.go:520 Error: Received unexpected error: Couldn't create destination file /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/testtask/nomad/test/testtask/.../nomad/test/testtask/secrets/.nomad-mount: open /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/.../testtask/secrets/.nomad-mount: file name too long Test: TestAllocDir_SkipAllocDir --- FAIL: TestAllocDir_SkipAllocDir (22.76s) ``` Also removed unused Copy methods on AllocDir and TaskDir structs. Thanks to @eveld for not letting me forget about this!	2021-10-18 09:22:01 -07:00
Mahmood Ali	0ac126fa78	drivers/exec: Don't inherit Nomad oom_score_adj value (#10698 ) Explicitly set the `oom_score_adj` value for `exec` and `java` tasks. We recommend that the Nomad service to have oom_score_adj of a low value (e.g. -1000) to avoid having nomad agent OOM Killed if the node is oversubscriped. However, Nomad's workloads should not inherit Nomad's process, which is the default behavior. Fixes #10663	2021-06-03 14:15:50 -04:00
Seth Hoenig	5b8a32f23d	drivers/exec: enable setting allow_caps on exec driver This PR enables setting allow_caps on the exec driver plugin configuration, as well as cap_add and cap_drop in exec task configuration. These options replicate the functionality already present in the docker task driver. Important: this change also reduces the default set of capabilities enabled by the exec driver to match the default set enabled by the docker driver. Until v1.0.5 the exec task driver would enable all capabilities supported by the operating system. v1.0.5 removed NET_RAW from that list of default capabilities, but left may others which could potentially also be leveraged by compromised tasks. Important: the "root" user is still special cased when used with the exec driver. Older versions of Nomad enabled enabled all capabilities supported by the operating system for tasks set with the root user. To maintain compatibility with existing clusters we continue supporting this "feature", however we maintain support for the legacy set of capabilities rather than enabling all capabilities now supported on modern operating systems.	2021-05-17 12:37:40 -06:00
Seth Hoenig	1e75f99839	drivers/docker+exec+java: disable net_raw capability by default The default Linux Capabilities set enabled by the docker, exec, and java task drivers includes CAP_NET_RAW (for making ping just work), which has the side affect of opening an ARP DoS/MiTM attack between tasks using bridge networking on the same host network. https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities This PR disables CAP_NET_RAW for the docker, exec, and java task drivers. The previous behavior can be restored for docker using the allow_caps docker plugin configuration option. A future version of nomad will enable similar configurability for the exec and java task drivers.	2021-05-12 13:22:09 -07:00
zhsj	5a182e1d03	deps: update runc to v1.0.0-rc93 includes updates for breaking changes in runc v1.0.0-rc93	2021-03-31 10:57:02 -04:00
Charlie Voiselle	0473f35003	Fixup uses of `sanity` (#10187 ) * Fixup uses of `sanity` * Remove unnecessary comments. These checks are better explained by earlier comments about the context of the test. Per @tgross, moved the tests together to better reinforce the overall shared context. * Update nomad/fsm_test.go	2021-03-16 18:05:08 -04:00
Seth Hoenig	8ee9835923	drivers/exec+java: Add task configuration to restore previous PID/IPC isolation behavior This PR adds pid_mode and ipc_mode options to the exec and java task driver config options. By default these will defer to the default_pid_mode and default_ipc_mode agent plugin options created in #9969. Setting these values to "host" mode disables isolation for the task. Doing so is not recommended, but may be necessary to support legacy job configurations. Closes #9970	2021-02-08 14:26:35 -06:00
Seth Hoenig	4bc6e5a215	drivers/exec+java: Add configuration to restore previous PID/IPC namespace behavior. This PR adds default_pid_mode and default_ipc_mode options to the exec and java task drivers. By default these will default to "private" mode, enabling PID and IPC isolation for tasks. Setting them to "host" mode disables isolation. Doing so is not recommended, but may be necessary to support legacy job configurations. Closes #9969	2021-02-05 15:52:11 -06:00
Kris Hicks	f5527aea48	Backfill unit test for NEWIPC	2021-01-28 12:03:19 +00:00
Kris Hicks	a5298ea4ba	Add unit test for container namespacing	2021-01-28 12:03:19 +00:00
Kris Hicks	c13f75d9e1	Always check that resource constraints were applied	2021-01-28 12:03:19 +00:00
Mahmood Ali	a89da9982d	raw_exec: don't use cgroups when no_cgroup is set (#9328 ) When raw_exec is configured with [`no_cgroups`](https://www.nomadproject.io/docs/drivers/raw_exec#no_cgroups), raw_exec shouldn't attempt to create a cgroup. Prior to this change, we accidentally always required freezer cgroup to do stats PID tracking. We already have the proper fallback in place for metrics, so only need to ensure that we don't create a cgroup for the task. Fixes https://github.com/hashicorp/nomad/issues/8565	2020-11-11 16:20:34 -05:00
Mahmood Ali	cd060db42a	tests: ignore empty cgroup My latest Vagrant box contains an empty cgroup name that isn't used for isolation: ``` $ cat /proc/self/cgroup \| grep :: 0::/user.slice/user-1000.slice/session-17.scope ```	2020-10-01 10:23:13 -04:00
Shengjing Zhu	7a4f48795d	Adjust cgroup change in libcontainer	2020-08-20 00:31:07 +08:00
Mahmood Ali	739e5e8811	drivers/exec: test all cgroups are destroyed	2019-12-11 11:12:29 -05:00
Danielle Lancashire	4fbcc668d0	volumes: Add support for mount propagation This commit introduces support for configuring mount propagation when mounting volumes with the `volume_mount` stanza on Linux targets. Similar to Kubernetes, we expose 3 options for configuring mount propagation: - private, which is equivalent to `rprivate` on Linux, which does not allow the container to see any new nested mounts after the chroot was created. - host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts that have been created _outside of the container_ to be visible inside the container after the chroot is created. - bidirectional, which is equivalent to `rshared` on Linux, which allows both the container to see new mounts created on the host, but importantly _allows the container to create mounts that are visible in other containers an don the host_ private and host-to-task are safe, but bidirectional mounts can be dangerous, as if the code inside a container creates a mount, and does not clean it up before tearing down the container, it can cause bad things to happen inside the kernel. To add a layer of safety here, we require that the user has ReadWrite permissions on the volume before allowing bidirectional mounts, as a defense in depth / validation case, although creating mounts should also require a priviliged execution environment inside the container.	2019-10-14 14:09:58 +02:00
Mahmood Ali	5734c8a648	update comment	2019-06-11 13:00:26 -04:00
Mahmood Ali	f7608c4cef	exec: use an independent name=systemd cgroup path We aim for containers to be part of a new cgroups hierarchy independent from nomad agent. However, we've been setting a relative path as libcontainer `cfg.Cgroups.Path`, which makes libcontainer concatinate the executor process cgroup with passed cgroup, as set in [1]. By setting an absolute path, we ensure that all cgroups subsystem (including `name=systemd` get a dedicated one). This matches behavior in Nomad 0.8, and behavior of how Docker and OCI sets CgroupsPath[2] Fixes #5736 [1] `d7edf9b2e4/vendor/github.com/opencontainers/runc/libcontainer/cgroups/fs/apply_raw.go (L326-L340)` [2] `238f8eaa31/vendor/github.com/containerd/containerd/oci/spec.go (L229)`	2019-06-10 22:00:12 -04:00
Mahmood Ali	cb554a015f	Fix test comparisons	2019-05-24 21:38:22 -05:00
Mahmood Ali	99637c8bbc	Test for expected capabilities specifically	2019-05-24 16:07:05 -05:00
Mahmood Ali	7455c746aa	use /bin/bash	2019-05-24 14:50:23 -04:00
Mahmood Ali	00081b15d6	fix	2019-05-20 15:30:07 -04:00
Lang Martin	3ae276cfd2	executor_linux_test call lookupTaskBin with an ExecCommand	2019-05-08 10:01:51 -04:00
Lang Martin	87585e950d	move lookupTaskBin to executor_linux, for os dependency clarity	2019-05-07 16:58:27 -04:00
Lang Martin	de807a410a	driver_test leave cat in the test, but add cat to the chroot	2019-05-07 16:14:01 -04:00
Lang Martin	1619d3e3cb	executor_linux_test test PATH lookup inside the container	2019-05-03 16:21:58 -04:00
Lang Martin	47b9fc3d26	executor_linux_test new TestExecutor_EscapeContainer	2019-05-03 14:38:42 -04:00
Mahmood Ali	dac2cd3df3	Add test cases for waiting on children Also, make the test use files just like in the non-test case.	2019-04-01 13:24:18 -04:00
Mahmood Ali	df5d7ba50d	fix test setup	2019-03-26 09:15:22 -04:00
Mahmood Ali	9369b123de	use drivers.FSIsolation	2019-01-08 09:11:47 -05:00
Alex Dadgar	327b551b39	Drivers	2018-12-18 15:50:11 -08:00
Nick Ethier	09dadf0a23	Merge branch 'master' into f-grpc-executor * master: (71 commits) Fix output of 'nomad deployment fail' with no arg Always create a running allocation when testing task state tests: ensure exec tests pass valid task resources (#4992) some changes for more idiomatic code fix iops related tests fixed bug in loop delay gofmt improved code for readability client: updateAlloc release lock after read fixup! device attributes in `nomad node status -verbose` drivers/exec: support device binds and mounts fix iops bug and increase test matrix coverage tests: tag image explicitly changelog ci: install lxc-templates explicitly tests: skip checking rdma cgroup ci: use Ubuntu 16.04 (Xenial) in TravisCI client: update driver info on new fingerprint drivers/docker: enforce volumes.enabled (#4983) client: Style: use fluent style for building loggers ...	2018-12-13 14:41:09 -05:00
Mahmood Ali	74bd0be6ea	drivers/exec: support device binds and mounts	2018-12-11 18:35:21 -05:00
Mahmood Ali	69b2355274	Merge pull request #4975 from hashicorp/fix-master-20181209 Some test fixes and remedies	2018-12-11 18:00:21 -05:00
Mahmood Ali	5a487ac884	tests: prevent indefinite blocking in some tests Noticed few places where tests seem to block indefinitely and panic after the test run reaches the test package timeout. I intend to follow up with the proper fix later, but timing out is much better than indefinitely blocking.	2018-12-11 09:35:26 -05:00
Nick Ethier	19a695308f	executor: fix tests	2018-12-06 21:39:53 -05:00
Nick Ethier	71353a88d4	executor: remove structs package	2018-12-06 20:54:14 -05:00
Alex Dadgar	1e3c3cb287	Deprecate IOPS IOPS have been modelled as a resource since Nomad 0.1 but has never actually been detected and there is no plan in the short term to add detection. This is because IOPS is a bit simplistic of a unit to define the performance requirements from the underlying storage system. In its current state it adds unnecessary confusion and can be removed without impacting any users. This PR leaves IOPS defined at the jobspec parsing level and in the api/ resources since these are the two public uses of the field. These should be considered deprecated and only exist to allow users to stop using them during the Nomad 0.9.x release. In the future, there should be no expectation that the field will exist.	2018-12-06 15:09:26 -08:00
Nick Ethier	57ffece7f8	executor: update test references	2018-12-05 11:07:48 -05:00
Nick Ethier	8b20de4801	executor: use grpc instead of netrpc as plugin protocol * Added protobuf spec for executor * Seperated executor structs into their own package	2018-12-05 11:03:56 -05:00
Danielle Tomlinson	2db5ae38d8	client: Rename drivers/shared/env => client/taskenv	2018-11-30 12:18:39 +01:00
Danielle Tomlinson	0544a57abe	drivers: Move client/drivers/executor to drivers/shared/executor	2018-11-30 10:46:13 +01:00

49 Commits