open-nomad

Author	SHA1	Message	Date
Mahmood Ali	d5e136b82b	executor: set CpuWeight in cgroup-v2 (#11287 ) Cgroup-v2 uses `cpu.weight` property instead of cpu shares: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#cpu-interface-files . And it uses a different range (i.e. `[1, 10000]`) from cpu.shares (i.e. `[2, 262144]`) to make things more interesting. Luckily, the libcontainer provides a helper function to perform the conversion [`ConvertCPUSharesToCgroupV2Value`](https://pkg.go.dev/github.com/opencontainers/runc@v1.0.2/libcontainer/cgroups#ConvertCPUSharesToCgroupV2Value). I have confirmed that docker/libcontainer performs the conversion as well in https://github.com/opencontainers/runc/blob/v1.0.2/libcontainer/specconv/spec_linux.go#L536-L541 , and that CpuShares is ignored by libcontainer in https://github.com/opencontainers/runc/blob/v1.0.2/libcontainer/cgroups/fs2/cpu.go#L24-L29 .	2021-10-14 08:46:07 -04:00
Mahmood Ali	48aa6e26e9	executor: suppress spurious log messages (#11273 ) Suppress stats streaming error log messages when task finishes. Streaming errors are expected when a task finishes and they aren't actionable to users. Also, note that the task runner Stats hook retries collecting stats after a delay. If the connection terminates prematurely, it will be retried, and closing the stats stream is not very disruptive. Ideally, executor terminates cleanly when task exits, but that's a more substantial change that may require changing the executor/drivers interface. Fixes #10814	2021-10-06 12:42:35 -04:00
Mahmood Ali	4d90afb425	gofmt all the files mostly to handle build directives in 1.17.	2021-10-01 10:14:28 -04:00
James Rasell	b6813f1221	chore: fix incorrect docstring formatting.	2021-08-30 11:08:12 +02:00
James Rasell	050b5408c7	drivers: remove duplicate import statements.	2021-06-11 09:38:09 +02:00
Mahmood Ali	0ac126fa78	drivers/exec: Don't inherit Nomad oom_score_adj value (#10698 ) Explicitly set the `oom_score_adj` value for `exec` and `java` tasks. We recommend that the Nomad service to have oom_score_adj of a low value (e.g. -1000) to avoid having nomad agent OOM Killed if the node is oversubscriped. However, Nomad's workloads should not inherit Nomad's process, which is the default behavior. Fixes #10663	2021-06-03 14:15:50 -04:00
Seth Hoenig	fe9258b754	drivers/exec: pass capabilities through executor RPC Add capabilities to the LaunchRequest proto so that the capabilities set actually gets plumbed all the way through to task launch.	2021-05-17 12:37:40 -06:00
Seth Hoenig	5b8a32f23d	drivers/exec: enable setting allow_caps on exec driver This PR enables setting allow_caps on the exec driver plugin configuration, as well as cap_add and cap_drop in exec task configuration. These options replicate the functionality already present in the docker task driver. Important: this change also reduces the default set of capabilities enabled by the exec driver to match the default set enabled by the docker driver. Until v1.0.5 the exec task driver would enable all capabilities supported by the operating system. v1.0.5 removed NET_RAW from that list of default capabilities, but left may others which could potentially also be leveraged by compromised tasks. Important: the "root" user is still special cased when used with the exec driver. Older versions of Nomad enabled enabled all capabilities supported by the operating system for tasks set with the root user. To maintain compatibility with existing clusters we continue supporting this "feature", however we maintain support for the legacy set of capabilities rather than enabling all capabilities now supported on modern operating systems.	2021-05-17 12:37:40 -06:00
Seth Hoenig	1e75f99839	drivers/docker+exec+java: disable net_raw capability by default The default Linux Capabilities set enabled by the docker, exec, and java task drivers includes CAP_NET_RAW (for making ping just work), which has the side affect of opening an ARP DoS/MiTM attack between tasks using bridge networking on the same host network. https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities This PR disables CAP_NET_RAW for the docker, exec, and java task drivers. The previous behavior can be restored for docker using the allow_caps docker plugin configuration option. A future version of nomad will enable similar configurability for the exec and java task drivers.	2021-05-12 13:22:09 -07:00
Nick Ethier	b34db8b3b6	nit: code cleanup/organization	2021-04-16 15:14:29 -04:00
Nick Ethier	fe283c5a8f	executor: add support for cpuset cgroup	2021-04-15 10:24:31 -04:00
Yoan Blanc	ac0d5d8bd3	chore: bump golangci-lint from v1.24 to v1.39 Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2021-04-03 09:50:23 +02:00
Mahmood Ali	edec658e50	drivers/exec: Account for cgroup-v2 memory stats If the host is running with cgroup-v2, RSS and Max Usage doesn't get reported anymore.	2021-04-01 12:13:21 -04:00
zhsj	5a182e1d03	deps: update runc to v1.0.0-rc93 includes updates for breaking changes in runc v1.0.0-rc93	2021-03-31 10:57:02 -04:00
Mahmood Ali	bf1c0dcf17	driver/exec: set soft memory limit Linux offers soft memory limit: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/memory.html#soft-limits , and https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html?highlight=memory.low . We can set soft memory limits through libcontainer `Resources.MemoryReservation`: https://pkg.go.dev/github.com/opencontainers/runc@v0.1.1/libcontainer/configs#Resources	2021-03-30 16:55:58 -04:00
Mahmood Ali	f44a04454d	oversubscription: driver/exec to honor MemoryMaxMB	2021-03-30 16:55:58 -04:00
Tim Gross	f820021f9e	deps: bump gopsutil to v3.21.2	2021-03-30 16:02:51 -04:00
Charlie Voiselle	0473f35003	Fixup uses of `sanity` (#10187 ) * Fixup uses of `sanity` * Remove unnecessary comments. These checks are better explained by earlier comments about the context of the test. Per @tgross, moved the tests together to better reinforce the overall shared context. * Update nomad/fsm_test.go	2021-03-16 18:05:08 -04:00
Seth Hoenig	8ee9835923	drivers/exec+java: Add task configuration to restore previous PID/IPC isolation behavior This PR adds pid_mode and ipc_mode options to the exec and java task driver config options. By default these will defer to the default_pid_mode and default_ipc_mode agent plugin options created in #9969. Setting these values to "host" mode disables isolation for the task. Doing so is not recommended, but may be necessary to support legacy job configurations. Closes #9970	2021-02-08 14:26:35 -06:00
Seth Hoenig	152534fe21	docs: fixup comments, var names	2021-02-08 10:58:44 -06:00
Seth Hoenig	4bc6e5a215	drivers/exec+java: Add configuration to restore previous PID/IPC namespace behavior. This PR adds default_pid_mode and default_ipc_mode options to the exec and java task drivers. By default these will default to "private" mode, enabling PID and IPC isolation for tasks. Setting them to "host" mode disables isolation. Doing so is not recommended, but may be necessary to support legacy job configurations. Closes #9969	2021-02-05 15:52:11 -06:00
Chris Baker	ce68ee164b	Version 1.0.3 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJgEuOKAAoJEFGFLYc0j/xMxF8H/3TTU6Tu+Xm0YvcsDaYDphZ/ X7KQBV0aFiuL5VkTw4PzKEsgryIy9/sqEPyxxyKRowAmos9qhiusjNAIfqdP4TF8 tdZmTedkfWir9uPD+hyv/LXpwbQ2T8kTwS3xHTYvaOmaCxZr710FEn+imnMk1AUn Xs5itkd/CYGr0nBLm+I5GutWSDPmL7Uw8J5Z30fFyoaxoCPAbCWQQNk793SCRUc5 f/uo18V2tFInmQ+3sAdnM4gPewyStK/a5VvzWavL9fVDtYK83wlqWSchTXY5jpVz zNEzt/rYhbBzakPQQKb5zieblh2iGI8aHWpD5w4WduqO2Sg6B/5lAeNZIlW0UJg= =2g3c -----END PGP SIGNATURE----- Merge tag 'v1.0.3' into post-release-1.0.3 Version 1.0.3	2021-01-29 19:30:08 +00:00
Kris Hicks	f5527aea48	Backfill unit test for NEWIPC	2021-01-28 12:03:19 +00:00
Chris Baker	ac1b9655ce	put exec process in a new IPC namespace	2021-01-28 12:03:19 +00:00
Kris Hicks	a5298ea4ba	Add unit test for container namespacing	2021-01-28 12:03:19 +00:00
Kris Hicks	c13f75d9e1	Always check that resource constraints were applied	2021-01-28 12:03:19 +00:00
Kris Hicks	87188f04de	Add PID namespacing and e2e test	2021-01-28 12:03:19 +00:00
Kris Hicks	8a8b95a119	executor_linux: Remove unreachable PATH= code (#9778 ) This has to have been unused because the HasPrefix operation is backwards, meaning a Command.Env that includes PATH= never would have worked; the default path was always used.	2021-01-15 11:19:09 -08:00
Kris Hicks	0cf9cae656	Apply some suggested fixes from staticcheck (#9598 )	2020-12-10 07:29:18 -08:00
Kris Hicks	93155ba3da	Add gocritic to golangci-lint config (#9556 )	2020-12-08 12:47:04 -08:00
Mahmood Ali	98c02851c8	use comment ignores (#9448 ) Use targetted ignore comments for the cases where we are bound by backward compatibility. I've left some file based linters, especially when the file is riddled with linter voilations (e.g. enum names), or if it's a property of the file (e.g. package and file names). I encountered an odd behavior related to RPC_REQUEST_RESPONSE_UNIQUE and RPC_REQUEST_STANDARD_NAME. Apparently, if they target a `stream` type, we must separate them into separate lines so that the ignore comment targets the type specifically.	2020-11-25 16:03:01 -05:00
Mahmood Ali	b2a8752c5f	honor task user when execing into raw_exec task (#9439 ) Fix #9210 . This update the executor so it honors the User when using nomad alloc exec. The bug was that the exec task didn't honor the init command when execing.	2020-11-25 09:34:10 -05:00
Mahmood Ali	a89da9982d	raw_exec: don't use cgroups when no_cgroup is set (#9328 ) When raw_exec is configured with [`no_cgroups`](https://www.nomadproject.io/docs/drivers/raw_exec#no_cgroups), raw_exec shouldn't attempt to create a cgroup. Prior to this change, we accidentally always required freezer cgroup to do stats PID tracking. We already have the proper fallback in place for metrics, so only need to ensure that we don't create a cgroup for the task. Fixes https://github.com/hashicorp/nomad/issues/8565	2020-11-11 16:20:34 -05:00
Mahmood Ali	2d4634bcc3	Merge pull request #9304 from hashicorp/b-legacy-executors-are-executors Legacy executors are executors after all	2020-11-10 12:54:03 -05:00
Kris Hicks	9d03cf4c5f	protos: Update .proto files not to use Go package name (#9301 ) Previously, it was required that you `go get github.com/hashicorp/nomad` to be able to build protos, as the protoc invocation added an include directive that pointed to `$GOPATH/src`, which is how dependent protos were discovered. As Nomad now uses Go modules, it won't necessarily be cloned to `$GOPATH`. (Additionally, if you _had_ go-gotten Nomad at some point, protoc compilation would have possibly used the _wrong_ protos, as those wouldn't necessarily be the most up-to-date ones.) This change modifies the proto files and the `protoc` invocation to handle discovering dependent protos via protoc plugin modifier statements that are specific to the protoc plugin being used. In this change, `make proto` was run to recompile the protos, which results in changes only to the gzipped `FileDescriptorProto`.	2020-11-10 08:42:35 -08:00
Mahmood Ali	ac185b41e2	Legacy executors are executors after all This fixes a bug where pre-0.9 executors fail to recover after an upgrade. The bug is that legacyExecutorWrappers didn't get updated with ExecStreaming function, and thus failed to implement the Executor function. Sadly, this meant that all recovery attempts fail, as the runtime check in `b312aacbc9/drivers/shared/executor/utils.go (L103-L110)` .	2020-11-10 10:20:07 -05:00
Mahmood Ali	f4b0aa0c1c	tests: copy permissions when copying files On the failover path, copy the permission bits (a.k.a. file mode), specially the execution bit.	2020-10-01 10:23:14 -04:00
Mahmood Ali	cd060db42a	tests: ignore empty cgroup My latest Vagrant box contains an empty cgroup name that isn't used for isolation: ``` $ cat /proc/self/cgroup \| grep :: 0::/user.slice/user-1000.slice/session-17.scope ```	2020-10-01 10:23:13 -04:00
Mahmood Ali	91376cccf2	tests: failover to copying when symlinking fails Symlinking busybox may fail when the test code and the test temporary directory live on different volumes/partitions; so we should copy instead. This situation arises in the Vagrant setup, where the code repository live on special file sharing volume. Somewhat unrelated, remove `f.Sync()` invocation from a test copyFile helper function. Sync is useful only for crash recovery, and isn't necessary in our test setup. The sync invocation is a significant overhead as it requires the OS to flush any cached writes to disk.	2020-09-30 09:58:22 -04:00
Mahmood Ali	d4f385d6e1	Upgrade to golang 1.15 (#8858 ) Upgrade to golang 1.15 Starting with golang 1.5, setting Ctty value result in `Setctty set but Ctty not valid in child` error, as part of https://github.com/golang/go/issues/29458 . This commit lifts the fix in https://github.com/creack/pty/pull/97 .	2020-09-09 15:59:29 -04:00
Shengjing Zhu	7a4f48795d	Adjust cgroup change in libcontainer	2020-08-20 00:31:07 +08:00
Mahmood Ali	1fcc7970e4	tests: ensure that test is long enough to configure cgroups	2020-05-31 10:42:06 -04:00
Mahmood Ali	88cfe504a0	update grpc Upgrade grpc to v1.27.1 and protobuf plugins to v1.3.4.	2020-03-03 08:39:54 -05:00
Thomas Lefebvre	84baa950ce	client: support no_pivot_root in exec driver configuration	2020-02-18 09:27:16 -08:00
Mahmood Ali	ac80d62c84	Pass stats interval colleciton to executor This fixes a bug where executor based drivers emit stats every second, regardless of user configuration. When serializing the Stats request across grpc, the nomad agent dropped the Interval value, and then executor uses 1s as a default value.	2020-01-31 14:17:15 -05:00
Mahmood Ali	d80ae6765b	simplify cgroup path lookup	2019-12-11 12:43:25 -05:00
Mahmood Ali	94ab62dfb4	executor: stop joining executor to container cgroup Stop joining libcontainer executor process into the newly created task container cgroup, to ensure that the cgroups are fully destroyed on shutdown, and to make it consistent with other plugin processes. Previously, executor process is added to the container cgroup so the executor process resources get aggregated along with user processes in our metric aggregation. However, adding executor process to container cgroup adds some complications with much benefits: First, it complicates cleanup. We must ensure that the executor is removed from container cgroup on shutdown. Though, we had a bug where we missed removing it from the systemd cgroup. Because executor uses `containerState.CgroupPaths` on launch, which includes systemd, but `cgroups.GetAllSubsystems` which doesn't. Second, it may have advese side-effects. When a user process is cpu bound or uses too much memory, executor should remain functioning without risk of being killed (by OOM killer) or throttled. Third, it is inconsistent with other drivers and plugins. Logmon and DockerLogger processes aren't in the task cgroups. Neither are containerd processes, though it is equivalent to executor in responsibility. Fourth, in my experience when executor process moves cgroup while it's running, the cgroup aggregation is odd. The cgroup `memory.usage_in_bytes` doesn't seem to capture the full memory usage of the executor process and becomes a red-harring when investigating memory issues. For all the reasons above, I opted to have executor remain in nomad agent cgroup and we can revisit this when we have a better story for plugin process cgroup management.	2019-12-11 11:28:09 -05:00
Mahmood Ali	739e5e8811	drivers/exec: test all cgroups are destroyed	2019-12-11 11:12:29 -05:00
Danielle Lancashire	4fbcc668d0	volumes: Add support for mount propagation This commit introduces support for configuring mount propagation when mounting volumes with the `volume_mount` stanza on Linux targets. Similar to Kubernetes, we expose 3 options for configuring mount propagation: - private, which is equivalent to `rprivate` on Linux, which does not allow the container to see any new nested mounts after the chroot was created. - host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts that have been created _outside of the container_ to be visible inside the container after the chroot is created. - bidirectional, which is equivalent to `rshared` on Linux, which allows both the container to see new mounts created on the host, but importantly _allows the container to create mounts that are visible in other containers an don the host_ private and host-to-task are safe, but bidirectional mounts can be dangerous, as if the code inside a container creates a mount, and does not clean it up before tearing down the container, it can cause bad things to happen inside the kernel. To add a layer of safety here, we require that the user has ReadWrite permissions on the volume before allowing bidirectional mounts, as a defense in depth / validation case, although creating mounts should also require a priviliged execution environment inside the container.	2019-10-14 14:09:58 +02:00
Nick Ethier	8b881d83d5	executor: rename wrapNetns to withNetworkIsolation	2019-09-30 21:38:31 -04:00

1 2 3 4

180 commits