open-nomad


Author	SHA1	Message	Date
Mahmood Ali	94ab62dfb4	executor: stop joining executor to container cgroup Stop joining the libcontainer executor process into the newly created task container cgroup, to ensure that the cgroups are fully destroyed on shutdown, and to make it consistent with other plugin processes. Previously, the executor process was added to the container cgroup so that its resource usage was aggregated along with user processes in our metric aggregation. However, adding the executor process to the container cgroup adds some complications with little benefit: First, it complicates cleanup. We must ensure that the executor is removed from the container cgroup on shutdown, and we had a bug where we missed removing it from the systemd cgroup, because the executor uses `containerState.CgroupPaths` on launch, which includes systemd, but `cgroups.GetAllSubsystems`, which doesn't. Second, it may have adverse side effects. When a user process is CPU bound or uses too much memory, the executor should keep functioning without risk of being killed (by the OOM killer) or throttled. Third, it is inconsistent with other drivers and plugins. Logmon and DockerLogger processes aren't in the task cgroups. Neither are containerd processes, though containerd is equivalent to the executor in responsibility. Fourth, in my experience, when the executor process moves cgroups while it's running, the cgroup aggregation is odd. The cgroup `memory.usage_in_bytes` doesn't seem to capture the full memory usage of the executor process and becomes a red herring when investigating memory issues. For all the reasons above, I opted to have the executor remain in the nomad agent cgroup; we can revisit this when we have a better story for plugin process cgroup management.	2019-12-11 11:28:09 -05:00
Mahmood Ali	739e5e8811	drivers/exec: test all cgroups are destroyed	2019-12-11 11:12:29 -05:00
Danielle Lancashire	4fbcc668d0	volumes: Add support for mount propagation This commit introduces support for configuring mount propagation when mounting volumes with the `volume_mount` stanza on Linux targets. Similar to Kubernetes, we expose 3 options for configuring mount propagation: - private, which is equivalent to `rprivate` on Linux, which does not allow the container to see any new nested mounts after the chroot was created. - host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts that have been created _outside of the container_ to be visible inside the container after the chroot is created. - bidirectional, which is equivalent to `rshared` on Linux, which allows the container to see new mounts created on the host, but importantly _also allows the container to create mounts that are visible in other containers and on the host_. Private and host-to-task are safe, but bidirectional mounts can be dangerous, because if the code inside a container creates a mount and does not clean it up before tearing down the container, it can cause bad things to happen inside the kernel. To add a layer of safety here, we require that the user has ReadWrite permissions on the volume before allowing bidirectional mounts, as a defense in depth / validation case, although creating mounts should also require a privileged execution environment inside the container.	2019-10-14 14:09:58 +02:00
Nick Ethier	6fd773eb88	executor: run exec commands in netns if set	2019-09-30 11:50:22 -04:00
Nick Ethier	533b2850fc	executor: cleanup netns handling in executor	2019-07-31 01:04:05 -04:00
Nick Ethier	971c8c9c2b	Driver networking support Adds support for passing network isolation config into drivers and implements support in the rawexec driver as a proof of concept	2019-07-31 01:03:20 -04:00
Mahmood Ali	f7608c4cef	exec: use an independent name=systemd cgroup path We aim for containers to be part of a new cgroups hierarchy independent of the nomad agent. However, we've been setting a relative path as libcontainer `cfg.Cgroups.Path`, which makes libcontainer concatenate the executor process cgroup with the passed cgroup, as set in [1]. By setting an absolute path, we ensure that every cgroup subsystem (including `name=systemd`) gets a dedicated one. This matches the behavior in Nomad 0.8, and how Docker and OCI set CgroupsPath[2]. Fixes #5736 [1] `d7edf9b2e4/vendor/github.com/opencontainers/runc/libcontainer/cgroups/fs/apply_raw.go (L326-L340)` [2] `238f8eaa31/vendor/github.com/containerd/containerd/oci/spec.go (L229)`	2019-06-10 22:00:12 -04:00
Mahmood Ali	68813def56	special case root capabilities	2019-05-24 14:10:10 -04:00
Mahmood Ali	807e7b90e0	drivers/exec: Restore 0.8 capabilities Nomad 0.9 incidentally set effective capabilities that are higher than what's expected of a `nobody` process, and than what's set in 0.8. This change restores the capabilities to the ones used in Nomad 0.8.	2019-05-20 13:11:29 -04:00
Lang Martin	0256cf700d	Merge pull request #5649 from hashicorp/b-lookup-exe-chroot lookup executables inside chroot	2019-05-17 15:07:41 -04:00
Mahmood Ali	b4df061fef	use pty/tty terminology similar to github.com/kr/pty	2019-05-10 19:17:14 -04:00
Mahmood Ali	3055fd53df	executors: implement streaming exec Implements streaming exec handling in both executors (i.e. universal and libcontainer). For creation of TTYs, some incidental complexity leaked in. The universal executor uses github.com/kr/pty for creation of TTYs. On the other hand, libcontainer expects a console socket and creates the underlying console object itself on process start. The caller can then use `libcontainer.utils.RecvFd()` to get the tty master end. I chose github.com/kr/pty for managing TTYs here. I tried the `github.com/containerd/console` package (which is already imported), but the package did not work as expected on macOS.	2019-05-10 19:17:14 -04:00
Lang Martin	99359d7fbe	executor_linux: only do path resolution in the taskDir, not local; split out lookPathIn to show its similarity to exec.LookPath	2019-05-10 11:33:35 -04:00
Lang Martin	743a2a2875	executor_linux pass the command to lookupTaskBin to get path	2019-05-08 10:01:20 -04:00
Lang Martin	8db3fe047c	executor/* Launch log at top of Launch is more explicit, trace	2019-05-07 17:01:05 -04:00
Lang Martin	87585e950d	move lookupTaskBin to executor_linux, for os dependency clarity	2019-05-07 16:58:27 -04:00
Lang Martin	22e99e41c1	executor and executor_linux debug launch prep and process start	2019-05-03 14:42:57 -04:00
Lang Martin	88ce590dac	executor_linux call new lookupTaskBin	2019-05-03 11:55:19 -04:00
Mahmood Ali	4322055301	comment what refer to	2019-04-19 09:49:04 -04:00
Mahmood Ali	18993421f2	Move libcontainer helper to executor package	2019-04-19 09:49:04 -04:00
Michael Schurter	47bed4316f	executor/linux: comment this bizarre code	2019-04-02 11:25:45 -07:00
Michael Schurter	1d569a27dc	Revert "executor/linux: add defensive checks to binary path" This reverts commit cb36f4537e63d53b198c2a87d1e03880895631bd.	2019-04-02 11:17:12 -07:00
Michael Schurter	fc5487dbbc	executor/linux: add defensive checks to binary path	2019-04-02 09:40:53 -07:00
Michael Schurter	7d49bc4c71	executor/linux: make chroot binary paths absolute Avoid libcontainer.Process trying to lookup the binary via $PATH as the executor has already found where the binary is located.	2019-04-01 15:45:31 -07:00
Mahmood Ali	2a7b18aec4	Revert "executor: synchronize exitState accesses" (#5449 ) Reverts hashicorp/nomad#5433 Apparently, channel communications can constitute Happens-Before even for proximate variables, so this syncing isn't necessary. > _The closing of a channel happens before a receive that returns a zero value because the channel is closed._ https://golang.org/ref/mem#tmp_7	2019-03-20 07:33:05 -04:00
Nick Ethier	505e36ff7a	Merge pull request #5429 from hashicorp/b-blocking-executor-shutdown executor: block shutdown on process exiting	2019-03-19 15:18:01 -04:00
Mahmood Ali	a1776dba34	executor: synchronize exitState accesses exitState is set in `wait()` goroutine but accessed in a different `Wait()` goroutine, so accesses must be synchronized by a lock.	2019-03-17 11:56:58 -04:00
Nick Ethier	7418d09cf0	executor: block shutdown on process exiting	2019-03-15 23:50:17 -04:00
Iskander (Alex) Sharipov	e69909fbd3	drivers/shared/executor: fix strings.Replace call strings.Replace call with n=0 argument makes no sense as it will do nothing. Probably -1 is intended. Signed-off-by: Iskander Sharipov <quasilyte@gmail.com>	2019-03-02 00:33:17 +03:00
Mahmood Ali	a394cd63f4	CVE-2019-5736: Update libcontainer dependencies (#5334 ) * CVE-2019-5736: Update libcontainer dependencies Libcontainer is vulnerable to a runc container breakout that was reported as CVE-2019-5736[1]. Upgrading vendored libcontainer with the fix. The runc changes are captured in `369b920277` . [1] https://seclists.org/oss-sec/2019/q1/119	2019-02-19 20:21:18 -05:00
Mahmood Ali	5df63fda7c	Merge pull request #5190 from hashicorp/f-memory-usage Track Basic Memory Usage as reported by cgroups	2019-01-18 16:46:02 -05:00
Alex Dadgar	471fdb3ccf	Merge pull request #5173 from hashicorp/b-log-levels Plugins use parent loggers	2019-01-14 16:14:30 -08:00
Mahmood Ali	9909d98bee	Track Basic Memory Usage as reported by cgroups Track current memory usage, `memory.usage_in_bytes`, in addition to `memory.max_usage_in_bytes` and friends. This number is closer to what Docker reports. Related to https://github.com/hashicorp/nomad/issues/5165 .	2019-01-14 18:47:52 -05:00
Nick Ethier	9fea54e0dc	executor: implement streaming stats API plugins/driver: update driver interface to support streaming stats client/tr: use streaming stats api TODO: * how to handle errors and closed channel during stats streaming * prevent tight loop if Stats(ctx) returns an error drivers: update drivers TaskStats RPC to handle streaming results executor: better error handling in stats rpc docker: better control and error handling of stats rpc driver: allow stats to return a recoverable error	2019-01-12 12:18:22 -05:00
Alex Dadgar	14ed757a56	Plugins use parent loggers This PR fixes various instances of plugins being launched without using the parent loggers. This meant that logs would not all go to the same output, breaking formatting, etc.	2019-01-11 11:36:37 -08:00
Mahmood Ali	90f3cea187	Merge pull request #5157 from hashicorp/r-drivers-no-cstructs drivers: avoid referencing client/structs package	2019-01-09 13:06:46 -05:00
Mahmood Ali	d19b92edec	executor: add a comment detailing isolation	2019-01-08 12:10:26 -05:00
Mahmood Ali	64f80343fc	drivers: re-export ResourceUsage structs Re-export the ResourceUsage structs in the drivers package to avoid drivers depending directly on the internal client/structs package. I attempted moving the structs to drivers, but that caused some import cycles that were a bit hard to disentangle. Instead, I added an alias here that's sufficient for our purposes of avoiding external drivers depending on internal packages, while allowing us to restructure packages in the future without breaking source compatibility.	2019-01-08 09:11:47 -05:00
Mahmood Ali	8797a4f0ea	drivers/exec: restrict devices exposed to tasks We ultimately decided to provide a limited set of devices in exec/java drivers instead of all of host ones. Pre-0.9, we made all host devices available to exec tasks accidentally, yet most applications only use a small subset, and this choice limits our ability to restrict/isolate GPU and other devices. Starting with 0.9, by default, we only provide the same subset of devices Docker provides, and allow users to provide more devices as needed on case-by-case basis. This reverts commit 5805c64a9f1c3b409693493dfa30e7136b9f547b. This reverts commit ff9a4a17e59388dcab067949e0664f645b2f5bcf.	2019-01-06 17:03:19 -05:00
Mahmood Ali	56e3171310	driver/exec: use dedicated /dev mount (#5147 ) Use a dedicated /dev mount so we can inject more devices if necessary, and avoid allowing a container to contaminate host /dev. Follow up to https://github.com/hashicorp/nomad/pull/5143 - and fixes master.	2019-01-04 10:36:05 -05:00
Mahmood Ali	5b0702c9eb	drivers/exec: bind mount /dev into rootfs Restores pre-0.9 behavior, where Nomad makes /dev available to exec task. Switching to libcontainer, we accidentally made only a small subset available. Here, we err on the side of preserving behavior of 0.8, instead of going for the sensible route, where only a reasonable subset of devices is mounted by default and user can opt to request more.	2019-01-03 14:29:18 -05:00
Alex Dadgar	b8268d9a46	Lint	2018-12-18 15:50:44 -08:00
Alex Dadgar	327b551b39	Drivers	2018-12-18 15:50:11 -08:00
Nick Ethier	09dadf0a23	Merge branch 'master' into f-grpc-executor * master: (71 commits) Fix output of 'nomad deployment fail' with no arg Always create a running allocation when testing task state tests: ensure exec tests pass valid task resources (#4992) some changes for more idiomatic code fix iops related tests fixed bug in loop delay gofmt improved code for readability client: updateAlloc release lock after read fixup! device attributes in `nomad node status -verbose` drivers/exec: support device binds and mounts fix iops bug and increase test matrix coverage tests: tag image explicitly changelog ci: install lxc-templates explicitly tests: skip checking rdma cgroup ci: use Ubuntu 16.04 (Xenial) in TravisCI client: update driver info on new fingerprint drivers/docker: enforce volumes.enabled (#4983) client: Style: use fluent style for building loggers ...	2018-12-13 14:41:09 -05:00
Mahmood Ali	74bd0be6ea	drivers/exec: support device binds and mounts	2018-12-11 18:35:21 -05:00
Alex Dadgar	1531b6d534	Merge pull request #4970 from hashicorp/f-no-iops Deprecate IOPS	2018-12-11 12:51:22 -08:00
Mahmood Ali	021d3720b5	Merge pull request #4950 from hashicorp/b-exc-libcontainer-kill executor: kill all container processes	2018-12-08 09:52:42 -05:00
Nick Ethier	86e9c11ec2	executor: don't drop errors when configuring libcontainer cfg, add nil check on resources	2018-12-07 14:03:42 -05:00
Nick Ethier	2283cb2c39	executor: use drivers.Resources as resource model	2018-12-06 21:22:02 -05:00
Nick Ethier	71353a88d4	executor: remove structs package	2018-12-06 20:54:14 -05:00
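Commit 4fbcc668d0 maps three user-facing propagation modes to Linux flags and refuses bidirectional mounts on volumes without ReadWrite access. The sketch below encodes that table; the constant names, the `linuxPropagation` function, and treating an empty mode as `private` are assumptions for illustration, not the actual `volume_mount` implementation.

```go
package main

import "fmt"

// Hypothetical names for the three modes described in 4fbcc668d0.
const (
	PropagationPrivate       = "private"
	PropagationHostToTask    = "host-to-task"
	PropagationBidirectional = "bidirectional"
)

// linuxPropagation returns the Linux propagation flag for a mode and rejects
// bidirectional (rshared) mounts on read-only volumes as a safety check.
func linuxPropagation(mode string, readOnly bool) (string, error) {
	switch mode {
	case "", PropagationPrivate: // assumption: empty mode defaults to private
		return "rprivate", nil
	case PropagationHostToTask:
		return "rslave", nil
	case PropagationBidirectional:
		if readOnly {
			return "", fmt.Errorf("bidirectional propagation requires a read-write volume")
		}
		return "rshared", nil
	default:
		return "", fmt.Errorf("unknown propagation mode %q", mode)
	}
}
```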
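Commit f7608c4cef explains that a relative `cfg.Cgroups.Path` was nested under the executor's own cgroup, while an absolute path names a location from the hierarchy root. The simplified model below captures only that rule; it is not libcontainer's code, and `effectiveCgroupPath` and the example paths are hypothetical.

```go
package main

import "path/filepath"

// effectiveCgroupPath (hypothetical) models the nesting rule: a relative
// configured path is joined onto the caller's cgroup, an absolute one is
// used from the root of the hierarchy.
func effectiveCgroupPath(parentCgroup, configured string) string {
	if filepath.IsAbs(configured) {
		return filepath.Clean(configured)
	}
	return filepath.Join(parentCgroup, configured)
}
```

With an executor in `/system.slice/nomad.service`, the relative `nomad/abc` resolves to `/system.slice/nomad.service/nomad/abc`, whereas `/nomad/abc` stays independent of the agent's cgroup.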
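Commit 7d49bc4c71 passes an absolute in-chroot path to libcontainer so it does not search `$PATH` again. One way to derive that path from a host path found inside the task directory is sketched below; `chrootPath` and the example directories are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// chrootPath (hypothetical) converts a host path inside taskDir into the
// absolute path the same file has once taskDir becomes the root filesystem.
func chrootPath(taskDir, hostPath string) (string, error) {
	rel, err := filepath.Rel(taskDir, hostPath)
	if err != nil {
		return "", err
	}
	// Reject paths that resolve outside the task directory.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("%q is outside %q", hostPath, taskDir)
	}
	return filepath.Join("/", rel), nil
}
```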
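Commit 2a7b18aec4 relies on the Go memory model rule that closing a channel happens before a receive that observes the close. The sketch below shows that pattern in isolation: one goroutine writes the exit state and then closes `done`, and readers wait on `done` before reading, with no mutex. The `waiter` type is illustrative, not the executor's struct.

```go
package main

// exitState is a simplified stand-in for a process exit record.
type exitState struct {
	ExitCode int
}

// waiter (hypothetical) publishes result by closing done after writing it.
type waiter struct {
	result *exitState
	done   chan struct{}
}

func newWaiter(run func() int) *waiter {
	w := &waiter{done: make(chan struct{})}
	go func() {
		code := run()
		w.result = &exitState{ExitCode: code} // write happens before close
		close(w.done)
	}()
	return w
}

// Wait blocks until the goroutine finishes; reading result is race-free
// because the receive from the closed channel is ordered after the write.
func (w *waiter) Wait() *exitState {
	<-w.done
	return w.result
}
```

Because a closed channel can be received from any number of times, several goroutines may call `Wait` safely.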
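Commit e69909fbd3 notes that `strings.Replace` with `n=0` does nothing. The standard library semantics are: `n == 0` returns the input unchanged, and `n < 0` replaces every occurrence, which `strings.ReplaceAll` expresses directly. The helper name below is illustrative.

```go
package main

import "strings"

// demoReplace contrasts n == 0, n == -1, and ReplaceAll on the same input.
func demoReplace(s string) (none, all, allIdiomatic string) {
	return strings.Replace(s, "/", "_", 0), // no replacements
		strings.Replace(s, "/", "_", -1), // replace every occurrence
		strings.ReplaceAll(s, "/", "_") // equivalent to n = -1
}
```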
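Commit 64f80343fc re-exports structs through an alias rather than moving them. In Go, a type alias (`type A = B`) names the identical type, so values flow between the internal and public names with no conversion. The type names below are hypothetical stand-ins.

```go
package main

// internalResourceUsage stands in for a struct in an internal package.
type internalResourceUsage struct {
	MemoryRSS uint64
}

// ResourceUsage is an alias (note the "="), not a new defined type.
type ResourceUsage = internalResourceUsage

// fromInternal needs no conversion because both names denote one type.
func fromInternal(u internalResourceUsage) ResourceUsage {
	return u
}
```

A defined type (`type ResourceUsage internalResourceUsage`, without `=`) would instead require explicit conversions and would not carry the original's methods.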
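Commit 8797a4f0ea limits exec tasks to a default device subset plus devices the user requests. The sketch below shows that allow-list merge; the `defaultDevices` list is an assumption modelled on common container defaults, not Nomad's or Docker's authoritative list, and `taskDevices` is hypothetical.

```go
package main

import "sort"

// defaultDevices is an assumed minimal default set; verify against Docker/runc.
var defaultDevices = []string{
	"/dev/full", "/dev/null", "/dev/random", "/dev/tty", "/dev/urandom", "/dev/zero",
}

// taskDevices returns the defaults plus explicitly requested devices,
// de-duplicated and sorted, instead of exposing every host device.
func taskDevices(requested []string) []string {
	seen := make(map[string]bool)
	var out []string
	all := append(append([]string{}, defaultDevices...), requested...)
	for _, d := range all {
		if !seen[d] {
			seen[d] = true
			out = append(out, d)
		}
	}
	sort.Strings(out)
	return out
}
```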


58 Commits