open-nomad

Commit Graph

Author	SHA1	Message	Date
Seth Hoenig	bae42fad7c	exec: fix exec handler test Fixup this test to handle cgroups v2, as well as the :misc: cgroup	2022-04-06 12:11:37 -05:00
Seth Hoenig	52aaf86f52	raw_exec: make raw exec driver work with cgroups v2 This PR adds support for the raw_exec driver on systems with only cgroups v2. The raw exec driver is able to use cgroups to manage processes. This happens only on Linux, when exec_driver is enabled, and the no_cgroups option is not set. The driver uses the freezer controller to freeze processes of a task, issue a sigkill, then unfreeze. Previously the implementation assumed cgroups v1, and now it also supports cgroups v2. There is a bit of refactoring in this PR, but the fundamental design remains the same. Closes #12351 #12348	2022-04-04 16:11:38 -05:00
Seth Hoenig	3ce4f52740	Merge pull request #12446 from shoenig/no-pkg-err cleanup: purge github.com/pkg/errors	2022-04-04 09:22:44 -05:00
James Rasell	19281bb2fe	Merge pull request #12304 from th0m/tlefebvre/fix-wrong-drivernetworkmanager-interface fix: update incorrect DriverNetworkManager interface implementation	2022-04-04 11:29:22 +02:00
Seth Hoenig	9670adb6c6	cleanup: purge github.com/pkg/errors	2022-04-01 19:24:02 -05:00
Seth Hoenig	174a7532a1	tests: create fresh harness for each docker dns test Not actually sure this fixes the flaky tests, but seems like it could be related.	2022-03-31 08:17:34 -05:00
Seth Hoenig	113b7eb727	client: cgroups v2 code review followup	2022-03-24 13:40:42 -05:00
Seth Hoenig	2e5c6de820	client: enable support for cgroups v2 This PR introduces support for using Nomad on systems with cgroups v2 [1] enabled as the cgroups controller mounted on /sys/fs/cgroups. Newer Linux distros like Ubuntu 21.10 are shipping with cgroups v2 only, causing problems for Nomad users. Nomad mostly "just works" with cgroups v2 due to the indirection via libcontainer, but not so for managing cpuset cgroups. Previously, Nomad made use of a feature in v1 where a PID could be a member of more than one cgroup. In v2 this is no longer possible, and so the logic around computing cpuset values must be modified. When Nomad detects v2, it manages cpuset values in-process, rather than making use of cgroup hierarchy inheritance via shared/reserved parents. Nomad will only activate the v2 logic when it detects cgroups2 is mounted at /sys/fs/cgroups. This means on systems running in hybrid mode with cgroups2 mounted at /sys/fs/cgroups/unified (as is typical) Nomad will continue to use the v1 logic, and should operate as before. Systems that do not support cgroups v2 are also not affected. When v2 is activated, Nomad will create a parent called nomad.slice (unless otherwise configured in Client config), and create cgroups for tasks using naming convention <allocID>-<task>.scope. These follow the naming convention set by systemd and also used by Docker when cgroups v2 is detected. Client nodes now export a new fingerprint attribute, unique.cgroups.version which will be set to 'v1' or 'v2' to indicate the cgroups regime in use by Nomad. The new cpuset management strategy fixes #11705, where docker tasks that spawned processes on startup would "leak". In cgroups v2, the PIDs are started in the cgroup they will always live in, and thus the cause of the leak is eliminated. [1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html Closes #11289 Fixes #11705 #11773 #11933	2022-03-23 11:35:27 -05:00
James Rasell	1a4db3523d	Merge branch 'main' into tlefebvre/fix-wrong-drivernetworkmanager-interface	2022-03-17 09:38:13 +01:00
Thomas Lefebvre	c7fbf1089c	fix: update incorrect DriverNetworkManager interface implementation in plugins/drivers/client.go and drivers/mock/driver.go And add assertions to catch drifts at compilation time.	2022-03-15 11:51:01 -07:00
Seth Hoenig	2631659551	ci: swap ci parallelization for unconstrained gomaxprocs	2022-03-15 12:58:52 -05:00
Seth Hoenig	db2347a86c	cleanup: prevent leaks from time.After This PR replaces use of time.After with a safe helper function that creates a time.Timer to use instead. The new function returns both a time.Timer and a Stop function that the caller must handle. Unlike time.NewTimer, the helper function does not panic if the duration set is <= 0.	2022-02-02 14:32:26 -06:00
Tim Gross	1dad0e597e	fix integer bounds checks (#11815) * driver: fix integer conversion error The shared executor incorrectly parsed the user's group into int32 and then cast to uint32 without bounds checking. This is harmless because an out-of-bounds gid will throw an error later, but it triggers security and code quality scans. Parse directly to uint32 so that we get correct error handling. * helper: fix integer conversion error The autopilot flags helper incorrectly parses a uint64 to a uint, which has a machine-specific size. Although we don't have 32-bit builds, this sets off security and code quality scans. Parse to the machine sized uint. * driver: restrict bounds of port map The plugin server doesn't constrain the maximum integer for port maps. This could result in a user-visible misconfiguration, but it also triggers security and code quality scans. Restrict the bounds before casting to int32 and return an error. * cpuset: restrict upper bounds of cpuset values Our cpuset configuration expects values in the range of uint16 to match the expectations set by the kernel, but we don't constrain the values before downcasting. An underflow could lead to allocations failing on the client rather than being caught earlier. This also makes security and code quality scanners happy. * http: fix integer downcast for per_page parameter The parser for the `per_page` query parameter downcasts to int32 without bounds checking. This could result in underflow and nonsensical paging, but there are no server-side consequences for this. Fixing this will silence some security and code quality scanners though.	2022-01-25 11:16:48 -05:00
Seth Hoenig	0030424384	Merge pull request #11889 from hashicorp/build-update-circle build: upgrade circleci configuration	2022-01-24 10:18:21 -06:00
Seth Hoenig	2f0cfb5740	build: upgrade and speedup circleci configuration This PR upgrades our CI images and fixes some affected tests. - upgrade go-machine-image to premade latest ubuntu LTS (ubuntu-2004:202111-02) - eliminate go-machine-recent-image (no longer necessary) - manage GOPATH in GNUMakefile (see https://discuss.circleci.com/t/gopath-is-set-to-multiple-directories/7174) - fix tcp dial error check (message seems to be OS specific) - spot check values measured instead of specifically 'RSS' (rss no longer reported in cgroups v2) - use safe MkdirTemp for generating tmpfiles NOT applied: (too flaky) - eliminate setting GOMAXPROCS=1 (build tools were also affected by this setting) - upgrade resource type for all images to large (2C -> 4C)	2022-01-24 08:28:14 -06:00
Seth Hoenig	f2a71fd0d9	deps: pty has new home github.com/kr/pty was moved to github.com/creack/pty Swap this dependency so we can upgrade to the latest version and no longer need a replace directive.	2022-01-19 12:33:05 -06:00
Seth Hoenig	4650e97d29	deps: upgrade docker and runc This PR upgrades - docker dependency to the latest tagged release (v20.10.12) - runc dependency to the latest tagged release (v1.0.3) Docker does not abide by [semver](https://github.com/moby/moby/issues/39302), so it is marked +incompatible, and transitive dependencies are upgraded manually. Runc made three relevant breaking changes * cgroup manager .Set changed to accept Resources instead of Cgroup `3f65946756` * config.Device moved to devices.Device https://github.com/opencontainers/runc/pull/2679 * mountinfo.Mounted now returns an error if the specified path does not exist https://github.com/moby/sys/blob/mountinfo/v0.5.0/mountinfo/mountinfo.go#L16	2022-01-18 08:35:26 -06:00
Tim Gross	73d0779858	drivers: set world-readable permissions on copied resolv.conf (#11856 ) When we copy the system DNS to a task's `resolv.conf`, we should set the permissions as world-readable so that unprivileged users within the task can read it.	2022-01-14 12:25:23 -05:00
Alessandro De Blasis	e647549ecf	metrics: added `mapped_file` metric (#11500 ) Signed-off-by: Alessandro De Blasis <alex@deblasis.net> Co-authored-by: Nate <37554478+servusdei2018@users.noreply.github.com>	2022-01-10 15:35:19 -05:00
Shishir	65eab35412	Add support for setting pids_limit in docker plugin config. (#11526 )	2021-12-21 13:31:34 -05:00
James Rasell	45f4689f9c	chore: fixup inconsistent method receiver names. (#11704 )	2021-12-20 11:44:21 +01:00
Tim Gross	fc1d4814d9	qemu: add `args_allowlist` to sandbox VM command line inputs The QEMU driver allows arbitrary command line options, but many of these options give access to host resources that operators may not want to expose such as devices. Add an optional allowlist to the plugin configuration so that operators can limit the resources for QEMU.	2021-11-19 11:11:52 -05:00
Michael Schurter	ef3fc79225	Merge pull request #11334 from hashicorp/f-chroot-skip-allocdir client: never embed alloc_dir in chroot	2021-11-03 16:48:09 -07:00
Michael Schurter	fd68bbc342	test: update tests to properly use AllocDir Also use t.TempDir when possible.	2021-10-19 10:49:07 -07:00
Michael Schurter	10c3bad652	client: never embed alloc_dir in chroot Fixes #2522 Skip embedding client.alloc_dir when building chroot. If a user configures a Nomad client agent so that the chroot_env will embed the client.alloc_dir, Nomad will happily infinitely recurse while building the chroot until something horrible happens. The best case scenario is the filesystem's path length limit is hit. The worst case scenario is disk space is exhausted. A bad agent configuration will look something like this: ```hcl data_dir = "/tmp/nomad-badagent" client { enabled = true chroot_env { # Note that the source matches the data_dir "/tmp/nomad-badagent" = "/ohno" # ... } } ``` Note that `/ohno/client` (the state_dir) will still be created but not `/ohno/alloc` (the alloc_dir). While I cannot think of a good reason why someone would want to embed Nomad's client (and possibly server) directories in chroots, there should be no cause for harm. chroots are only built when Nomad runs as root, and Nomad disables running exec jobs as root by default. Therefore even if client state is copied into chroots, it will be inaccessible to tasks. Skipping the `data_dir` and `{client,server}.state_dir` is possible, but this PR attempts to implement the minimum viable solution to reduce risk of unintended side effects or bugs. When running tests as root in a vm without the fix, the following error occurs: ``` === RUN TestAllocDir_SkipAllocDir alloc_dir_test.go:520: Error Trace: alloc_dir_test.go:520 Error: Received unexpected error: Couldn't create destination file /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/testtask/nomad/test/testtask/.../nomad/test/testtask/secrets/.nomad-mount: open /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/.../testtask/secrets/.nomad-mount: file name too long Test: TestAllocDir_SkipAllocDir --- FAIL: TestAllocDir_SkipAllocDir (22.76s) ``` Also removed unused Copy methods on AllocDir and TaskDir structs. 
Thanks to @eveld for not letting me forget about this!	2021-10-18 09:22:01 -07:00
Shishir Mahajan	d4daef7ebf	Add support for --init to docker driver. Signed-off-by: Shishir Mahajan <smahajan@roblox.com>	2021-10-15 12:53:25 -07:00
Mahmood Ali	d5e136b82b	executor: set CpuWeight in cgroup-v2 (#11287 ) Cgroup-v2 uses `cpu.weight` property instead of cpu shares: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#cpu-interface-files . And it uses a different range (i.e. `[1, 10000]`) from cpu.shares (i.e. `[2, 262144]`) to make things more interesting. Luckily, the libcontainer provides a helper function to perform the conversion [`ConvertCPUSharesToCgroupV2Value`](https://pkg.go.dev/github.com/opencontainers/runc@v1.0.2/libcontainer/cgroups#ConvertCPUSharesToCgroupV2Value). I have confirmed that docker/libcontainer performs the conversion as well in https://github.com/opencontainers/runc/blob/v1.0.2/libcontainer/specconv/spec_linux.go#L536-L541 , and that CpuShares is ignored by libcontainer in https://github.com/opencontainers/runc/blob/v1.0.2/libcontainer/cgroups/fs2/cpu.go#L24-L29 .	2021-10-14 08:46:07 -04:00
Mahmood Ali	48aa6e26e9	executor: suppress spurious log messages (#11273 ) Suppress stats streaming error log messages when task finishes. Streaming errors are expected when a task finishes and they aren't actionable to users. Also, note that the task runner Stats hook retries collecting stats after a delay. If the connection terminates prematurely, it will be retried, and closing the stats stream is not very disruptive. Ideally, executor terminates cleanly when task exits, but that's a more substantial change that may require changing the executor/drivers interface. Fixes #10814	2021-10-06 12:42:35 -04:00
Mahmood Ali	4d90afb425	gofmt all the files mostly to handle build directives in 1.17.	2021-10-01 10:14:28 -04:00
James Rasell	0e926ef3fd	allow configuration of Docker hostnames in bridge mode (#11173) Add a new hostname string parameter to the network block which allows operators to specify the hostname of the network namespace. Changing this causes a destructive update to the allocation and it is omitted if empty from API responses. This parameter also supports interpolation. In order to have a hostname passed as a configuration param when creating an allocation network, the CreateNetwork func of the DriverNetworkManager interface needs to be updated. In order to minimize the disruption of future changes, rather than add another string func arg, the function now accepts a request struct along with the allocID param. The struct has the hostname as a field. The in-tree implementations of DriverNetworkManager.CreateNetwork have been modified to account for the function signature change. In updating for the change, the enhancement of adding hostnames to network namespaces has also been added to the Docker driver, whilst the default Linux manager does not currently implement it.	2021-09-16 08:13:09 +02:00
James Rasell	b6813f1221	chore: fix incorrect docstring formatting.	2021-08-30 11:08:12 +02:00
Timothé Perez	ce877bdf7c	fix: load token in docker auth config	2021-07-22 22:27:29 +02:00
Tim Gross	db96e40f3a	docker: move host path for hosts file mount to alloc dir (#10823 ) In Nomad 1.1.1 we generate a hosts file based on the Nomad-owned network namespace, rather than using the default hosts file from the pause container. This hosts file should be shared between tasks in the same allocation so that tasks can update the file and have the results propagated between tasks.	2021-06-30 11:10:04 -04:00
Tim Gross	7bd61bbf43	docker: generate /etc/hosts file for bridge network mode (#10766 ) When `network.mode = "bridge"`, we create a pause container in Docker with no networking so that we have a process to hold the network namespace we create in Nomad. The default `/etc/hosts` file of that pause container is then used for all the Docker tasks that share that network namespace. Some applications rely on this file being populated. This changeset generates a `/etc/hosts` file and bind-mounts it to the container when Nomad owns the network, so that the container's hostname has an IP in the file as expected. The hosts file will include the entries added by the Docker driver's `extra_hosts` field. In this changeset, only the Docker task driver will take advantage of this option, as the `exec`/`java` drivers currently copy the host's `/etc/hosts` file and this can't be changed without breaking backwards compatibility. But the fields are available in the task driver protobuf for community task drivers to use if they'd like.	2021-06-16 14:55:22 -04:00
Seth Hoenig	8f493cfa89	client/fingerprint/java: improve java version string regex matching This PR improves the regular expression used for matching the java version string, which varies a lot depending on the java vendor and version. These are the example strings we now test for: java version "1.7.0_80" openjdk version "11.0.1" 2018-10-16 openjdk version "11.0.1" 2018-10-16 java version "1.6.0_36" openjdk version "1.8.0_192" openjdk 11.0.11 2021-04-20 LTS The last one is a new test added on behalf of #6081, which is still broken on today's CentOS 7 default JDK package. openjdk 11.0.11 2021-04-20 LTS OpenJDK Runtime Environment 18.9 (build 11.0.11+9-LTS) OpenJDK 64-Bit Server VM 18.9 (build 11.0.11+9-LTS, mixed mode, sharing) ==> Evaluation "21c6caf7" finished with status "complete" but failed to place all allocations: Task Group "example" (failed to place 1 allocation): * Constraint "${driver.java.version} >= 11.0.0": 1 nodes excluded by filter Evaluation "2b737d48" waiting for additional capacity to place remainder Fixes #6081	2021-06-15 14:15:01 -05:00
James Rasell	939b23936a	Merge pull request #10744 from hashicorp/b-remove-duplicate-imports chore: remove duplicate import statements	2021-06-11 16:42:34 +02:00
James Rasell	050b5408c7	drivers: remove duplicate import statements.	2021-06-11 09:38:09 +02:00
Mahmood Ali	0976af471c	driver/docker: ignore cpuset errors for short-lived tasks follow up (#10730 ) minor refactor and changelog	2021-06-09 11:00:39 -04:00
Mahmood Ali	c2026dfa28	Merge pull request #10416 from hashicorp/b-cores-docker driver/docker: ignore error if container exists before cgroup can be written	2021-06-09 10:34:02 -04:00
Mahmood Ali	0ac126fa78	drivers/exec: Don't inherit Nomad oom_score_adj value (#10698) Explicitly set the `oom_score_adj` value for `exec` and `java` tasks. We recommend that the Nomad service have a low oom_score_adj value (e.g. -1000) to avoid having the nomad agent OOM killed if the node is oversubscribed. However, Nomad's workloads should not inherit the value from Nomad's process, which is the default behavior. Fixes #10663	2021-06-03 14:15:50 -04:00
Seth Hoenig	fe9258b754	drivers/exec: pass capabilities through executor RPC Add capabilities to the LaunchRequest proto so that the capabilities set actually gets plumbed all the way through to task launch.	2021-05-17 12:37:40 -06:00
Seth Hoenig	e365652e81	drivers: fixup linux version dependent test cases The error output being checked depends on the linux caps supported by the particular operating system. Fix these test cases to just check that an error did occur.	2021-05-17 12:37:40 -06:00
Seth Hoenig	f64baec276	docs: update docs for linux capabilities in exec/java/docker drivers Update docs for allow_caps, cap_add, cap_drop in exec/java/docker driver pages. Also update upgrade guide with guidance on new default linux capabilities for exec and java drivers.	2021-05-17 12:37:40 -06:00
Seth Hoenig	87c96eed11	drivers/docker: reuse capabilities plumbing in docker driver This changeset does not introduce any functional change for the docker driver, but rather cleans up the implementation around computing configured capabilities by re-using code written for the exec/java task drivers.	2021-05-17 12:37:40 -06:00
Seth Hoenig	2361a91938	drivers/java: enable setting allow_caps on java driver Enable setting allow_caps on the java task driver plugin, along with the associated cap_add and cap_drop options in java task configuration.	2021-05-17 12:37:40 -06:00
Seth Hoenig	5b8a32f23d	drivers/exec: enable setting allow_caps on exec driver This PR enables setting allow_caps on the exec driver plugin configuration, as well as cap_add and cap_drop in exec task configuration. These options replicate the functionality already present in the docker task driver. Important: this change also reduces the default set of capabilities enabled by the exec driver to match the default set enabled by the docker driver. Until v1.0.5 the exec task driver would enable all capabilities supported by the operating system. v1.0.5 removed NET_RAW from that list of default capabilities, but left many others which could potentially also be leveraged by compromised tasks. Important: the "root" user is still special cased when used with the exec driver. Older versions of Nomad enabled all capabilities supported by the operating system for tasks set with the root user. To maintain compatibility with existing clusters we continue supporting this "feature", however we maintain support for the legacy set of capabilities rather than enabling all capabilities now supported on modern operating systems.	2021-05-17 12:37:40 -06:00
Seth Hoenig	1e75f99839	drivers/docker+exec+java: disable net_raw capability by default The default Linux Capabilities set enabled by the docker, exec, and java task drivers includes CAP_NET_RAW (for making ping just work), which has the side effect of opening an ARP DoS/MiTM attack between tasks using bridge networking on the same host network. https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities This PR disables CAP_NET_RAW for the docker, exec, and java task drivers. The previous behavior can be restored for docker using the allow_caps docker plugin configuration option. A future version of nomad will enable similar configurability for the exec and java task drivers.	2021-05-12 13:22:09 -07:00
Isabel Suchanek	ed9e12cdc7	Clean up docker driver test to make it less flaky (#10559 ) Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>	2021-05-10 14:58:19 -07:00
Isabel Suchanek	b5a2f48c78	Fix test panic in docker driver test	2021-05-07 12:12:33 -07:00
Isabel Suchanek	cb4fc53353	drivers/docker: add support for STOPSIGNAL This fixes a bug where Nomad overrides a Dockerfile's STOPSIGNAL with the default kill_signal (SIGTERM). This adds a check for kill_signal. If it's not set, it calls StopContainer instead of Signal, which uses STOPSIGNAL if it's specified. If both kill_signal and STOPSIGNAL are set, Nomad tries to stop the container with kill_signal first, before then calling StopContainer. Fixes #9989	2021-05-05 10:27:58 -07:00

689 Commits
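
Commit d5e136b82b above notes that cgroups v2 uses `cpu.weight` in the range `[1, 10000]` where v1 used `cpu.shares` in the range `[2, 262144]`, and that libcontainer provides a conversion helper. As a rough illustration of what such a conversion involves, here is a plain linear mapping between the two quoted ranges. The function name `sharesToWeight` is hypothetical, the block does not call libcontainer, and whether it matches the library helper in every edge case is an assumption.

```go
package main

import "fmt"

// Ranges quoted in the commit message.
const (
	minShares = 2      // lowest cpu.shares value (cgroups v1)
	maxShares = 262144 // highest cpu.shares value (cgroups v1)
	minWeight = 1      // lowest cpu.weight value (cgroups v2)
	maxWeight = 10000  // highest cpu.weight value (cgroups v2)
)

// sharesToWeight is a hypothetical sketch that maps a v1 cpu.shares value
// linearly onto the v2 cpu.weight range using integer arithmetic.
// A value of 0 is passed through as 0, meaning "unset" (an assumption).
func sharesToWeight(shares uint64) uint64 {
	if shares == 0 {
		return 0
	}
	// Clamp to the documented v1 range before scaling.
	if shares < minShares {
		shares = minShares
	}
	if shares > maxShares {
		shares = maxShares
	}
	return minWeight + ((shares-minShares)*(maxWeight-minWeight))/(maxShares-minShares)
}

func main() {
	for _, s := range []uint64{2, 1024, 262144} {
		fmt.Printf("cpu.shares=%d -> cpu.weight=%d\n", s, sharesToWeight(s))
	}
}
```

Under this mapping the endpoints 2 and 262144 map to 1 and 10000, and a shares value of 1024 maps to a weight of 39.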