open-nomad

Commit Graph

Author	SHA1	Message	Date
Tim Gross	76284a09a0	docker: move pause container recovery to after `SetConfig` (#16713 ) When we added recovery of pause containers in #16352 we called the recovery function from the plugin factory function. But in our plugin setup protocol, a plugin isn't ready for use until we call `SetConfig`. This meant that recovering pause containers was always done with the default config. Setting up the Docker client only happens once, so setting the wrong config in the recovery function also means that all other Docker API calls will use the default config. Move the `recoveryPauseContainers` call into the `SetConfig`. Fix the error handling so that we return any error but also don't log when the context is canceled, which happens twice during normal startup as we fingerprint the driver.	2023-03-29 16:20:37 -04:00
Elvis Pranskevichus	11a9bb6ce7	drivers/exec: Fix handling of capabilities for unprivileged tasks (#16643 ) Currently, the `exec` driver is only setting the Bounding set, which is not sufficient to actually enable the requisite capabilities for the task process. In order for the capabilities to survive `execve` performed by libcontainer, the `Permitted`, `Inheritable`, and `Ambient` sets must also be set. Per CAPABILITIES (7): > Ambient: This is a set of capabilities that are preserved across an > execve(2) of a program that is not privileged. The ambient capability > set obeys the invariant that no capability can ever be ambient if it > is not both permitted and inheritable.	2023-03-28 12:12:55 -04:00
Seth Hoenig	87f4b71df0	client/fingerprint: correctly fingerprint E/P cores of Apple Silicon chips (#16672 ) * client/fingerprint: correctly fingerprint E/P cores of Apple Silicon chips This PR adds detection of asymmetric core types (Power & Efficiency) (P/E) when running on M1/M2 Apple Silicon CPUs. This functionality is provided by shoenig/go-m1cpu which makes use of the Apple IOKit framework to read undocumented registers containing CPU performance data. Currently working on getting that functionality merged upstream into gopsutil, but gopsutil would still not support detecting P vs E cores like this PR does. Also refactors the CPUFingerprinter code to handle the mixed core types, now setting power vs efficiency cpu attributes. For now the scheduler is still unaware of mixed core types - on Apple platforms tasks cannot reserve cores anyway so it doesn't matter, but at least now the total CPU shares available will be correct. Future work should include adding support for detecting P/E cores on the latest and upcoming Intel chips, where computation of total cpu shares is currently incorrect. For that, we should also include updating the scheduler to be core-type aware, so that tasks of resources.cores on Linux platforms can be assigned the correct number of CPU shares for the core type(s) they have been assigned. node attributes before cpu.arch = arm64 cpu.modelname = Apple M2 Pro cpu.numcores = 12 cpu.reservablecores = 0 cpu.totalcompute = 1000 node attributes after cpu.arch = arm64 cpu.frequency.efficiency = 2424 cpu.frequency.power = 3504 cpu.modelname = Apple M2 Pro cpu.numcores.efficiency = 4 cpu.numcores.power = 8 cpu.reservablecores = 0 cpu.totalcompute = 37728 * fingerprint/cpu: follow up cr items	2023-03-28 08:27:58 -05:00
Lance Haig	2332d694bb	deps: Update ioutil library references to os and io respectively for drivers package (#16331 ) * Update ioutil library references to os and io respectively for drivers package No user facing changes so I assume no change log is required * Fix failing tests	2023-03-08 10:31:09 -06:00
Seth Hoenig	835365d2a4	docker: fix bug where network pause containers would be erroneously reconciled (#16352 ) * docker: fix bug where network pause containers would be erroneously gc'd * docker: cl: thread context from driver into pause container restoration	2023-03-07 12:17:32 -06:00
Farbod Ahmadian	629ac58763	tests: add functionality to skip a test if it's not running in CI and not with root user (#16222 )	2023-03-02 13:38:27 -05:00
Seth Hoenig	68894bdc62	docker: disable driver when running as non-root on cgroups v2 hosts (#16063 ) * docker: disable driver when running as non-root on cgroups v2 hosts This PR modifies the docker driver to behave like exec when being run as a non-root user on a host machine with cgroups v2 enabled. Because of how cpu resources are managed by the Nomad client, the nomad agent must be run as root to manage docker-created cgroups. * cl: update cl	2023-02-06 14:09:19 -06:00
Seth Hoenig	139f2c0b0f	docker: set force=true on remove image to handle images referenced by multiple tags (#15962 ) * docker: set force=true on remove image to handle images referenced by multiple tags This PR changes our call of docker client RemoveImage() to RemoveImageExtended with the Force=true option set. This fixes a bug where an image referenced by more than one tag could never be garbage collected by Nomad. The Force option only applies to stopped containers; it does not affect running workloads. * docker: add note about image_delay and multiple tags	2023-01-31 07:53:18 -06:00
Yorick Gersie	d94f22bee2	Ensure infra_image gets proper label used for reconciliation (#15898 ) * Ensure infra_image gets proper label used for reconciliation Currently infra containers are not cleaned up as part of the dangling container cleanup routine. The reason is that Nomad checks if a container is a Nomad owned container by verifying the existence of the: `com.hashicorp.nomad.alloc_id` label. Ensure we set this label on the infra container as well. * fix unit test * changelog: add entry --------- Co-authored-by: Seth Hoenig <shoenig@duck.com>	2023-01-30 09:46:45 -06:00
Piotr Kazmierczak	14b53df3b6	renamed stanza to block for consistency with other projects (#15941 )	2023-01-30 15:48:43 +01:00
Nick Wales	825af1f62a	docker: add option for Windows isolation modes (#15819 )	2023-01-24 16:31:48 -05:00
Seth Hoenig	2868a45982	docker: configure restart policy for networking pause container (#15732 ) This PR modifies the configuration of the networking pause container to include the "unless-stopped" restart policy. The pause container should always be restored into a running state until Nomad itself issues a stop command for the container. This is not a _perfect_ fix for #12216 but it should cover the 99% use case - where a pause container gets accidentally stopped / killed for some reason. There is still a possibility where the pause container and main task container are stopped and started in the order where the bad behavior persists, but this is fundamentally unavoidable due to how docker itself abstracts and manages the underlying network namespace referenced by the containers. Closes #12216	2023-01-10 07:50:09 -06:00
Seth Hoenig	7214e21402	ci: swap freeport for portal in packages (#15661 )	2023-01-03 11:25:20 -06:00
Tim Gross	eabbcebdd4	exec: allow running commands from host volume (#14851 ) The exec driver and other drivers derived from the shared executor check the path of the command before handing off to libcontainer to ensure that the command doesn't escape the sandbox. But we don't check any host volume mounts, which should be safe to use as a source for executables if we're letting the user mount them to the container in the first place. Check the mount config to verify the executable lives in the mount's host path, but then return an absolute path within the mount's task path so that we can hand that off to libcontainer to run. Includes a good bit of refactoring here because the anchoring of the final task path has different code paths for inside the task dir vs inside a mount. But I've fleshed out the test coverage of this a good bit to ensure we haven't created any regressions in the process.	2022-11-11 09:51:15 -05:00
Seth Hoenig	c68ed3b4c8	client: protect user lookups with global lock (#14742 ) * client: protect user lookups with global lock This PR updates Nomad client to always do user lookups while holding a global process lock. This guards against concurrency-unsafe implementations of NSS while still enabling NSS lookups of users (i.e. we cannot use osusergo). * cl: add cl	2022-09-29 09:30:13 -05:00
Michael Schurter	bd4b4b8f66	Data race fixes in tests and a new semgrep rule (#14594 ) * test: don't use loop vars in goroutines fixes a data race in the test * test: copy objects in statestore before mutating fixes data race in test * test: @lgfa29's semgrep rule for loops/goroutines Found 2 places where we were improperly using loop variables inside goroutines.	2022-09-15 10:35:08 -07:00
James Rasell	4b9bcf94da	chore: remove use of "err" as a log line context key for errors. (#14433 ) Log lines which include an error should use the full term "error" as the context key. This provides consistency across the codebase and avoids a Go style which operators might not be aware of.	2022-09-01 15:06:10 +02:00
Tim Gross	cc9b480996	testing: setting env var incompatible with parallel tests (#14405 ) Neither the `os.Setenv` nor `t.Setenv` helper are safe to use in parallel tests because environment variables are process-global. The stdlib panics if you try to do this. Remove the `ci.Parallel()` call from all tests where we're setting environment variables.	2022-08-30 14:49:03 -04:00
Seth Hoenig	0d97a94814	drivers/java: add parsing test case for corretto 17	2022-08-24 09:16:38 -05:00
Piotr Kazmierczak	b63944b5c1	cleanup: replace TypeToPtr helper methods with pointer.Of (#14151 ) Bumping compile time requirement to go 1.18 allows us to simplify our pointer helper methods.	2022-08-17 18:26:34 +02:00
Seth Hoenig	b3ea68948b	build: run gofmt on all go source files Go 1.19 will forcefully format all your doc strings. To get this out of the way, here is one big commit with all the changes gofmt wants to make.	2022-08-16 11:14:11 -05:00
Seth Hoenig	dc761aa7ec	docker: create a docker task config setting for disable built-in healthcheck This PR adds a docker driver task configuration setting for turning off built-in HEALTHCHECK of a container. References) https://docs.docker.com/engine/reference/builder/#healthcheck https://github.com/docker/engine-api/blob/master/types/container/config.go#L16 Closes #5310 Closes #14068	2022-08-11 10:33:48 -05:00
Seth Hoenig	eb933ad27d	Merge pull request #14019 from hashicorp/deps-update-runc-to-v1.1.3 deps: update runc to v1.1.3	2022-08-05 07:34:16 -05:00
Seth Hoenig	bc09a2e114	deps: update opencontainers/runc to v1.1.3	2022-08-04 12:56:49 -05:00
Luiz Aoqui	9affe31a0f	qemu: reduce monitor socket path (#13971 ) The QEMU driver can take an optional `graceful_shutdown` configuration which will create a Unix socket to send ACPI shutdown signal to the VM. Unix sockets have a hard length limit and the driver implementation assumed that QEMU versions 2.10.1 were able to handle longer paths. This is not correct, the linked QEMU fix only changed the behaviour from silently truncating longer socket paths to throwing an error. By validating the socket path before starting the QEMU machine we can provide users a more actionable and meaningful error message, and by using a shorter socket file name we leave a bit more room for user-defined values in the path, such as the task name. The maximum length allowed is also platform-dependent, so validation needs to be different for each OS.	2022-08-04 12:10:35 -04:00
Luiz Aoqui	2c0fea64e9	qemu: restore monitor socket path (#14000 ) When a QEMU task is recovered the monitor socket path was not being restored into the task handler, so the `graceful_shutdown` configuration was effectively ignored if the client restarted.	2022-08-04 10:44:08 -04:00
Tim Gross	e093b7d5c1	test: disable docker OOM detection test on cgroups v2 (#13928 ) OOM detection under cgroups v2 is flaky under versions of `containerd` before v1.6.3, but our `containerd` dependency is transitive on `moby/moby`, who have not yet updated. Disable this test for cgroups v2 environments until we can update the dependency chain.	2022-07-28 14:47:06 -04:00
Tim Gross	eb06c25d5f	deps: remove deprecated net/context (#13932 ) The `golang.org/x/net/context` package was merged into the stdlib as of go 1.7. Update the imports to use the identical stdlib version. Clean up import blocks for the impacted files to remove unnecessary package aliasing.	2022-07-28 14:46:56 -04:00
Ted Behling	6a032a54d2	driver/docker: Don't pull InfraImage if it exists (#13265 ) Co-authored-by: James Rasell <jrasell@hashicorp.com>	2022-07-07 17:44:06 +02:00
Daniel Rossbach	8c52c03c8c	qemu driver: Add option to configure drive_interface (#11864 )	2022-06-10 10:03:51 -04:00
phreakocious	94a78597d2	Add `guest_agent` config option for QEMU driver (#12800 ) Add boolean 'guest_agent' config option for QEMU driver, which will create the socket file for the QEMU Guest Agent in the task dir when enabled.	2022-06-09 09:21:38 -04:00
Derek Strickland	34dea90d7a	docker: update images to reference hashicorpdev Docker organization (#12903 ) docker: update images to reference hashicorpdev dockerhub organization generate job_init.bindata_assetfs.go Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-06-08 15:06:00 -04:00
Seth Hoenig	8af061ffc5	docker: remove dead comment	2022-05-25 09:26:20 -05:00
Seth Hoenig	92685bad63	tests: minor fixes for some docker tests	2022-05-25 08:48:24 -05:00
Seth Hoenig	c6c3ae020d	drivers/docker: do not set cgroup parent in v1 mode This PR fixes a bug where the CgroupParent on the docker HostConfig struct was accidentally being set when running in cgroups v1 mode.	2022-05-24 11:22:50 -05:00
Seth Hoenig	65f7abf2f4	cli: update default redis and use nomad service discovery Closes #12927 Closes #12958 This PR updates the version of redis used in our examples from 3.2 to 7. The old version is no longer supported, and we should be setting a good example by using a supported version. The long-form example job is now fixed so that the service stanza uses nomad as the service discovery provider, and so now the job runs without a requirement of having Consul running and configured.	2022-05-17 10:24:19 -05:00
Eng Zer Jun	97d1bc735c	test: use `T.TempDir` to create temporary test directory (#12853 ) * test: use `T.TempDir` to create temporary test directory This commit replaces `ioutil.TempDir` with `t.TempDir` in tests. The directory created by `t.TempDir` is automatically removed when the test and all its subtests complete. Prior to this commit, temporary directory created using `ioutil.TempDir` needs to be removed manually by calling `os.RemoveAll`, which is omitted in some tests. The error handling boilerplate e.g. defer func() { if err := os.RemoveAll(dir); err != nil { t.Fatal(err) } } is also tedious, but `t.TempDir` handles this for us nicely. Reference: https://pkg.go.dev/testing#T.TempDir Signed-off-by: Eng Zer Jun <engzerjun@gmail.com> * test: fix TestLogmon_Start_restart on Windows Signed-off-by: Eng Zer Jun <engzerjun@gmail.com> * test: fix failing TestConsul_Integration t.TempDir fails to perform the cleanup properly because the folder is still in use testing.go:967: TempDir RemoveAll cleanup: unlinkat /tmp/TestConsul_Integration2837567823/002/191a6f1a-5371-cf7c-da38-220fe85d10e5/web/secrets: device or resource busy Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	2022-05-12 11:42:40 -04:00
Seth Hoenig	96ec19788d	cgroups: make sure cgroup still exists after task restart This PR modifies raw_exec and exec to ensure the cgroup for a task they are driving still exists during a task restart. These drivers have the same bug but with different root cause. For raw_exec, we were removing the cgroup in 2 places - the cpuset manager, and in the unix containment implementation (the thing that uses freezer cgroup to clean house). During a task restart, the containment would remove the cgroup, and when the task runner hooks went to start again would block on waiting for the cgroup to exist, which will never happen, because it gets created by the cpuset manager which only runs as an alloc pre-start hook. The fix here is to simply not delete the cgroup in the containment implementation; killing the PIDs is enough. The removal happens in the cpuset manager later anyway. For exec, it's the same idea, except DestroyTask is called on task failure, which in turn calls into libcontainer, which in turn deletes the cgroup. In this case we do not have control over the deletion of the cgroup, so instead we hack the cgroup back into life after the call to DestroyTask. All of this only applies to cgroups v2.	2022-05-05 09:51:03 -05:00
Tim Gross	c763c4cb96	remove pre-0.9 driver code and related E2E test (#12791 ) This test exercises upgrades between 0.8 and Nomad versions greater than 0.9. We have not supported 0.8.x in a very long time and in any case the test has been marked to skip because the downloader doesn't work.	2022-04-27 09:53:37 -04:00
Tim Gross	140dbab832	docker: back out cgroup v2 OOM detection (#12735 ) When shutting down an allocation that ends up needing to be force-killed, we're getting a spurious "OOM Killed (137)" message on the task termination event. We introduced this as part of cgroups v2 support because the Docker daemon isn't detecting the container status correctly. Although exit code 137 is the exit code we get for OOM-killed processes, that's because OOM kill is a `SIGKILL`. So any sigkilled process will get that exit code.	2022-04-21 12:31:34 -04:00
Seth Hoenig	16cab10346	ci: fix docker logger not supported test This test checks for behavior when asking for logs of a docker task configured with a log driver that does not support streaming logs. Previously this was using the 'gelf' log driver, but it seems that no longer returns an error as expected. Instead we can just use the 'none' log driver, which has the desired effect 2022-04-19T10:23:19.129-0500 [ERROR] docklog/docker_logger.go:133: log streaming ended with terminal error: error="API error (501): configured logging driver does not support reading"	2022-04-19 10:27:01 -05:00
Seth Hoenig	bae42fad7c	exec: fix exec handler test Fixup this test to handle cgroups v2, as well as the :misc: cgroup	2022-04-06 12:11:37 -05:00
Seth Hoenig	52aaf86f52	raw_exec: make raw exec driver work with cgroups v2 This PR adds support for the raw_exec driver on systems with only cgroups v2. The raw exec driver is able to use cgroups to manage processes. This happens only on Linux, when exec_driver is enabled, and the no_cgroups option is not set. The driver uses the freezer controller to freeze processes of a task, issue a sigkill, then unfreeze. Previously the implementation assumed cgroups v1, and now it also supports cgroups v2. There is a bit of refactoring in this PR, but the fundamental design remains the same. Closes #12351 #12348	2022-04-04 16:11:38 -05:00
Seth Hoenig	3ce4f52740	Merge pull request #12446 from shoenig/no-pkg-err cleanup: purge github.com/pkg/errors	2022-04-04 09:22:44 -05:00
James Rasell	19281bb2fe	Merge pull request #12304 from th0m/tlefebvre/fix-wrong-drivernetworkmanager-interface fix: update incorrect DriverNetworkManager interface implementation	2022-04-04 11:29:22 +02:00
Seth Hoenig	9670adb6c6	cleanup: purge github.com/pkg/errors	2022-04-01 19:24:02 -05:00
Seth Hoenig	174a7532a1	tests: create fresh harness for each docker dns test Not actually sure this fixes the flaky tests, but seems like it could be related.	2022-03-31 08:17:34 -05:00
Seth Hoenig	113b7eb727	client: cgroups v2 code review followup	2022-03-24 13:40:42 -05:00
Seth Hoenig	2e5c6de820	client: enable support for cgroups v2 This PR introduces support for using Nomad on systems with cgroups v2 [1] enabled as the cgroups controller mounted on /sys/fs/cgroups. Newer Linux distros like Ubuntu 21.10 are shipping with cgroups v2 only, causing problems for Nomad users. Nomad mostly "just works" with cgroups v2 due to the indirection via libcontainer, but not so for managing cpuset cgroups. Before, Nomad has been making use of a feature in v1 where a PID could be a member of more than one cgroup. In v2 this is no longer possible, and so the logic around computing cpuset values must be modified. When Nomad detects v2, it manages cpuset values in-process, rather than making use of cgroup hierarchy inheritance via shared/reserved parents. Nomad will only activate the v2 logic when it detects cgroups2 is mounted at /sys/fs/cgroups. This means on systems running in hybrid mode with cgroups2 mounted at /sys/fs/cgroups/unified (as is typical) Nomad will continue to use the v1 logic, and should operate as before. Systems that do not support cgroups v2 are also not affected. When v2 is activated, Nomad will create a parent called nomad.slice (unless otherwise configured in Client config), and create cgroups for tasks using naming convention <allocID>-<task>.scope. These follow the naming convention set by systemd and also used by Docker when cgroups v2 is detected. Client nodes now export a new fingerprint attribute, unique.cgroups.version which will be set to 'v1' or 'v2' to indicate the cgroups regime in use by Nomad. The new cpuset management strategy fixes #11705, where docker tasks that spawned processes on startup would "leak". In cgroups v2, the PIDs are started in the cgroup they will always live in, and thus the cause of the leak is eliminated. [1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html Closes #11289 Fixes #11705 #11773 #11933	2022-03-23 11:35:27 -05:00
James Rasell	1a4db3523d	Merge branch 'main' into tlefebvre/fix-wrong-drivernetworkmanager-interface	2022-03-17 09:38:13 +01:00

730 Commits