open-nomad

Author	SHA1	Message	Date
Tim Gross	bfcbc00f4e	workload identity (#13223 ) In order to support implicit ACL policies for tasks to get their own secrets, each task would need to have its own ACL token. This would add extra raft overhead as well as new garbage collection jobs for cleaning up task-specific ACL tokens. Instead, Nomad will create a workload Identity Claim for each task. An Identity Claim is a JSON Web Token (JWT) signed by the server’s private key and attached to an Allocation at the time a plan is applied. The encoded JWT can be submitted as the X-Nomad-Token header to replace ACL token secret IDs for the RPCs that support identity claims. Whenever a key is is added to a server’s keyring, it will use the key as the seed for a Ed25519 public-private private keypair. That keypair will be used for signing the JWT and for verifying the JWT. This implementation is a ruthlessly minimal approach to support the secure variables feature. When a JWT is verified, the allocation ID will be checked against the Nomad state store, and non-existent or terminal allocation IDs will cause the validation to be rejected. This is sufficient to support the secure variables feature at launch without requiring implementation of a background process to renew soon-to-expire tokens.	2022-07-11 13:34:05 -04:00
Seth Hoenig	5dd8aa3e27	client: enforce max_kill_timeout client configuration This PR fixes a bug where client configuration max_kill_timeout was not being enforced. The feature was introduced in 9f44780 but seems to have been removed during the major drivers refactoring. We can make sure the value is enforced by pluming it through the DriverHandler, which now uses the lesser of the task.killTimeout or client.maxKillTimeout. Also updates Event.SetKillTimeout to require both the task.killTimeout and client.maxKillTimeout so that we don't make the mistake of using the wrong value - as it was being given only the task.killTimeout before.	2022-07-06 15:29:38 -05:00
Derek Strickland	7d6a3df197	csi_hook: valid if any driver supports csi (#13446 ) * csi_hook: valid if any driver supports csi volumes	2022-06-22 10:43:43 -04:00
Jeffrey Clark	a97699221c	cni: add loopback to linux bridge (#13428 ) CNI changed how to bring up the interface in v0.2.0. Support was moved to a new loopback plugin. https://github.com/containernetworking/cni/pull/121 Fixes #10014	2022-06-20 11:22:53 -04:00
Grant Griffiths	99896da443	CSI: make plugin health_timeout configurable in csi_plugin stanza (#13340 ) Signed-off-by: Grant Griffiths <ggriffiths@purestorage.com>	2022-06-14 10:04:16 -04:00
Derek Strickland	12f3ee46ea	alloc_runner: stop sidecar tasks last (#13055 ) alloc_runner: stop sidecar tasks last	2022-06-07 11:35:19 -04:00
Radek Simko	9cc71d6665	client/allochealth: add healthy_deadline as context to error messages (#13214 )	2022-06-06 10:11:08 -04:00
Michael Schurter	2965dc6a1a	artifact: fix numerous go-getter security issues Fix numerous go-getter security issues: - Add timeouts to http, git, and hg operations to prevent DoS - Add size limit to http to prevent resource exhaustion - Disable following symlinks in both artifacts and `job run` - Stop performing initial HEAD request to avoid file corruption on retries and DoS opportunities. Approach Since Nomad has no ability to differentiate a DoS-via-large-artifact vs a legitimate workload, all of the new limits are configurable at the client agent level. The max size of HTTP downloads is also exposed as a node attribute so that if some workloads have large artifacts they can specify a high limit in their jobspecs. In the future all of this plumbing could be extended to enable/disable specific getters or artifact downloading entirely on a per-node basis.	2022-05-24 16:29:39 -04:00
Seth Hoenig	65f7abf2f4	cli: update default redis and use nomad service discovery Closes #12927 Closes #12958 This PR updates the version of redis used in our examples from 3.2 to 7. The old version is very not supported anymore, and we should be setting a good example by using a supported version. The long-form example job is now fixed so that the service stanza uses nomad as the service discovery provider, and so now the job runs without a requirement of having Consul running and configured.	2022-05-17 10:24:19 -05:00
Seth Hoenig	26b5c01431	Merge pull request #12817 from twunderlich-grapl/fix-network-interpolation Fix network.dns interpolation	2022-05-17 09:31:32 -05:00
Eng Zer Jun	97d1bc735c	test: use `T.TempDir` to create temporary test directory (#12853 ) * test: use `T.TempDir` to create temporary test directory This commit replaces `ioutil.TempDir` with `t.TempDir` in tests. The directory created by `t.TempDir` is automatically removed when the test and all its subtests complete. Prior to this commit, temporary directory created using `ioutil.TempDir` needs to be removed manually by calling `os.RemoveAll`, which is omitted in some tests. The error handling boilerplate e.g. defer func() { if err := os.RemoveAll(dir); err != nil { t.Fatal(err) } } is also tedious, but `t.TempDir` handles this for us nicely. Reference: https://pkg.go.dev/testing#T.TempDir Signed-off-by: Eng Zer Jun <engzerjun@gmail.com> * test: fix TestLogmon_Start_restart on Windows Signed-off-by: Eng Zer Jun <engzerjun@gmail.com> * test: fix failing TestConsul_Integration t.TempDir fails to perform the cleanup properly because the folder is still in use testing.go:967: TempDir RemoveAll cleanup: unlinkat /tmp/TestConsul_Integration2837567823/002/191a6f1a-5371-cf7c-da38-220fe85d10e5/web/secrets: device or resource busy Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	2022-05-12 11:42:40 -04:00
Seth Hoenig	96ec19788d	cgroups: make sure cgroup still exists after task restart This PR modifies raw_exec and exec to ensure the cgroup for a task they are driving still exists during a task restart. These drivers have the same bug but with different root cause. For raw_exec, we were removing the cgroup in 2 places - the cpuset manager, and in the unix containment implementation (the thing that uses freezer cgroup to clean house). During a task restart, the containment would remove the cgroup, and when the task runner hooks went to start again would block on waiting for the cgroup to exist, which will never happen, because it gets created by the cpuset manager which only runs as an alloc pre-start hook. The fix here is to simply not delete the cgroup in the containment implementation; killing the PIDs is enough. The removal happens in the cpuset manager later anyway. For exec, it's the same idea, except DestroyTask is called on task failure, which in turn calls into libcontainer, which in turn deletes the cgroup. In this case we do not have control over the deletion of the cgroup, so instead we hack the cgroup back into life after the call to DestroyTask. All of this only applies to cgroups v2.	2022-05-05 09:51:03 -05:00
Thomas Wunderlich	245d2a463b	Fix formatting	2022-04-29 10:02:20 -04:00
Thomas Wunderlich	c86e287de9	Remove debug log lines	2022-04-28 19:14:31 -04:00
Thomas Wunderlich	960e192359	Quick and dirty hack to get interpolated dns values working	2022-04-28 17:09:53 -04:00
Tim Gross	3d630a3629	CSI: enforce one plugin supervisor loop via `sync.Once` (#12785 ) We enforce exactly one plugin supervisor loop by checking whether `running` is set and returning early. This works but is fairly subtle. It can briefly result in two goroutines where one quickly exits before doing any work. Clarify the intent by using `sync.Once`. The goroutine we've spawned only exits when the entire task runner is being torn down, and not when the task driver restarts the workload, so it should never be re-run.	2022-04-26 10:38:50 -04:00
Tim Gross	766025cde7	CSI: plugin supervisor prestart should not mark itself done (#12752 ) The task runner hook `Prestart` response object includes a `Done` field that's intended to tell the client not to run the hook again. The plugin supervisor creates mount points for the task during prestart and saves these mounts in the hook resources. But if a client restarts the hook resources will not be populated. If the plugin task restarts at any time after the client restarts, it will fail to have the correct mounts and crash loop until restart attempts run out. Fix this by not returning `Done` in the response, just as we do for the `volume_mount_hook`.	2022-04-22 13:07:47 -04:00
Gowtham	1ff8b5f759	Add Concurrent Download Support for artifacts (#11531 ) * add concurrent download support - resolves #11244 * format imports * mark `wg.Done()` via `defer` * added tests for successful and failure cases and resolved some goleak * docs: add changelog for #11531 * test typo fixes and improvements Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-04-20 10:15:56 -07:00
Seth Hoenig	d1bda4a954	ci: fixup task runner chroot test This PR is 2 fixes for the flaky TestTaskRunner_TaskEnv_Chroot test. And also the TestTaskRunner_Download_ChrootExec test. - Use TinyChroot to stop copying gigabytes of junk, which causes GHA to fail to create the environment in time. - Pre-create cgroups on V2 systems. Normally the cgroup directory is managed by the cpuset manager, but that is not active in taskrunner tests, so create it by hand in the test framework.	2022-04-19 10:37:46 -05:00
James Rasell	9bc16b1333	client: account for service provider namespace updates in hooks. (#12479 ) When a service is updated, the service hooks update a number of internal fields which helps generate the new workload. This also needs to update the namespace for the service provider. It is possible for these to be different, and in the case of Nomad and Consul running OSS, this is to be expected.	2022-04-06 19:26:22 +02:00
James Rasell	431c153cd9	client: add Nomad template service functionality to runner. (#12458 ) This change modifies the template task runner to utilise the new consul-template which includes Nomad service lookup template funcs. In order to provide security and auth to consul-template, we use a custom HTTP dialer which is passed to consul-template when setting up the runner. This method follows Vault implementation. Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-04-06 19:17:05 +02:00
Derek Strickland	d1d6009e2c	disconnected clients: Support operator manual interventions (#12436 ) * allocrunner: Remove Shutdown call in Reconnect * Node.UpdateAlloc: Stop orphaned allocs. * reconciler: Stop failed reconnects. * Apply feedback from code review. Handle rebase conflict. * Apply suggestions from code review Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-04-06 09:33:32 -04:00
Derek Strickland	bd719bc7b8	reconciler: 2 phase reconnects and tests (#12333 ) * structs: Add alloc.Expired & alloc.Reconnected functions. Add Reconnect eval trigger by. * node_endpoint: Emit new eval for reconnecting unknown allocs. * filterByTainted: handle 2 phase commit filtering rules. * reconciler: Append AllocState on disconnect. Logic updates from testing and 2 phase reconnects. * allocs: Set reconnect timestamp. Destroy if not DesiredStatusRun. Watch for unknown status.	2022-04-05 17:13:10 -04:00
Derek Strickland	c7a5b17251	Fix client test reconnect test; Remove guard test (#12173 ) * Update reconnect test to new algorithm and interface; remove guard test	2022-04-05 17:12:23 -04:00
Derek Strickland	3cbd76ea9d	disconnected clients: Add reconnect task event (#12133 ) * Add TaskClientReconnectedEvent constant * Add allocRunner.Reconnect function to manage task state manually * Removes server-side push	2022-04-05 17:12:23 -04:00
Tim Gross	b5d5393718	CSI: don't block client shutdown for node unmount (#12457 ) When we unmount a volume we need to be able to recover from cases where the plugin has been shutdown before the allocation that needs it, so in #11892 we blocked shutting down the alloc runner hook. But this blocks client shutdown if we're in the middle of unmounting. The client won't be able to communicate with the plugin or send the unpublish RPC anyways, so we should cancel the context and assume that we'll resume the unmounting process when the client restarts. For `-dev` mode we don't send the graceful `Shutdown()` method and instead destroy all the allocations. In this case, we'll never be able to communicate with the plugin but also never close the context we need to prevent the hook from blocking. To fix this, move the retries into their own goroutine that doesn't block the main `Postrun`.	2022-04-05 13:05:10 -04:00
Seth Hoenig	9670adb6c6	cleanup: purge github.com/pkg/errors	2022-04-01 19:24:02 -05:00
Seth Hoenig	8dcf01e94b	tests: remove update 08 groups services test This is a test around upgrading from Nomad 0.8, which is long since no longer supported. The test is slow, flaky, and imports consul/sdk. Remove this test as it is no longer relevant.	2022-03-31 10:14:22 -05:00
Michael Schurter	cae69ba8ce	Merge pull request #12312 from hashicorp/f-writeToFile template: disallow `writeToFile` by default	2022-03-29 13:41:59 -07:00
Michael Schurter	33fe04ff6a	template: fix comments and docs Review notes from @lgfa29 Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-03-29 09:25:23 -07:00
Tim Gross	a6652bffad	CSI: reorder controller volume detachment (#12387 ) In #12112 and #12113 we solved for the problem of races in releasing volume claims, but there was a case that we missed. During a node drain with a controller attach/detach, we can hit a race where we call controller publish before the unpublish has completed. This is discouraged in the spec but plugins are supposed to handle it safely. But if the storage provider's API is slow enough and the plugin doesn't handle the case safely, the volume can get "locked" into a state where the provider's API won't detach it cleanly. Check the claim before making any external controller publish RPC calls so that Nomad is responsible for the canonical information about whether a volume is currently claimed. This has a couple side-effects that also had to get fixed here: * Changing the order means that the volume will have a past claim without a valid external node ID because it came from the client, and this uncovered a separate bug where we didn't assert the external node ID was valid before returning it. Fallthrough to getting the ID from the plugins in the state store in this case. We avoided this originally because of concerns around plugins getting lost during node drain but now that we've fixed that we may want to revisit it in future work. * We should make sure we're handling `FailedPrecondition` cases from the controller plugin the same way we handle other retryable cases. * Several tests had to be updated because they were assuming we fail in a particular order that we're no longer doing.	2022-03-29 09:44:00 -04:00
Michael Schurter	7a28fcb8af	template: disallow `writeToFile` by default Resolves #12095 by WONTFIXing it. This approach disables `writeToFile` as it allows arbitrary host filesystem writes and is only a small quality of life improvement over multiple `template` stanzas. This approach has the significant downside of leaving people who have altered their `template.function_denylist` still vulnerable! I added an upgrade note, but we should have implemented the denylist as a `map[string]bool` so that new funcs could be denied without overriding custom configurations. This PR also includes a bug fix that broke enabling all consul-template funcs. We repeatedly failed to differentiate between a nil (unset) denylist and an empty (allow all) one.	2022-03-28 17:05:42 -07:00
James Rasell	9449e1c3e2	Merge branch 'main' into f-1.3-boogie-nights	2022-03-25 16:40:32 +01:00
James Rasell	96d8512c85	test: move remaining tests to use ci.Parallel.	2022-03-24 08:45:13 +01:00
Seth Hoenig	2e5c6de820	client: enable support for cgroups v2 This PR introduces support for using Nomad on systems with cgroups v2 [1] enabled as the cgroups controller mounted on /sys/fs/cgroups. Newer Linux distros like Ubuntu 21.10 are shipping with cgroups v2 only, causing problems for Nomad users. Nomad mostly "just works" with cgroups v2 due to the indirection via libcontainer, but not so for managing cpuset cgroups. Before, Nomad has been making use of a feature in v1 where a PID could be a member of more than one cgroup. In v2 this is no longer possible, and so the logic around computing cpuset values must be modified. When Nomad detects v2, it manages cpuset values in-process, rather than making use of cgroup heirarchy inheritence via shared/reserved parents. Nomad will only activate the v2 logic when it detects cgroups2 is mounted at /sys/fs/cgroups. This means on systems running in hybrid mode with cgroups2 mounted at /sys/fs/cgroups/unified (as is typical) Nomad will continue to use the v1 logic, and should operate as before. Systems that do not support cgroups v2 are also not affected. When v2 is activated, Nomad will create a parent called nomad.slice (unless otherwise configured in Client conifg), and create cgroups for tasks using naming convention <allocID>-<task>.scope. These follow the naming convention set by systemd and also used by Docker when cgroups v2 is detected. Client nodes now export a new fingerprint attribute, unique.cgroups.version which will be set to 'v1' or 'v2' to indicate the cgroups regime in use by Nomad. The new cpuset management strategy fixes #11705, where docker tasks that spawned processes on startup would "leak". In cgroups v2, the PIDs are started in the cgroup they will always live in, and thus the cause of the leak is eliminated. [1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html Closes #11289 Fixes #11705 #11773 #11933	2022-03-23 11:35:27 -05:00
James Rasell	a646333263	Merge branch 'main' into f-1.3-boogie-nights	2022-03-23 09:41:25 +01:00
Tim Gross	e687a21da9	CSI: set plugin `CSI_ENDPOINT` env var only if unset by user (#12257 ) * Use unix:// prefix for CSI_ENDPOINT variable by default * Some plugins have strict validation over the format of the `CSI_ENDPOINT` variable, and unfortunately not all plugins agree. Allow the user to override the `CSI_ENDPOINT` to workaround those cases. * Update all demos and tests with CSI_ENDPOINT	2022-03-21 11:48:47 -04:00
James Rasell	042bf0fa57	client: hookup service wrapper for use within client hooks.	2022-03-21 10:29:57 +01:00
James Rasell	91e4d20b1d	Merge pull request #12307 from hashicorp/b-groupservices-avoid-double-tg-lookup client: avoid double group lookup within groupservice hook setup.	2022-03-16 17:00:01 +01:00
James Rasell	e7d0dfbc8b	client: avoid double group lookup within groupservice hook setup.	2022-03-16 09:42:57 +01:00
Seth Hoenig	2631659551	ci: swap ci parallelization for unconstrained gomaxprocs	2022-03-15 12:58:52 -05:00
James Rasell	7cd28a6fb6	client: refactor common service registration objects from Consul. This commit performs refactoring to pull out common service registration objects into a new `client/serviceregistration` package. This new package will form the base point for all client specific service registration functionality. The Consul specific implementation is not moved as it also includes non-service registration implementations; this reduces the blast radius of the changes as well.	2022-03-15 09:38:30 +01:00
Tim Gross	f4dfaec589	CSI: set plugin socket path on restore (#12149 ) The Prestart hook for task runner hooks doesn't get called when we restore a task, because the task is already running. The Postrun hook for CSI plugin supervisors needs the socket path to have been populated so that the client has a valid path.	2022-03-01 10:22:52 -05:00
Tim Gross	22cf24a6bd	CSI: retry claims from client when max claims are reached (#12113 ) When the alloc runner claims a volume, an allocation for a previous version of the job may still have the volume claimed because it's still shutting down. In this case we'll receive an error from the server. Retry this error until we succeed or until a very long timeout expires, to give operators a chance to recover broken plugins. Make the alloc runner hook tolerant of temporary RPC failures.	2022-02-24 10:39:07 -05:00
Tim Gross	246db87a74	CSI: allow for concurrent plugin allocations (#12078 ) The dynamic plugin registry assumes that plugins are singletons, which matches the behavior of other Nomad plugins. But because dynamic plugins like CSI are implemented by allocations, we need to handle the possibility of multiple allocations for a given plugin type + ID, as well as behaviors around interleaved allocation starts and stops. Update the data structure for the dynamic registry so that more recent allocations take over as the instance manager singleton, but we still preserve the previous running allocations so that restores work without racing. Multiple allocations can run on a client for the same plugin, even if only during updates. Provide each plugin task a unique path for the control socket so that the tasks don't interfere with each other.	2022-02-23 15:23:07 -05:00
Tim Gross	309ac6c3d8	csi: don't wait to fire initial unmount RPC (#12102 ) In PR #11892 we updated the `csi_hook` to unmount the volume locally via the CSI node RPCs before releasing the claim from the server. The timer for this hook was initialized with the retry time, forcing us to wait 1s before making the first unmount RPC calls. Use the new helper for timers to ensure we clean up the timer nicely.	2022-02-22 13:43:06 -05:00
Michael Schurter	48aaa2c7d9	Merge pull request #11975 from hashicorp/f-connect-debugging connect: write envoy bootstrap debugging info	2022-02-18 13:56:22 -08:00
Seth Hoenig	6550c90198	connect: bootstrap envoy using -proxy-id This PR modifies the Consul CLI arguments used to bootstrap envoy for Connect sidecars to make use of '-proxy-id' instead of '-sidecar-for'. Nomad registers the sidecar service, so we know what ID it has. The '-sidecar-for' was intended for use when you only know the name of the service for which the sidecar is being created. The improvement here is that using '-proxy-id' does not require an underlying request for listing Consul services. This will make make the interaction between Nomad and Consul more efficient. Closes #10452	2022-02-18 14:58:23 -06:00
Michael Schurter	27b8112123	connect: write envoy bootstrap debugging info When Consul Connect just works, it's wonderful. When it doesn't work it can be exceeding difficult to debug: operators have to check task events, Nomad logs, Consul logs, Consul APIs, and even then critical information is missing. Using Consul to generate a bootstrap config for Envoy is notoriously difficult. Nomad doesn't even log stderr, so operators are left trying to piece together what went wrong. This patch attempts to provide maximal context which unfortunately includes secrets. Secrets are always restricted to the secrets/ directory. This makes debugging a little harder, but allows operators to know exactly what operation Nomad was trying to perform. What's added: - stderr is sent to alloc/logs/envoy_bootstrap.stderr.0 - the CLI is written to secrets/.envoy_bootstrap.cmd - the environment is written to secrets/.envoy_bootstrap.env as JSON Accessing this information is unfortunately awkward: ``` nomad alloc exec -task connect-proxy-count-countdash b36a cat secrets/.envoy_bootstrap.env nomad alloc exec -task connect-proxy-count-countdash b36a cat secrets/.envoy_bootstrap.cmd nomad alloc fs b36a alloc/logs/envoy_bootstrap.stderr.0 ``` The above assumes an alloc id that starts with `b36a` and a Connect sidecar proxy for a service named `count-countdash`. If the alloc is unable to start successfully, the debugging files are only accessible from the host filesystem.	2022-02-18 12:02:36 -08:00
Tim Gross	27bb2da5ee	CSI: make gRPC client creation more robust (#12057 ) Nomad communicates with CSI plugin tasks via gRPC. The plugin supervisor hook uses this to ping the plugin for health checks which it emits as task events. After the first successful health check the plugin supervisor registers the plugin in the client's dynamic plugin registry, which in turn creates a CSI plugin manager instance that has its own gRPC client for fingerprinting the plugin and sending mount requests. If the plugin manager instance fails to connect to the plugin on its first attempt, it exits. The plugin supervisor hook is unaware that connection failed so long as its own pings continue to work. A transient failure during plugin startup may mislead the plugin supervisor hook into thinking the plugin is up (so there's no need to restart the allocation) but no fingerprinter is started. * Refactors the gRPC client to connect on first use. This provides the plugin manager instance the ability to retry the gRPC client connection until success. * Add a 30s timeout to the plugin supervisor so that we don't poll forever waiting for a plugin that will never come back up. Minor improvements: * The plugin supervisor hook creates a new gRPC client for every probe and then throws it away. Instead, reuse the client as we do for the plugin manager. * The gRPC client constructor has a 1 second timeout. Clarify that this timeout applies to the connection and not the rest of the client lifetime.	2022-02-15 16:57:29 -05:00

1 2 3 4 5 ...

639 commits