open-nomad

Commit Graph

Author	SHA1	Message	Date
Seth Hoenig	acfdf0f479	compliance: add headers with fixed copywrite tool (#17353 ) Closes #17117	2023-05-30 09:20:32 -05:00
Tim Gross	d018fcbff7	allocrunner: provide factory function so we can build mock ARs (#17161 ) Tools like `nomad-nodesim` are unable to implement a minimal implementation of an allocrunner so that we can test the client communication without having to lug around the entire allocrunner/taskrunner code base. The allocrunner was implemented with an interface specifically for this purpose, but there were circular imports that made it challenging to use in practice. Move the AllocRunner interface into an inner package and provide a factory function type. Provide a minimal test that exercises the new function so that consumers have some idea of what the minimum implementation required is.	2023-05-12 13:29:44 -04:00
Daniel Bennett	a7ed6f5c53	full task cleanup when alloc prerun hook fails (#17104 ) to avoid leaking task resources (e.g. containers, iptables) if allocRunner prerun fails during restore on client restart. now if prerun fails, TaskRunner.MarkFailedKill() will only emit an event, mark the task as failed, and cancel the tr's killCtx, so then ar.runTasks() -> tr.Run() can take care of the actual cleanup. removed from (formerly) tr.MarkFailedDead(), now handled by tr.Run(): * set task state as dead * save task runner local state * task stop hooks also done in tr.Run() now that it's not skipped: * handleKill() to kill tasks while respecting their shutdown delay, and retrying as needed * also includes task preKill hooks * clearDriverHandle() to destroy the task and associated resources * task exited hooks	2023-05-08 13:17:10 -05:00
Tim Gross	62548616d4	client: allow `drain_on_shutdown` configuration (#16827 ) Adds a new configuration to clients to optionally allow them to drain their workloads on shutdown. The client sends the `Node.UpdateDrain` RPC targeting itself and then monitors the drain state as seen by the server until the drain is complete or the deadline expires. If it loses connection with the server, it will monitor local client status instead to ensure allocations are stopped before exiting.	2023-04-14 15:35:32 -04:00
hashicorp-copywrite[bot]	005636afa0	[COMPLIANCE] Add Copyright and License Headers	2023-04-10 15:36:59 +00:00
Lance Haig	35c17b2e56	deps: Update ioutil deprecated library references to os and io respectively in the client package (#16318 ) * Update ioutil deprecated library references to os and io respectively * Deal with the errors produced. Add error handling to filEntry info Add error handling to info	2023-03-08 13:25:10 -06:00
Seth Hoenig	165791dd89	artifact: protect against unbounded artifact decompression (1.5.0) (#16151 ) * artifact: protect against unbounded artifact decompression Starting with 1.5.0, set default values for artifact decompression limits. artifact.decompression_size_limit (default "100GB") - the maximum amount of data that will be decompressed before triggering an error and cancelling the operation artifact.decompression_file_count_limit (default 4096) - the maximum number of files that will be decompressed before triggering an error and cancelling the operation. * artifact: assert limits cannot be nil in validation	2023-02-14 09:28:39 -06:00
Michael Schurter	0a496c845e	Task API via Unix Domain Socket (#15864 ) This change introduces the Task API: a portable way for tasks to access Nomad's HTTP API. This particular implementation uses a Unix Domain Socket and, unlike the agent's HTTP API, always requires authentication even if ACLs are disabled. This PR contains the core feature and tests but followup work is required for the following TODO items: - Docs - might do in a followup since dynamic node metadata / task api / workload id all need to interlink - Unit tests for auth middleware - Caching for auth middleware - Rate limiting on negative lookups for auth middleware --------- Co-authored-by: Seth Hoenig <shoenig@duck.com>	2023-02-06 11:31:22 -08:00
Charlie Voiselle	4caac1a92f	client: Add option to enable hairpinMode on Nomad bridge (#15961 ) * Add `bridge_network_hairpin_mode` client config setting * Add node attribute: `nomad.bridge.hairpin_mode` * Changed format string to use `%q` to escape user provided data * Add test to validate template JSON for developer safety Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2023-02-02 10:12:15 -05:00
Karl Johann Schubert	b773a1b77f	client: add disk_total_mb and disk_free_mb config options (#15852 )	2023-01-24 09:14:22 -05:00
Seth Hoenig	be3f89b5f9	artifact: enable inheriting environment variables from client (#15514 ) * artifact: enable inheriting environment variables from client This PR adds client configuration for specifying environment variables that should be inherited by the artifact sandbox process from the Nomad Client agent. Most users should not need to set these values but the configuration is provided to ensure backwards compatibility. Configuration of go-getter should ideally be done through the artifact block in a jobspec task. e.g. ```hcl client { artifact { set_environment_variables = "TMPDIR,GIT_SSH_OPTS" } } ``` Closes #15498 * website: update set_environment_variables text to mention PATH	2022-12-09 15:46:07 -06:00
Seth Hoenig	825c5cc65e	artifact: add client toggle to disable filesystem isolation (#15503 ) This PR adds the client config option for turning off filesystem isolation, applicable on Linux systems where filesystem isolation is possible and enabled by default. ```hcl client{ artifact { disable_filesystem_isolation = <bool:false> } } ``` Closes #15496	2022-12-08 12:29:23 -06:00
Seth Hoenig	51a2212d3d	client: sandbox go-getter subprocess with landlock (#15328 ) * client: sandbox go-getter subprocess with landlock This PR re-implements the getter package for artifact downloads as a subprocess. Key changes include On all platforms, run getter as a child process of the Nomad agent. On Linux platforms running as root, run the child process as the nobody user. On supporting Linux kernels, uses landlock for filesystem isolation (via go-landlock). On all platforms, restrict environment variables of the child process to a static set. notably TMP/TEMP now points within the allocation's task directory kernel.landlock attribute is fingerprinted (version number or unavailable) These changes make Nomad client more resilient against a faulty go-getter implementation that may panic, and more secure against bad actors attempting to use artifact downloads as a privilege escalation vector. Adds new e2e/artifact suite for ensuring artifact downloading works. TODO: Windows git test (need to modify the image, etc... followup PR) * landlock: fixup items from cr * cr: fixup tests and go.mod file	2022-12-07 16:02:25 -06:00
James Rasell	d7b311ce55	acl: correctly resolve ACL roles within client cache. (#14922 ) The client ACL cache was not accounting for tokens which included ACL role links. This change modifies the behaviour to resolve role links to policies. It will also now store ACL roles within the cache for quick lookup. The cache TTL is configurable in the same manner as policies or tokens. Another small fix is included that takes into account the ACL token expiry time. This was not included, which meant tokens with expiry could be used past the expiry time, until they were GC'd.	2022-10-20 09:37:32 +02:00
Michael Schurter	45ce8c13cf	client: remove unused LogOutput and LogLevel (#14867 ) * client: remove unused LogOutput * client: remove unused config.LogLevel	2022-10-11 09:24:40 -07:00
Seth Hoenig	5e38a0e82c	cleanup: rename Equals to Equal for consistency (#14759 )	2022-10-10 09:28:46 -05:00
Seth Hoenig	2088ca3345	cleanup more helper updates (#14638 ) * cleanup: refactor MapStringStringSliceValueSet to be cleaner * cleanup: replace SliceStringToSet with actual set * cleanup: replace SliceStringSubset with real set * cleanup: replace SliceStringContains with slices.Contains * cleanup: remove unused function SliceStringHasPrefix * cleanup: fixup StringHasPrefixInSlice doc string * cleanup: refactor SliceSetDisjoint to use real set * cleanup: replace CompareSliceSetString with SliceSetEq * cleanup: replace CompareMapStringString with maps.Equal * cleanup: replace CopyMapStringString with CopyMap * cleanup: replace CopyMapStringInterface with CopyMap * cleanup: fixup more CopyMapStringString and CopyMapStringInt * cleanup: replace CopySliceString with slices.Clone * cleanup: remove unused CopySliceInt * cleanup: refactor CopyMapStringSliceString to be generic as CopyMapOfSlice * cleanup: replace CopyMap with maps.Clone * cleanup: run go mod tidy	2022-09-21 14:53:25 -05:00
Luiz Aoqui	7ee3de3ea5	fix minor issues found during ENT merge (#14250 )	2022-08-23 17:22:18 -04:00
Michael Schurter	3b57df33e3	client: fix data races in config handling (#14139 ) Before this change, Client had 2 copies of the config object: config and configCopy. There was no guidance around which to use where (other than configCopy's comment to pass it to alloc runners), both are shared among goroutines and mutated in data racy ways. At least at one point I think the idea was to have `config` be mutable and then grab a lock to overwrite `configCopy`'s pointer atomically. This would have allowed alloc runners to read their config copies in data race safe ways, but this isn't how the current implementation worked. This change takes the following approach to safely handling configs in the client: 1. `Client.config` is the only copy of the config and all access must go through the `Client.configLock` mutex 2. Since the mutex only protects the config pointer itself and not fields inside the Config struct: all config mutation must be done on a copy of the config, and then Client's config pointer is overwritten while the mutex is acquired. Alloc runners and other goroutines with the old config pointer will not see config updates. 3. Deep copying is implemented on the Config struct to satisfy the previous approach. The TLS Keyloader is an exception because it has its own internal locking to support mutating in place. An unfortunate complication but one I couldn't find a way to untangle in a timely fashion. 4. To facilitate deep copying I made an internally backward incompatible API change: our `helper/funcs` used to turn containers (slices and maps) with 0 elements into nils. This probably saves a few memory allocations but makes it very easy to cause panics. Since my new config handling approach uses more copying, it became very difficult to ensure all code that used containers on configs could handle nils properly. Since this code has caused panics in the past, I fixed it: nil containers are copied as nil, but 0-element containers properly return a new 0-element container. No more "downgrading to nil!"	2022-08-18 16:32:04 -07:00
Piotr Kazmierczak	b63944b5c1	cleanup: replace TypeToPtr helper methods with pointer.Of (#14151 ) Bumping compile time requirement to go 1.18 allows us to simplify our pointer helper methods.	2022-08-17 18:26:34 +02:00
Derek Strickland	77df9c133b	Add Nomad RetryConfig to agent template config (#13907 ) * add Nomad RetryConfig to agent template config	2022-08-03 16:56:30 -04:00
Seth Hoenig	5dd8aa3e27	client: enforce max_kill_timeout client configuration This PR fixes a bug where client configuration max_kill_timeout was not being enforced. The feature was introduced in 9f44780 but seems to have been removed during the major drivers refactoring. We can make sure the value is enforced by plumbing it through the DriverHandler, which now uses the lesser of the task.killTimeout or client.maxKillTimeout. Also updates Event.SetKillTimeout to require both the task.killTimeout and client.maxKillTimeout so that we don't make the mistake of using the wrong value - as it was being given only the task.killTimeout before.	2022-07-06 15:29:38 -05:00
Derek Strickland	13ea5ae87a	consul-template: Add fault tolerant defaults (#13041 ) consul-template: Add fault tolerant defaults Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-06-08 14:08:25 -04:00
Michael Schurter	2965dc6a1a	artifact: fix numerous go-getter security issues Fix numerous go-getter security issues: - Add timeouts to http, git, and hg operations to prevent DoS - Add size limit to http to prevent resource exhaustion - Disable following symlinks in both artifacts and `job run` - Stop performing initial HEAD request to avoid file corruption on retries and DoS opportunities. Approach Since Nomad has no ability to differentiate a DoS-via-large-artifact vs a legitimate workload, all of the new limits are configurable at the client agent level. The max size of HTTP downloads is also exposed as a node attribute so that if some workloads have large artifacts they can specify a high limit in their jobspecs. In the future all of this plumbing could be extended to enable/disable specific getters or artifact downloading entirely on a per-node basis.	2022-05-24 16:29:39 -04:00
Seth Hoenig	d1bda4a954	ci: fixup task runner chroot test This PR is 2 fixes for the flaky TestTaskRunner_TaskEnv_Chroot test. And also the TestTaskRunner_Download_ChrootExec test. - Use TinyChroot to stop copying gigabytes of junk, which causes GHA to fail to create the environment in time. - Pre-create cgroups on V2 systems. Normally the cgroup directory is managed by the cpuset manager, but that is not active in taskrunner tests, so create it by hand in the test framework.	2022-04-19 10:37:46 -05:00
Derek Strickland	7c6eb47b78	`consul-template`: revert `function_denylist` logic (#12071 ) * consul-template: replace config rather than append Co-authored-by: Seth Hoenig <seth.a.hoenig@gmail.com>	2022-04-18 13:57:56 -04:00
James Rasell	431c153cd9	client: add Nomad template service functionality to runner. (#12458 ) This change modifies the template task runner to utilise the new consul-template which includes Nomad service lookup template funcs. In order to provide security and auth to consul-template, we use a custom HTTP dialer which is passed to consul-template when setting up the runner. This method follows Vault implementation. Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-04-06 19:17:05 +02:00
Michael Schurter	7a28fcb8af	template: disallow `writeToFile` by default Resolves #12095 by WONTFIXing it. This approach disables `writeToFile` as it allows arbitrary host filesystem writes and is only a small quality of life improvement over multiple `template` stanzas. This approach has the significant downside of leaving people who have altered their `template.function_denylist` still vulnerable! I added an upgrade note, but we should have implemented the denylist as a `map[string]bool` so that new funcs could be denied without overriding custom configurations. This PR also includes a bug fix that broke enabling all consul-template funcs. We repeatedly failed to differentiate between a nil (unset) denylist and an empty (allow all) one.	2022-03-28 17:05:42 -07:00
James Rasell	9449e1c3e2	Merge branch 'main' into f-1.3-boogie-nights	2022-03-25 16:40:32 +01:00
Seth Hoenig	2e5c6de820	client: enable support for cgroups v2 This PR introduces support for using Nomad on systems with cgroups v2 [1] enabled as the cgroups controller mounted on /sys/fs/cgroups. Newer Linux distros like Ubuntu 21.10 are shipping with cgroups v2 only, causing problems for Nomad users. Nomad mostly "just works" with cgroups v2 due to the indirection via libcontainer, but not so for managing cpuset cgroups. Before, Nomad has been making use of a feature in v1 where a PID could be a member of more than one cgroup. In v2 this is no longer possible, and so the logic around computing cpuset values must be modified. When Nomad detects v2, it manages cpuset values in-process, rather than making use of cgroup hierarchy inheritance via shared/reserved parents. Nomad will only activate the v2 logic when it detects cgroups2 is mounted at /sys/fs/cgroups. This means on systems running in hybrid mode with cgroups2 mounted at /sys/fs/cgroups/unified (as is typical) Nomad will continue to use the v1 logic, and should operate as before. Systems that do not support cgroups v2 are also not affected. When v2 is activated, Nomad will create a parent called nomad.slice (unless otherwise configured in Client config), and create cgroups for tasks using naming convention <allocID>-<task>.scope. These follow the naming convention set by systemd and also used by Docker when cgroups v2 is detected. Client nodes now export a new fingerprint attribute, unique.cgroups.version which will be set to 'v1' or 'v2' to indicate the cgroups regime in use by Nomad. The new cpuset management strategy fixes #11705, where docker tasks that spawned processes on startup would "leak". In cgroups v2, the PIDs are started in the cgroup they will always live in, and thus the cause of the leak is eliminated. [1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html Closes #11289 Fixes #11705 #11773 #11933	2022-03-23 11:35:27 -05:00
James Rasell	a646333263	Merge branch 'main' into f-1.3-boogie-nights	2022-03-23 09:41:25 +01:00
Seth Hoenig	2631659551	ci: swap ci parallelization for unconstrained gomaxprocs	2022-03-15 12:58:52 -05:00
James Rasell	541a0f7d9a	config: add native service discovery admin boolean parameter.	2022-03-14 11:48:13 +01:00
Derek Strickland	b3c8ab9be7	Update IsEmpty to check for pre-1.2.4 fields (#11930 )	2022-01-26 11:31:37 -05:00
Derek Strickland	0a8e03f0f7	Expose Consul template configuration parameters (#11606 ) This PR exposes the following existing `consul-template` configuration options to Nomad jobspec authors in the `{job.group.task.template}` stanza. - `wait` It also exposes the following `consul-template` configuration to Nomad operators in the `{client.template}` stanza. - `max_stale` - `block_query_wait` - `consul_retry` - `vault_retry` - `wait` Finally, it adds the following new Nomad-specific configuration to the `{client.template}` stanza that allows Operators to set bounds on what `jobspec` authors configure. - `wait_bounds` Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-01-10 10:19:07 -05:00
pavel	06349676de	docs: fix typo in the comment comment in the source code for Logger: thhe -> the	2021-11-25 00:35:45 +01:00
Michael Schurter	59fda1894e	Merge pull request #11167 from a-zagaevskiy/master Support configurable dynamic port range	2021-10-13 16:47:38 -07:00
Michael Schurter	7071425af3	client: defensively log reserved ports - Fix test broken due to being improperly setup. - Include min/max ports in default client config.	2021-10-04 15:43:35 -07:00
Michael Schurter	4ad0c258b9	client: add NOMAD_LICENSE to default env deny list By default we should not expose the NOMAD_LICENSE environment variable to tasks. Also refactor where the DefaultEnvDenyList lives so we don't have to maintain 2 copies of it. Since client/config is the most obvious location, keep a reference there to its unfortunate home buried deep in command/agent/host. Since the agent uses this list as well for the /agent/host endpoint the list must be accessible from both command/agent and client.	2021-09-21 13:51:17 -07:00
Aleksandr Zagaevskiy	ebb87e65fe	Support configurable dynamic port range	2021-09-10 11:52:47 +03:00
James Rasell	b6813f1221	chore: fix incorrect docstring formatting.	2021-08-30 11:08:12 +02:00
Nick Ethier	b235091a51	client: disable cpuset cgroup management if init fails	2021-04-14 14:44:08 -04:00
Nick Ethier	0a21de91dd	Apply suggestions from code review Co-authored-by: Drew Bailey <drewbailey5@gmail.com>	2021-04-13 13:28:15 -04:00
Nick Ethier	edc0da9040	client: only fingerprint reservable cores via cgroups, allowing manual override for other platforms	2021-04-13 13:28:15 -04:00
Nick Ethier	bed4e92b61	fingerprint: implement client fingerprinting of reservable cores on Linux systems this is derived from the configure cpuset cgroup parent (defaults to /nomad) for non Linux systems and Linux systems where cgroups are not enabled, the client defaults to using all cores	2021-04-13 13:28:15 -04:00
Chris Baker	1d35578bed	removed backwards-compatible/untagged metrics deprecated in 0.7	2020-10-13 20:18:39 +00:00
Seth Hoenig	ed13e5723f	consul/connect: dynamically select envoy sidecar at runtime As newer versions of Consul are released, the minimum version of Envoy it supports as a sidecar proxy also gets bumped. Starting with the upcoming Consul v1.9.X series, Envoy v1.11.X will no longer be supported. Current versions of Nomad hardcode a version of Envoy v1.11.2 to be used as the default implementation of Connect sidecar proxy. This PR introduces a change such that each Nomad Client will query its local Consul for a list of Envoy proxies that it supports (https://github.com/hashicorp/consul/pull/8545) and then launch the Connect sidecar proxy task using the latest supported version of Envoy. If the `SupportedProxies` API component is not available from Consul, Nomad will fallback to the old version of Envoy supported by old versions of Consul. Setting the meta configuration option `meta.connect.sidecar_image` or setting the `connect.sidecar_task` stanza will take precedence as is the current behavior for sidecar proxies. Setting the meta configuration option `meta.connect.gateway_image` will take precedence as is the current behavior for connect gateways. `meta.connect.sidecar_image` and `meta.connect.gateway_image` may make use of the special `${NOMAD_envoy_version}` variable interpolation, which resolves to the newest version of Envoy supported by the Consul agent. Addresses #8585 #7665	2020-10-13 09:14:12 -05:00
Yoan Blanc	891accb89a	use allow/deny instead of the colored alternatives (#9019 ) Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-10-12 08:47:05 -04:00
Fredrik Hoem Grelland	953d4de8dd	update consul-template to v0.25.1 (#8988 )	2020-10-01 14:08:49 -04:00
Mahmood Ali	7ddf4b2902	drivers/exec: fix DNS resolution in systemd hosts On hosts with systemd-resolved, `/etc/resolv.conf` is a symlink to `/run/systemd/resolve/stub-resolv.conf`. When only /etc/resolv.conf is bind-mounted, DNS resolution in the exec container fails. This change fixes DNS resolution by binding /run/systemd/resolve as well. Note that this assumes that the systemd resolver (default to 127.0.0.53) is accessible within the container. This is the case here because exec containers share the same network namespace by default. Jobs with custom network dns configurations are not affected, and Nomad will continue to use the job dns settings rather than host one.	2020-09-29 11:33:51 -04:00
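
The config-handling approach described in commit 3b57df33e3 (a single config pointer guarded by a mutex, mutation only on a deep copy, nil containers copied as nil and empty containers copied as new empty containers) can be sketched roughly as follows. This is a minimal illustration, not Nomad's actual code: the `Config` fields and the `Copy`, `GetConfig`, and `UpdateConfig` methods shown here are hypothetical stand-ins, and only the names `Client.config` and `Client.configLock` come from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// Config is a hypothetical, trimmed-down stand-in for the client config.
type Config struct {
	Region  string
	Options map[string]string
}

// Copy returns a deep copy of the config. A nil map stays nil, while an
// empty (0-element) map is copied as a new empty map rather than nil.
func (c *Config) Copy() *Config {
	if c == nil {
		return nil
	}
	nc := *c
	if c.Options != nil {
		nc.Options = make(map[string]string, len(c.Options))
		for k, v := range c.Options {
			nc.Options[k] = v
		}
	}
	return &nc
}

// Client holds the only config pointer; configLock protects the pointer
// itself, not the fields of the struct it points to.
type Client struct {
	configLock sync.Mutex
	config     *Config
}

// GetConfig returns the current config pointer. Callers must treat the
// returned value as read-only (hypothetical helper).
func (c *Client) GetConfig() *Config {
	c.configLock.Lock()
	defer c.configLock.Unlock()
	return c.config
}

// UpdateConfig applies mutate to a deep copy and then swaps the pointer
// while holding the lock (hypothetical helper). Holders of the old
// pointer do not observe the change.
func (c *Client) UpdateConfig(mutate func(*Config)) *Config {
	c.configLock.Lock()
	defer c.configLock.Unlock()
	nc := c.config.Copy()
	mutate(nc)
	c.config = nc
	return nc
}

func main() {
	client := &Client{config: &Config{Region: "global", Options: map[string]string{}}}
	old := client.GetConfig()
	client.UpdateConfig(func(cfg *Config) { cfg.Options["example.key"] = "value" })
	// The old pointer is unchanged; the new config carries the update.
	fmt.Println(len(old.Options), len(client.GetConfig().Options))
}
```

Under these assumptions, a goroutine that obtained the config before the update keeps reading a consistent snapshot, which matches the commit's note that alloc runners holding the old pointer will not see config updates.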


158 Commits