open-nomad

Commit Graph

Author	SHA1	Message	Date
Tim Gross	ec47b245d0	client: don't use `Status` RPC for Consul discovery (#16490 ) In #16217 we switched clients using Consul discovery to the `Status.Members` endpoint for getting the list of servers so that we're using the correct address. This endpoint has an authorization gate, so this fails if the anonymous policy doesn't have `node:read`. We also can't check the `AuthToken` for the request for the client secret, because the client hasn't yet registered so the server doesn't have anything to compare against. Instead of hitting the `Status.Peers` or `Status.Members` RPC endpoint, use the Consul response directly. Update the `registerNode` method to handle the list of servers we get back in the response; if we get a "no servers" or "no path to region" response we'll kick off discovery again and retry immediately rather than waiting 15s.	2023-03-16 15:38:33 -04:00
Seth Hoenig	5b1970468e	artifact: git needs more files for private repositories (#16508 ) * landlock: git needs more files for private repositories This PR fixes artifact downloading so that git may work when cloning from private repositories. It needs - file read on /etc/passwd - dir read on /root/.ssh - file write on /root/.ssh/known_hosts Add these rules to the landlock rules for the artifact sandbox. * cr: use nonexistent instead of devnull Co-authored-by: Michael Schurter <mschurter@hashicorp.com> * cr: use go-homedir for looking up home directory * pr: pull go-homedir into explicit require * cr: fixup homedir tests in homeless root cases * cl: fix root test for real --------- Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2023-03-16 12:22:25 -05:00
Michael Schurter	81b8c52472	docs: dispatch_payload and jobs api docs had some weirdness (#16514 ) * docs: dispatch_payload docs had some weirdness Docs said "Examples" when there was only 1 example. Not sure what the floating "to" in the description was for. * docs: missing a heading level on jobs api docs	2023-03-16 09:42:46 -07:00
Seth Hoenig	d2e8fb626a	artifact: do not set process attributes on darwin (#16511 ) This PR fixes the non-root macOS use case where artifact downloads stopped working. It seems setting a Credential on a SysProcAttr used by the exec package will always cause fork/exec to fail - even if the credential contains our own UID/GID or nil UID/GID. Technically we do not need to set this as the child process will inherit the parent UID/GID anyway... and not setting it makes things work again ... /shrug	2023-03-16 11:31:18 -05:00
Seth Hoenig	25944cbb7d	artifact: use specific version link for zipbomb artifact (#16513 ) Fix the e2e case where we download the go-getter bomb.zip test file, which is being removed on main. We can still get it from the version tag - yay git!	2023-03-16 10:18:46 -05:00
James Rasell	184733a126	build: fix `test-nomad` make target when running locally. (#16506 )	2023-03-16 09:32:14 +01:00
Daniel Bennett	0331dd71ca	test: set BuildDate in default TestAgent config (#16499 ) so enterprise tests don't fail due to the default zero time	2023-03-15 11:47:15 -05:00
James Rasell	b0a3964e6b	cli: fix login help output formatting. (#16502 )	2023-03-15 13:23:26 +01:00
Seth Hoenig	ed7177de76	scheduler: annotate tasksUpdated with reason and purge DeepEquals (#16421 ) * scheduler: annotate tasksUpdated with reason and purge DeepEquals * cr: move opaque into helper * cr: swap affinity/spread hashing for slice equal * contributing: update checklist-jobspec with notes about struct methods * cr: add more cases to wait config equal method * cr: use reflect when comparing envoy config blocks * cl: add cl	2023-03-14 09:46:00 -05:00
Anthony	6a7e22d546	Merge pull request #16484 from hashicorp/tunzor-patch-1 Update for enterprise trial wording and link	2023-03-14 10:19:29 -04:00
Anthony	9a3d2924e4	Updated trial license link and wording	2023-03-14 09:31:06 -04:00
Juana De La Cuesta	c235bafa3f	cli: Add `-json` and `-t` flags to `namespace status` command (#16442 ) * cli: Add `-json` and `-t` flags to namespace status command * Update command/namespace_status.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * cli: update tests for namespace status command to use must --------- Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-03-14 14:23:04 +01:00
Tim Gross	16b731e456	docs: clarify migration behavior under `nomad alloc stop` (#16468 )	2023-03-14 09:00:29 -04:00
Tim Gross	8579d1e479	agent: trim space when parsing X-Nomad-Token header (#16469 ) Our auth token parsing code trims space around the `Authorization` header but not around `X-Nomad-Token`. When using the UI, it's easy to accidentally introduce a leading or trailing space, which results in spurious authentication errors. Trim the space at the HTTP server.	2023-03-14 08:57:53 -04:00
Seth Hoenig	a25d3ea792	cgv1: do not disable cpuset manager if reserved interface already exists (#16467 ) * cgv1: do not disable cpuset manager if reserved interface already exists This PR fixes a bug where restarting a Nomad Client on a machine using cgroups v1 (e.g. Ubuntu 20.04) would cause the cpuset cgroups manager to disable itself. This is being caused by incorrectly interpreting a "file exists" error as problematic when ensuring the reserved cpuset exists. If we get a "file exists" error, that just means the Client was likely restarted. Note that a machine reboot would fix the issue - the groups interfaces are ephemeral. * cl: add cl	2023-03-13 17:00:17 -05:00
Luiz Aoqui	adf147cb36	acl: update job eval requirement to `submit-job` (#16463 ) The job evaluate endpoint creates a new evaluation for the job which is a write operation. This change modifies the necessary capability from `read-job` to `submit-job` to better reflect this.	2023-03-13 17:13:54 -04:00
Luiz Aoqui	c29a87b875	plugin: add missing fields to `TaskConfig` (#16434 )	2023-03-13 15:58:16 -04:00
Dao Thanh Tung	ca9a43eced	doc: Update `nomad fmt` doc to run against non-deprecated HCL2 jobspec only (#16435 ) Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>	2023-03-13 15:26:27 -04:00
Michael Schurter	8da636c6d5	build: update from go1.20.1 to go1.20.2 (#16427 ) * build: update from go1.20.1 to go1.20.2 Note that the CVE fixed in go1.20.2 does not impact Nomad. https://github.com/golang/go/issues/58647	2023-03-13 09:47:07 -07:00
dependabot[bot]	5b9bbd12ea	build(deps): bump go.uber.org/goleak from 1.2.0 to 1.2.1 (#16439 ) Bumps [go.uber.org/goleak](https://github.com/uber-go/goleak) from 1.2.0 to 1.2.1. - [Release notes](https://github.com/uber-go/goleak/releases) - [Changelog](https://github.com/uber-go/goleak/blob/master/CHANGELOG.md) - [Commits](https://github.com/uber-go/goleak/compare/v1.2.0...v1.2.1) --- updated-dependencies: - dependency-name: go.uber.org/goleak dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-03-13 11:23:56 -05:00
Tim Gross	9dfb51579c	scheduler: refactor system util tests (#16416 ) The tests for the system allocs reconciling code path (`diffSystemAllocs`) include many impossible test environments, such as passing allocs for the wrong node into the function. This makes the test assertions nonsensible for use in walking yourself through the correct behavior. I've pulled this changeset out of PR #16097 so that we can merge these improvements and revisit the right approach to fix the problem in #16097 with less urgency now that the PFNR bug fix has been merged. This changeset breaks up a couple of tests, expands test coverage, and makes test assertions more clear. It also corrects one bit of production code that behaves fine in production because of canonicalization, but forces us to remember to set values in tests to compensate.	2023-03-13 11:59:31 -04:00
Seth Hoenig	630bd8eb68	scheduler: add simple benchmark for tasksUpdated (#16422 ) In preparation for some refactoring to tasksUpdated, add a benchmark to the old code so it's easy to compare with the changes, making sure nothing goes off the rails for performance.	2023-03-13 10:44:14 -05:00
Seth Hoenig	b3cec771d6	deps: remove replace statement for go-discover (#16304 ) Which we no longer need since we no longer have consul as a dependency	2023-03-13 10:40:35 -05:00
Tim Gross	c156640e84	Merge pull request #16445 from hashicorp/post-1.5.1-release Post 1.5.1 release	2023-03-13 11:29:49 -04:00
Tim Gross	d0aa105087	Merge release 1.5.1 files	2023-03-13 11:15:04 -04:00
hc-github-team-nomad-core	2d1a4d90e9	Prepare for next release	2023-03-13 11:13:27 -04:00
hc-github-team-nomad-core	35167e692a	Generate files for 1.5.1 release	2023-03-13 11:13:27 -04:00
Tim Gross	1cf28996e7	acl: prevent privilege escalation via workload identity ACL policies can be associated with a job so that the job's Workload Identity can have expanded access to other policy objects, including other variables. Policies set on the variables the job automatically has access to were ignored, but this includes policies with `deny` capabilities. Additionally, when resolving claims for a workload identity without any attached policies, the `ResolveClaims` method returned a `nil` ACL object, which is treated similarly to a management token. While this was safe in Nomad 1.4.x, when the workload identity token was exposed to the task via the `identity` block, this allows a user with `submit-job` capabilities to escalate their privileges. We originally implemented automatic workload access to Variables as a separate code path in the Variables RPC endpoint so that we don't have to generate on-the-fly policies that blow up the ACL policy cache. This is fairly brittle, and the behavior around wildcard paths in policies also differs from the rest of our ACL policies, which is hard to reason about. Add an `ACLClaim` parameter to the `AllowVariableOperation` method so that we can push all this logic into the `acl` package and the behavior can be consistent. This will allow a `deny` policy to override automatic access (and probably speed up checks of non-automatic variable access).	2023-03-13 11:13:27 -04:00
Michael Schurter	832bca91a1	e2e fixes: cli output, timing issue, and some cleanups (#16418 ) * e2e: job expects alloc to run until stopped * e2e: fix case changed by #16306 * e2e: couldn't find a bug but improved test+jobspecs	2023-03-10 13:14:51 -08:00
Luiz Aoqui	7305a374e3	allocrunner: fix health check monitoring for Consul services (#16402 ) Services must be interpolated to replace runtime variables before they can be compared against the values returned by Consul.	2023-03-10 14:43:31 -05:00
Juana De La Cuesta	5089f13f1d	cli: add `-json` and `-t` flag for `alloc checks` command (#16405 ) * cli: add -json flag to alloc checks for completion * CLI: Expand test to include testing the json flag for allocation checks * Documentation: Add the checks command * Documentation: Add example for alloc check command * Update website/content/docs/commands/alloc/checks.mdx Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * CLI: Add template flag to alloc checks command * Update website/content/docs/commands/alloc/checks.mdx Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * CLI: Extend test to include -t flag for alloc checks * func: add changelog for added flags to alloc checks * cli[doc]: Make usage section on alloc checks clearer * Update website/content/docs/commands/alloc/checks.mdx Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Delete modd.conf * cli[doc]: add -t flag to command description for alloc checks --------- Co-authored-by: James Rasell <jrasell@users.noreply.github.com> Co-authored-by: Juanita De La Cuesta Morales <juanita.delacuestamorales@juanita.delacuestamorales-LHQ7X0QG9X>	2023-03-10 16:58:53 +01:00
Michael Schurter	0021b282ef	env/aws: update ec2 cpu info data (#16417 ) Update AWS EC2 CPU tables using `make ec2info`	2023-03-09 14:33:21 -08:00
Luiz Aoqui	1aceff7806	cli: remove hard requirement on `list-jobs` (#16380 ) Most job subcommands allow for job ID prefix match as a convenience functionality so users don't have to type the full job ID. But this introduces a hard ACL requirement that the token used to run these commands have the `list-jobs` permission, even if the token has enough permission to execute the basic command action and the user passed an exact job ID. This change softens this requirement by not failing the prefix match in case the request results in a permission denied error and instead using the information passed by the user directly.	2023-03-09 15:00:04 -05:00
Bryce Kalow	3239539526	docs: update content-conformance package (#16412 )	2023-03-09 12:47:46 -06:00
James Rasell	001bca34a6	cli: fix help output format on `job init` command. (#16407 )	2023-03-09 18:17:15 +01:00
Tim Gross	99d46e5a49	scheduling: prevent self-collision in dynamic port network offerings (#16401 ) When the scheduler tries to find a placement for a new allocation, it iterates over a subset of nodes. For each node, we populate a `NetworkIndex` bitmap with the ports of all existing allocations and any other allocations already proposed as part of this same evaluation via its `SetAllocs` method. Then we make an "ask" of the `NetworkIndex` in `AssignPorts` for any ports we need and receive an "offer" in return. The offer will include both static ports and any dynamic port assignments. The `AssignPorts` method was written to support group networks, and it shares code that selects dynamic ports with the original `AssignTaskNetwork` code. `AssignTaskNetwork` can request multiple ports from the bitmap at a time. But `AssignPorts` requests them one at a time and does not account for possible collisions, and doesn't return an error in that case. What happens next varies: 1. If the scheduler doesn't place the allocation on that node, the port conflict is thrown away and there's no problem. 2. If the node is picked and this is the only allocation (or last allocation), the plan applier will reject the plan when it calls `SetAllocs`, as we'd expect. 3. If the node is picked and there are additional allocations in the same eval that iterate over the same node, their call to `SetAllocs` will detect the impossible state and the node will be rejected. This can have the puzzling behavior where a second task group for the job without any networking at all can hit a port collision error! It looks like this bug has existed since we implemented group networks, but there are several factors that add up to making the issue rare for many users yet frustratingly frequent for others: * You're more likely to hit this bug the more tightly packed your range for dynamic ports is. With 12000 ports in the range by default, many clusters can avoid this for a long time. * You're more likely to hit case (3) for jobs with lots of allocations or if a scheduler has to iterate over a large number of nodes, such as with system jobs, jobs with `spread` blocks, or (sometimes) jobs using `unique` constraints. For unlucky combinations of these factors, it's possible that case (3) happens repeatedly, preventing scheduling of a given job until a client state change (ex. restarting the agent so all its allocations are rescheduled elsewhere) re-opens the range of dynamic ports available. This changeset: * Fixes the bug by accounting for collisions in dynamic port selection in `AssignPorts`. * Adds test coverage for `AssignPorts`, expands coverage of this case for the deprecated `AssignTaskNetwork`, and tightens the dynamic port range in a scheduler test for spread scheduling to more easily detect this kind of problem in the future. * Adds a `String()` method to `Bitmap` so that any future "screaming" log lines have a human-readable list of used ports.	2023-03-09 10:09:54 -05:00
Proskurin Kirill	f3ecd1db7c	Updated who-uses-nomad to add Behavox (#16339 )	2023-03-08 19:43:12 -05:00
Seth Hoenig	ff4503aac6	client: disable running artifact downloader as nobody (#16375 ) * client: disable running artifact downloader as nobody This PR reverts a change from Nomad 1.5 where artifact downloads were executed as the nobody user on Linux systems. This was done as an attempt to improve the security model of artifact downloading where third party tools such as git or mercurial would be run as the root user with all the security implications thereof. However, doing so conflicts with Nomad's own advice for securing the Client data directory - which when setup with the recommended directory permissions structure prevents artifact downloads from working as intended. Artifact downloads are at least still now executed as a child process of the Nomad agent, and on modern Linux systems make use of the kernel Landlock feature for limiting filesystem access of the child process. * docs: update upgrade guide for 1.5.1 sandboxing * docs: add cl * docs: add title to upgrade guide fix	2023-03-08 15:58:43 -06:00
Seth Hoenig	2b5efeac04	e2e: setup nomad permissions correctly (client vs. server) (#16399 ) This PR configures - server nodes with a systemd unit running the agent as the nomad service user - client nodes with a root owned nomad data directory	2023-03-08 14:41:08 -06:00
Phil Renaud	b0124ee683	[ui] Fix: New toast notifications no longer last forever (#16384 ) * Removes an errant console.log and corrects a default sticky=true on toast notifications * Default so no need to refault	2023-03-08 14:50:18 -05:00
Lance Haig	35c17b2e56	deps: Update ioutil deprecated library references to os and io respectively in the client package (#16318 ) * Update ioutil deprecated library references to os and io respectively * Deal with the errors produced. Add error handling to filEntry info Add error handling to info	2023-03-08 13:25:10 -06:00
Lance Haig	2332d694bb	deps: Update ioutil library references to os and io respectively for drivers package (#16331 ) * Update ioutil library references to os and io respectively for drivers package No user facing changes so I assume no change log is required * Fix failing tests	2023-03-08 10:31:09 -06:00
Lance Haig	ae256e28d8	Update ioutil library references to os and io respectively for API and Plugins package (#16330 ) No user facing changes so I assume no change log is required	2023-03-08 10:25:09 -06:00
Lance Haig	e89c3d3b36	Update ioutil library references to os and io respectively for e2e helper nomad (#16332 ) No user facing changes so I assume no change log is required	2023-03-08 09:39:03 -06:00
Lance Haig	d9e585b965	Update ioutil library references to os and io respectively for command (#16329 ) No user facing changes so I assume no change log is required	2023-03-08 09:20:04 -06:00
dependabot[bot]	de766a4239	build(deps): bump golang.org/x/crypto from 0.5.0 to 0.7.0 (#16337 ) Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.5.0 to 0.7.0. - [Release notes](https://github.com/golang/crypto/releases) - [Commits](https://github.com/golang/crypto/compare/v0.5.0...v0.7.0) --- updated-dependencies: - dependency-name: golang.org/x/crypto dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-03-08 09:14:49 -06:00
Phil Renaud	54bb97f299	Outage recovery link fix (#16365 )	2023-03-07 15:52:26 -05:00
Seth Hoenig	32f8ca6ce3	e2e: fix permissions on nomad data directory (#16376 ) This PR updates the provisioning step where we create /opt/nomad/data, such that it is with 0700 permissions in line with our security guidance.	2023-03-07 14:41:54 -06:00
Seth Hoenig	835365d2a4	docker: fix bug where network pause containers would be erroneously reconciled (#16352 ) * docker: fix bug where network pause containers would be erroneously gc'd * docker: cl: thread context from driver into pause container restoration	2023-03-07 12:17:32 -06:00
James Rasell	05fff34fc8	docs: add 1.5.0, 1.4.5, and 1.3.10 pause regression upgrade note. (#16358 )	2023-03-07 18:29:03 +01:00
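
Note on commit 99d46e5a49: the self-collision it describes can be illustrated with a minimal Go sketch. This is not Nomad's implementation. `portSet` and `assignDynamicPorts` are hypothetical names standing in for the `NetworkIndex` bitmap and the dynamic port selection in `AssignPorts`, and the linear scan is a simplification chosen for determinism. The point shown is only that each pick is recorded before the next one is made, so a single ask cannot be offered the same port twice, and that exhausting the range returns an error instead of a silent conflict.

```go
package main

import (
	"errors"
	"fmt"
)

// portSet is a hypothetical stand-in for a bitmap of used ports on one node.
type portSet map[int]bool

// assignDynamicPorts picks n free ports in [min, max]. Each pick is marked as
// used immediately, so later picks in the same ask cannot collide with it.
// If the range is exhausted, the picks are rolled back and an error returned.
func assignDynamicPorts(used portSet, min, max, n int) ([]int, error) {
	picked := make([]int, 0, n)
	for p := min; p <= max && len(picked) < n; p++ {
		if used[p] {
			continue // already taken by another allocation or an earlier pick
		}
		used[p] = true
		picked = append(picked, p)
	}
	if len(picked) < n {
		for _, p := range picked {
			delete(used, p) // undo partial assignment
		}
		return nil, errors.New("dynamic port selection failed: range exhausted")
	}
	return picked, nil
}

func main() {
	used := portSet{20000: true} // one port already held by an existing allocation
	ports, err := assignDynamicPorts(used, 20000, 20003, 2)
	fmt.Println(ports, err) // prints "[20001 20002] <nil>"
}
```

A tightly packed range makes the failure path easy to reach: asking for four ports from the same four-port range with one port already used returns the error rather than a duplicate assignment.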

24412 Commits
