Commit graph

24376 commits

Author SHA1 Message Date
Proskurin Kirill f3ecd1db7c
Updated who-uses-nomad to add Behavox (#16339) 2023-03-08 19:43:12 -05:00
Seth Hoenig ff4503aac6
client: disable running artifact downloader as nobody (#16375)
* client: disable running artifact downloader as nobody

This PR reverts a change from Nomad 1.5 where artifact downloads were
executed as the nobody user on Linux systems. This was done as an attempt
to improve the security model of artifact downloading where third party
tools such as git or mercurial would be run as the root user with all
the security implications thereof.

However, doing so conflicts with Nomad's own advice for securing the
Client data directory - which when setup with the recommended directory
permissions structure prevents artifact downloads from working as intended.

Artifact downloads are at least still now executed as a child process of
the Nomad agent, and on modern Linux systems make use of the kernel Landlock
feature for limiting filesystem access of the child process.

* docs: update upgrade guide for 1.5.1 sandboxing

* docs: add cl

* docs: add title to upgrade guide fix
2023-03-08 15:58:43 -06:00
Seth Hoenig 2b5efeac04
e2e: setup nomad permissions correctly (client vs. server) (#16399)
This PR configures

- server nodes with a systemd unit running the agent as the nomad service user
- client nodes with a root owned nomad data directory
2023-03-08 14:41:08 -06:00
Phil Renaud b0124ee683
[ui] Fix: New toast notifications no longer last forever (#16384)
* Removes an errant console.log and corrects a default sticky=true on toast notifications

* Default so no need to refault
2023-03-08 14:50:18 -05:00
Lance Haig 35c17b2e56
deps: Update ioutil deprecated library references to os and io respectively in the client package (#16318)
* Update ioutil deprecated library references to os and io respectively

* Deal with the errors produced.

Add error handling to filEntry info
Add error handling to info
2023-03-08 13:25:10 -06:00
Lance Haig 2332d694bb
deps: Update ioutil library references to os and io respectively for drivers package (#16331)
* Update ioutil library references to os and io respectively for drivers package

No user facing changes so I assume no change log is required

* Fix failing tests
2023-03-08 10:31:09 -06:00
Lance Haig ae256e28d8
Update ioutil library references to os and io respectively for API and Plugins package (#16330)
No user facing changes so I assume no change log is required
2023-03-08 10:25:09 -06:00
Lance Haig e89c3d3b36
Update ioutil library references to os and io respectively for e2e helper nomad (#16332)
No user facing changes so I assume no change log is required
2023-03-08 09:39:03 -06:00
Lance Haig d9e585b965
Update ioutil library references to os and io respectively for command (#16329)
No user facing changes so I assume no change log is required
2023-03-08 09:20:04 -06:00
dependabot[bot] de766a4239
build(deps): bump golang.org/x/crypto from 0.5.0 to 0.7.0 (#16337)
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.5.0 to 0.7.0.
- [Release notes](https://github.com/golang/crypto/releases)
- [Commits](https://github.com/golang/crypto/compare/v0.5.0...v0.7.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-08 09:14:49 -06:00
Phil Renaud 54bb97f299
Outage recovery link fix (#16365) 2023-03-07 15:52:26 -05:00
Seth Hoenig 32f8ca6ce3
e2e: fix permissions on nomad data directory (#16376)
This PR updates the provisioning step where we create /opt/nomad/data,
such that it is with 0700 permissions in line with our security guidance.
2023-03-07 14:41:54 -06:00
Seth Hoenig 835365d2a4
docker: fix bug where network pause containers would be erroneously reconciled (#16352)
* docker: fix bug where network pause containers would be erroneously gc'd

* docker: cl: thread context from driver into pause container restoration
2023-03-07 12:17:32 -06:00
James Rasell 05fff34fc8
docs: add 1.5.0, 1.4.5, and 1.3.10 pause regression upgrade note. (#16358) 2023-03-07 18:29:03 +01:00
James Rasell 7507c92139
cli: support json and t on acl binding-rule info command. (#16357) 2023-03-07 18:27:02 +01:00
Tim Gross 966c4b1a2d
docs: note that secrets dir is usually mounted noexec (#16363) 2023-03-07 11:57:15 -05:00
Tim Gross a2ceab3d8c
scheduler: correctly detect inplace update with wildcard datacenters (#16362)
Wildcard datacenters introduced a bug where a job with any wildcard datacenters
will always be treated as a destructive update when we check whether a
datacenter has been removed from the jobspec.

Includes updating the helper so that callers don't have to loop over the job's
datacenters.
2023-03-07 10:05:59 -05:00
Ashlee M Boyer 9af02f3f4a
CI: delete test-link-rewrites.yml (#16354) 2023-03-06 15:41:01 -05:00
Seth Hoenig 1c8b408a81
deps: update test to 0.6.2 for new functions (#16326) 2023-03-06 09:24:45 -06:00
Phil Renaud edf59597d2
[ui] Fix: Wildcard-datacenter system/sysbatch jobs stopped showing client links/chart (#16274)
* Fix for wildcard DC sys/sysbatch jobs

* A few extra modules for wildcard DC in systemish jobs

* doesMatchPattern moved to its own util as match-glob

* DC glob lookup using matchGlob

* PR feedback
2023-03-06 10:06:31 -05:00
Luiz Aoqui 2a1a790820
client: don't emit task shutdown delay event if not waiting (#16281) 2023-03-03 18:22:06 -05:00
Luiz Aoqui 3f1ea9da4b
api: set last index and request time on alloc stop (#16319)
Some of the methods in `Allocations()` incorrectly use the `putQuery`
in API calls where `put` is more appropriate since they are not reading
information back. These methods are also not returning request metadata
such as `LastIndex` back to callers, which can be useful to have in some
scenarios.

They also provide poor developer experience as they take an
`*api.Allocation` struct when only the allocation ID is necessary. This
can lead consumers to make unnecessary API calls to fetch the full
allocation.

Fixing these problems require updating the methods' signatures so they
take `*WriteOptions` instead of `*QueryOptions` and return `*WriteMeta`,
but this is a breaking change that requires advanced notice to consumers.

This commit adds a future breaking change notice and also fixes the
`Stop` method so it properly returns request metadata in a backwards
compatible way.
2023-03-03 15:52:41 -05:00
Tim Gross 3c0eaba9db
remove backcompat support for non-atomic job registration (#16305)
In Nomad 0.12.1 we introduced atomic job registration/deregistration, where the
new eval was written in the same raft entry. Backwards-compatibility checks were
supposed to have been removed in Nomad 1.1.0, but we missed that. This is long
safe to remove.
2023-03-03 15:52:22 -05:00
Luiz Aoqui 40494e64a9
docs: fix alloc stop no_shutdown_delay (#16282) 2023-03-03 14:44:49 -05:00
Luiz Aoqui 1d051d834d
cli: use shared logic for resolving job prefix (#16306)
Several `nomad job` subcommands had duplicate or slightly similar logic
for resolving a job ID from a CLI argument prefix, while others did not
have this functionality at all.

This commit pulls the shared logic to the command Meta and updates all
`nomad job` subcommands to use it.
2023-03-03 14:43:20 -05:00
Tim Gross 8747059b86
service: fix regression in task access to list/read endpoint (#16316)
When native service discovery was added, we used the node secret as the auth
token. Once Workload Identity was added in Nomad 1.4.x we needed to use the
claim token for `template` blocks, and so we allowed valid claims to bypass the
ACL policy check to preserve the existing behavior. (Invalid claims are still
rejected, so this didn't widen any security boundary.)

In reworking authentication for 1.5.0, we unintentionally removed this
bypass. For WIs without a policy attached to their job, everything works as
expected because the resulting `acl.ACL` is nil. But once a policy is attached
to the job the `acl.ACL` is no longer nil and this causes permissions errors.

Fix the regression by adding back the bypass for valid claims. In future work,
we should strongly consider getting turning the implicit policies into real
`ACLPolicy` objects (even if not stored in state) so that we don't have these
kind of brittle exceptions to the auth code.
2023-03-03 11:41:19 -05:00
Dao Thanh Tung 62a69552c1
api: add new test case for force-leave (#16260)
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
2023-03-03 10:38:40 -05:00
Aofei Sheng e81fecdd1f
docs: fix typos in task-api.mdx and workload-identity.mdx (#16309) 2023-03-03 08:37:59 -05:00
Valentino 1f9d11feff
Add namespace argument to the job verification help text (#16243) 2023-03-02 16:42:14 -05:00
Dao Thanh Tung ed31e0a5f5
cli: sort Node value in nomad operator raft list-peers command (#16221)
Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>
2023-03-02 16:16:30 -05:00
Michael Schurter 4b01df1787
Merge pull request #16293 from hashicorp/post-1.5.0-release
admin: Post 1.5.0 release
2023-03-02 12:44:49 -08:00
Phil Renaud 93574ce085
[ui, helios] Toast Component (#16099)
* Template and styles

* @type to @color on flash messages

* Notifications service as wrapper

* Test cases updated for new notifs
2023-03-02 13:52:16 -05:00
Tim Gross 0e1b554299
handle FSM.Apply errors in raftApply (#16287)
The signature of the `raftApply` function requires that the caller unwrap the
first returned value (the response from `FSM.Apply`) to see if it's an
error. This puts the burden on the caller to remember to check two different
places for errors, and we've done so inconsistently.

Update `raftApply` to do the unwrapping for us and return any `FSM.Apply` error
as the error value. Similar work was done in Consul in
https://github.com/hashicorp/consul/pull/9991. This eliminates some boilerplate
and surfaces a few minor bugs in the process:

* job deregistrations of already-GC'd jobs were still emitting evals
* reconcile job summaries does not return scheduler errors
* node updates did not report errors associated with inconsistent service
  discovery or CSI plugin states

Note that although _most_ of the `FSM.Apply` functions return only errors (which
makes it tempting to remove the first return value entirely), there are few that
return `bool` for some reason and Variables relies on the response value for
proper CAS checking.
2023-03-02 13:51:09 -05:00
Tim Gross f3b5952c3e
deps: update go-plugin to 1.4.9 (#16292)
Fixes #16288. An earlier version of `go-plugin` introduced a warning log if
`SecureConfig` is unset. For Nomad and other applications that have "internal"
`go-plugin` consumers where the application runs itself as a plugin, this causes
spurious warn-level logs. For Nomad in particular this means every task driver
and logmon invocation emits the log, which is our primary operation.

The change was reverted upstream, so this changeset picks up the reverted
version.
2023-03-02 13:39:57 -05:00
Farbod Ahmadian 629ac58763
tests: add functionality to skip a test if it's not running in CI and not with root user (#16222) 2023-03-02 13:38:27 -05:00
Tim Gross bb4880ec13
client: use RPC address and not serf after initial Consul discovery (#16217)
Nomad servers can advertise independent IP addresses for `serf` and
`rpc`. Somewhat unexpectedly, the `serf` address is also used for both Serf and
server-to-server RPC communication (including Raft RPC). The address advertised
for `rpc` is only used for client-to-server RPC. This split was introduced
intentionally in Nomad 0.8.

When clients are using Consul discovery for connecting to servers, they get an
initial discovery set from Consul and use the correct `rpc` tag in Consul to get
a list of adddresses for servers. The client then makes a `Status.Peers` RPC to
get the list of those servers that are raft peers. But this endpoint is shared
between servers and clients, and provides the address used for Raft.

Most of the time this is harmless because servers will bind on 0.0.0.0 anyways.,
But in topologies where servers are on a private network and clients are on
separate subnets (or even public subnets), clients will make initial contact
with the server to get the list of peers but then populate their local server
set with unreachable addresses.

Cluster administrators can work around this problem by using `server_join` with
specific IP addresses (or DNS names), because the `Node.UpdateStatus` endpoint
returns the correct set of RPC addresses when updating the node. So once a
client has registered, it will get the correct set of RPC addresses.

This changeset updates the client logic to query `Status.Members` instead of
`Status.Peers`, and then extract the correctly advertised address and port from
the response body.
2023-03-02 13:36:45 -05:00
James Rasell 7eb9e12829
Merge release 1.5.0 files 2023-03-02 17:47:07 +00:00
hc-github-team-nomad-core de3ee79482
Prepare for next release 2023-03-02 17:44:13 +00:00
hc-github-team-nomad-core 646bdf901c
Generate files for 1.5.0 release 2023-03-02 17:44:13 +00:00
James Rasell 19e5e74dee
prepare release 1.5.0 2023-03-02 17:44:13 +00:00
James Rasell b57d37c070
Merge pull request #16284 from hashicorp/post-1.5.0-rc.1-release
admin: post 1.5.0 rc.1 release
2023-03-02 09:31:08 +01:00
hc-github-team-nomad-core 410aad7847
Prepare for next release 2023-03-01 08:09:07 +00:00
hc-github-team-nomad-core 1f2a525341
Generate files for 1.5.0-rc.1 release 2023-03-01 08:09:07 +00:00
Michael Schurter 320c67643c
Prepare release 1.5.0-rc.1 2023-03-01 08:09:05 +00:00
Michael Schurter bd7b60712e
Accept Workload Identities for Client RPCs (#16254)
This change resolves policies for workload identities when calling Client RPCs. Previously only ACL tokens could be used for Client RPCs.

Since the same cache is used for both bearer tokens (ACL and Workload ID), the token cache size was doubled.

---------

Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-02-27 10:17:47 -08:00
Daniel Bennett 39e3a1ac3e
build/cli: Add BuildDate (#16216)
* build: add BuildDate to version info

will be used in enterprise to compare to license expiration time

* cli: multi-line version output, add BuildDate

before:
$ nomad version
Nomad v1.4.3 (coolfakecommithashomgoshsuchacoolonewoww)

after:
$ nomad version
Nomad v1.5.0-dev
BuildDate 2023-02-17T19:29:26Z
Revision coolfakecommithashomgoshsuchacoolonewoww

compare consul:
$ consul version
Consul v1.14.4
Revision dae670fe
Build Date 2023-01-26T15:47:10Z
Protocol 2 spoken by default, blah blah blah...

and vault:
$ vault version
Vault v1.12.3 (209b3dd99fe8ca320340d08c70cff5f620261f9b), built 2023-02-02T09:07:27Z

* docs: update version command output
2023-02-27 11:27:40 -06:00
Tim Gross 79844048e6
populate Nomad token for task runner update hooks (#16266)
The `TaskUpdateRequest` struct we send to task runner update hooks was not
populating the Nomad token that we get from the task runner (which we do for the
Vault token). This results in task runner hooks like the template hook
overwriting the Nomad token with the zero value for the token. This causes
in-place updates of a task to break templates (but not other uses that rely on
identity but don't currently bother to update it, like the identity hook).
2023-02-27 10:48:13 -05:00
Tim Gross 4c9688271a
CSI: fix potential state store corruptions (#16256)
The `CSIVolume` struct has references to allocations that are "denormalized"; we
don't store them on the `CSIVolume` struct but hydrate them on read. Tests
detecting potential state store corruptions found two locations where we're not
copying the volume before denormalizing:

* When garbage collecting CSI volume claims.
* When checking if it's safe to force-deregister the volume.

There are no known user-visible problems associated with these bugs but both
have the potential of mutating volume claims outside of a FSM transaction. This
changeset also cleans up state mutations in some CSI tests so as to avoid having
working tests cover up potential future bugs.
2023-02-27 08:47:08 -05:00
Michael Schurter 62ac8c561d
agent: only reload HTTP servers that use TLS (#16250)
* agent: only reload HTTP servers that use TLS

* shutdown task api before client and improve names

Fixes #16239
2023-02-23 12:03:44 -08:00
Seth Hoenig 61404b2551
services: Set Nomad's User-Agent by default on HTTP checks for nomad services (#16248) 2023-02-23 08:10:42 -06:00