Commit Graph

34 Commits

Author SHA1 Message Date
Seth Hoenig 52aaf86f52 raw_exec: make raw exec driver work with cgroups v2
This PR adds support for the raw_exec driver on systems with only cgroups v2.

The raw exec driver is able to use cgroups to manage processes. This happens
only on Linux, when exec_driver is enabled, and the no_cgroups option is not
set. The driver uses the freezer controller to freeze processes of a task,
issue a sigkill, then unfreeze. Previously the implementation assumed cgroups
v1, and now it also supports cgroups v2.

There is a bit of refactoring in this PR, but the fundamental design remains
the same.

Closes #12351 #12348
2022-04-04 16:11:38 -05:00
Seth Hoenig 2631659551 ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
Arkadiusz ffb174b596
Fix log streaming missing frames (#11721)
Perform one more read after receiving cancel when streaming file from the allocation API
2022-01-04 14:07:16 -05:00
Michael Schurter fd68bbc342 test: update tests to properly use AllocDir
Also use t.TempDir when possible.
2021-10-19 10:49:07 -07:00
Michael Schurter 10c3bad652 client: never embed alloc_dir in chroot
Fixes #2522

Skip embedding client.alloc_dir when building chroot. If a user
configures a Nomad client agent so that the chroot_env will embed the
client.alloc_dir, Nomad will happily infinitely recurse while building
the chroot until something horrible happens. The best case scenario is
the filesystem's path length limit is hit. The worst case scenario is
disk space is exhausted.

A bad agent configuration will look something like this:

```hcl
data_dir = "/tmp/nomad-badagent"

client {
  enabled = true

  chroot_env {
    # Note that the source matches the data_dir
    "/tmp/nomad-badagent" = "/ohno"
    # ...
  }
}
```

Note that `/ohno/client` (the state_dir) will still be created but not
`/ohno/alloc` (the alloc_dir).
While I cannot think of a good reason why someone would want to embed
Nomad's client (and possibly server) directories in chroots, there
should be no cause for harm. chroots are only built when Nomad runs as
root, and Nomad disables running exec jobs as root by default. Therefore
even if client state is copied into chroots, it will be inaccessible to
tasks.

Skipping the `data_dir` and `{client,server}.state_dir` is possible, but
this PR attempts to implement the minimum viable solution to reduce risk
of unintended side effects or bugs.

When running tests as root in a vm without the fix, the following error
occurs:

```
=== RUN   TestAllocDir_SkipAllocDir
    alloc_dir_test.go:520:
                Error Trace:    alloc_dir_test.go:520
                Error:          Received unexpected error:
                                Couldn't create destination file /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/testtask/nomad/test/testtask/.../nomad/test/testtask/secrets/.nomad-mount: open /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/.../testtask/secrets/.nomad-mount: file name too long
                Test:           TestAllocDir_SkipAllocDir
--- FAIL: TestAllocDir_SkipAllocDir (22.76s)
```

Also removed unused Copy methods on AllocDir and TaskDir structs.

Thanks to @eveld for not letting me forget about this!
2021-10-18 09:22:01 -07:00
Lars Lehtonen 03abe3c890
client: fix test umask (#8987) 2020-09-30 08:09:41 -04:00
Yoan Blanc 225c9c1215 fixup! vendor: explicit use of hashicorp/go-msgpack
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-31 09:48:07 -04:00
Yoan Blanc 761d014071 vendor: explicit use of hashicorp/go-msgpack
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-31 09:45:21 -04:00
Seth Hoenig f0c3dca49c tests: swap lib/freeport for tweaked helper/freeport
Copy the updated version of freeport (sdk/freeport), and tweak it for use
in Nomad tests. This means staying below port 10000 to avoid conflicts with
the lib/freeport that is still transitively used by the old version of
consul that we vendor. Also provide implementations to find ephemeral ports
of macOS and Windows environments.

Ports acquired through freeport are supposed to be returned to freeport,
which this change now also introduces. Many tests are modified to include
calls to a cleanup function for Server objects.

This should help quite a bit with some flakey tests, but not all of them.
Our port problems will not go away completely until we upgrade our vendor
version of consul. With Go modules, we'll probably do a 'replace' to swap
out other copies of freeport with the one now in 'nomad/helper/freeport'.
2019-12-09 08:37:32 -06:00
Lars Lehtonen 0d344e8578
client: fix use of T.Fatal inside TestFS_logsImpl_NoFollow() goroutine. 2019-11-25 23:51:28 -08:00
Mahmood Ali 4b2ba62e35 acl: check ACL against object namespace
Fix a bug where a millicious user can access or manipulate an alloc in a
namespace they don't have access to.  The allocation endpoints perform
ACL checks against the request namespace, not the allocation namespace,
and performs the allocation lookup independently from namespaces.

Here, we check that the requested can access the alloc namespace
regardless of the declared request namespace.

Ideally, we'd enforce that the declared request namespace matches
the actual allocation namespace.  Unfortunately, we haven't documented
alloc endpoints as namespaced functions; we suspect starting to enforce
this will be very disruptive and inappropriate for a nomad point
release.  As such, we maintain current behavior that doesn't require
passing the proper namespace in request.  A future major release may
start enforcing checking declared namespace.
2019-10-08 12:59:22 -04:00
Mahmood Ali e29ee4c400 nomad: defensive check for namespaces in job registration call
In a job registration request, ensure that the request namespace "header" and job
namespace field match.  This should be the case already in prod, as http
handlers ensures that the values match [1].

This mitigates bugs that exploit bugs where we may check a value but act
on another, resulting into bypassing ACL system.

[1] https://github.com/hashicorp/nomad/blob/v0.9.5/command/agent/job_endpoint.go#L415-L418
2019-09-26 17:02:47 -04:00
Preetha Appan 3345ce3ba4
Infer content type in alloc fs stat endpoint 2019-06-28 20:31:28 -05:00
Mahmood Ali d9e59eece0 tests: fix client TestFS_Stream data race
Close is invoked in a different goroutine from test
2019-05-21 09:49:56 -04:00
Mahmood Ali d19ba5bd8e tests: deflake client TestFS_Logs_TaskPending test 2019-01-18 21:26:48 -05:00
Danielle Tomlinson 580b8c5dda client/fs: Skip delete-while-streaming test on win 2019-01-17 18:43:13 +01:00
Danielle Tomlinson 4dbddd0620 client/fs: windows error message for not found 2019-01-17 18:43:13 +01:00
Mahmood Ali c3eaa0f4c8 tests: enable and fix tests requiring mock driver 2019-01-10 10:10:11 -05:00
Mahmood Ali 865419e756 convert all config durations to strings in tests 2018-11-13 10:21:40 -05:00
Mahmood Ali c62ec124c0 Set clean config for mock driver
The default job here contains some exec task config (for setting
command and args) that aren't used for mock driver.  Now, the alloc
runner seems stricter about validating fields and errors on unexpected
fields.

Updating configs in tests so we can have an explicit task config
whenever driver is set explicitly.
2018-11-13 10:21:40 -05:00
Michael Schurter 21d78be961 tests: explicitly cleanup after clients 2018-10-17 10:06:59 -07:00
Nick Ethier 3183b33d24 client: review comments and fixup/skip tests 2018-10-16 16:56:56 -07:00
Michael Schurter f279b1d1b1 tests: test logs endpoint against pending task
Although the really exciting change is making WaitForRunning return the
allocations that it started. This should cut down test boilerplate
significantly.
2018-10-16 16:56:55 -07:00
Michael Schurter 951ed17436 client: do not inspect task state to follow logs
"Ask forgiveness, not permission."

Instead of peaking at TaskStates (which are no longer updated on the
AR.Alloc() view of the world) to only read logs for running tasks, just
try to read the logs and improve the error handling if they don't exist.

This should make log streaming less dependent on AR/TR behavior.

Also fixed a race where the log streamer could exit before reading an
error. This caused no logs or errors to be displayed sometimes when an
error occurred.
2018-10-16 16:56:55 -07:00
Alex Dadgar 8504505c0d client uses passed logger and fix fingerprinters 2018-10-16 16:53:30 -07:00
Alex Dadgar 300b1a7a15 Tests only use testlog package logger 2018-06-13 15:40:56 -07:00
Alex Dadgar 9bab9edf27 test fixes 2018-06-12 17:45:39 -07:00
Alex Dadgar f5ff509fa5 Refactor - wip 2018-06-12 10:23:45 -07:00
Alex Dadgar e685211892 Code review feedback 2018-02-15 13:59:02 -08:00
Alex Dadgar f5f43218f5 HTTP and tests 2018-02-15 13:59:02 -08:00
Alex Dadgar 6546b43a17 Client implementation of stream 2018-02-15 13:59:02 -08:00
Alex Dadgar 9a5569678c Client Stat/List impl 2018-02-15 13:59:02 -08:00
Alex Dadgar 8854b35b34 Agent logs 2018-02-15 13:59:02 -08:00
Alex Dadgar 857b0ab6c7 client tests 2018-02-15 13:59:02 -08:00