Commit Graph

672 Commits

Author SHA1 Message Date
Tim Gross 73d0779858
drivers: set world-readable permissions on copied resolv.conf (#11856)
When we copy the system DNS to a task's `resolv.conf`, we should set
the permissions as world-readable so that unprivileged users within
the task can read it.
2022-01-14 12:25:23 -05:00
Alessandro De Blasis e647549ecf
metrics: added `mapped_file` metric (#11500)
Signed-off-by: Alessandro De Blasis <alex@deblasis.net>
Co-authored-by: Nate <37554478+servusdei2018@users.noreply.github.com>
2022-01-10 15:35:19 -05:00
Shishir 65eab35412
Add support for setting pids_limit in docker plugin config. (#11526) 2021-12-21 13:31:34 -05:00
James Rasell 45f4689f9c
chore: fixup inconsistent method receiver names. (#11704) 2021-12-20 11:44:21 +01:00
Tim Gross fc1d4814d9
qemu: add `args_allowlist` to sandbox VM command line inputs
The QEMU driver allows arbitrary command line options, but many of
these options give access to host resources that operators may not
want to expose such as devices. Add an optional allowlist to the
plugin configuration so that operators can limit the resources for
QEMU.
2021-11-19 11:11:52 -05:00
Michael Schurter ef3fc79225
Merge pull request #11334 from hashicorp/f-chroot-skip-allocdir
client: never embed alloc_dir in chroot
2021-11-03 16:48:09 -07:00
Michael Schurter fd68bbc342 test: update tests to properly use AllocDir
Also use t.TempDir when possible.
2021-10-19 10:49:07 -07:00
Michael Schurter 10c3bad652 client: never embed alloc_dir in chroot
Fixes #2522

Skip embedding client.alloc_dir when building chroot. If a user
configures a Nomad client agent so that the chroot_env will embed the
client.alloc_dir, Nomad will happily infinitely recurse while building
the chroot until something horrible happens. The best case scenario is
the filesystem's path length limit is hit. The worst case scenario is
disk space is exhausted.

A bad agent configuration will look something like this:

```hcl
data_dir = "/tmp/nomad-badagent"

client {
  enabled = true

  chroot_env {
    # Note that the source matches the data_dir
    "/tmp/nomad-badagent" = "/ohno"
    # ...
  }
}
```

Note that `/ohno/client` (the state_dir) will still be created but not
`/ohno/alloc` (the alloc_dir).
While I cannot think of a good reason why someone would want to embed
Nomad's client (and possibly server) directories in chroots, there
should be no cause for harm. chroots are only built when Nomad runs as
root, and Nomad disables running exec jobs as root by default. Therefore
even if client state is copied into chroots, it will be inaccessible to
tasks.

Skipping the `data_dir` and `{client,server}.state_dir` is possible, but
this PR attempts to implement the minimum viable solution to reduce risk
of unintended side effects or bugs.

When running tests as root in a vm without the fix, the following error
occurs:

```
=== RUN   TestAllocDir_SkipAllocDir
    alloc_dir_test.go:520:
                Error Trace:    alloc_dir_test.go:520
                Error:          Received unexpected error:
                                Couldn't create destination file /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/testtask/nomad/test/testtask/.../nomad/test/testtask/secrets/.nomad-mount: open /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/.../testtask/secrets/.nomad-mount: file name too long
                Test:           TestAllocDir_SkipAllocDir
--- FAIL: TestAllocDir_SkipAllocDir (22.76s)
```

Also removed unused Copy methods on AllocDir and TaskDir structs.

Thanks to @eveld for not letting me forget about this!
2021-10-18 09:22:01 -07:00
Shishir Mahajan d4daef7ebf Add support for --init to docker driver.
Signed-off-by: Shishir Mahajan <smahajan@roblox.com>
2021-10-15 12:53:25 -07:00
Mahmood Ali d5e136b82b
executor: set CpuWeight in cgroup-v2 (#11287)
Cgroup-v2 uses `cpu.weight` property instead of cpu shares:
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#cpu-interface-files
. And it uses a different range (i.e. `[1, 10000]`) from cpu.shares
(i.e. `[2, 262144]`) to make things more interesting.

Luckily, the libcontainer provides a helper function to perform the
conversion
[`ConvertCPUSharesToCgroupV2Value`](https://pkg.go.dev/github.com/opencontainers/runc@v1.0.2/libcontainer/cgroups#ConvertCPUSharesToCgroupV2Value).

I have confirmed that docker/libcontainer performs the conversion as
well in
https://github.com/opencontainers/runc/blob/v1.0.2/libcontainer/specconv/spec_linux.go#L536-L541
, and that CpuShares is ignored by libcontainer in
https://github.com/opencontainers/runc/blob/v1.0.2/libcontainer/cgroups/fs2/cpu.go#L24-L29
.
2021-10-14 08:46:07 -04:00
Mahmood Ali 48aa6e26e9
executor: suppress spurious log messages (#11273)
Suppress stats streaming error log messages when task finishes.
Streaming errors are expected when a task finishes and they aren't
actionable to users.

Also, note that the task runner Stats hook retries collecting stats
after a delay. If the connection terminates prematurely, it will be
retried, and closing the stats stream is not very disruptive.

Ideally, executor terminates cleanly when task exits, but that's a more
substantial change that may require changing the executor/drivers interface.

Fixes #10814
2021-10-06 12:42:35 -04:00
Mahmood Ali 4d90afb425 gofmt all the files
mostly to handle build directives in 1.17.
2021-10-01 10:14:28 -04:00
James Rasell 0e926ef3fd
allow configuration of Docker hostnames in bridge mode (#11173)
Add a new hostname string parameter to the network block which
allows operators to specify the hostname of the network namespace.
Changing this causes a destructive update to the allocation and it
is omitted if empty from API responses. This parameter also supports
interpolation.

In order to have a hostname passed as a configuration param when
creating an allocation network, the CreateNetwork func of the
DriverNetworkManager interface needs to be updated. In order to
minimize the disruption of future changes, rather than add another
string func arg, the function now accepts a request struct along with
the allocID param. The struct has the hostname as a field.

The in-tree implementations of DriverNetworkManager.CreateNetwork
have been modified to account for the function signature change.
In updating for the change, the enhancement of adding hostnames to
network namespaces has also been added to the Docker driver, whilst
the default Linux manager does not current implement it.
2021-09-16 08:13:09 +02:00
James Rasell b6813f1221
chore: fix incorrect docstring formatting. 2021-08-30 11:08:12 +02:00
Timothé Perez ce877bdf7c fix: load token in docker auth config 2021-07-22 22:27:29 +02:00
Tim Gross db96e40f3a
docker: move host path for hosts file mount to alloc dir (#10823)
In Nomad 1.1.1 we generate a hosts file based on the Nomad-owned network
namespace, rather than using the default hosts file from the pause
container. This hosts file should be shared between tasks in the same
allocation so that tasks can update the file and have the results propagated
between tasks.
2021-06-30 11:10:04 -04:00
Tim Gross 7bd61bbf43
docker: generate /etc/hosts file for bridge network mode (#10766)
When `network.mode = "bridge"`, we create a pause container in Docker with no
networking so that we have a process to hold the network namespace we create
in Nomad. The default `/etc/hosts` file of that pause container is then used
for all the Docker tasks that share that network namespace. Some applications
rely on this file being populated.

This changeset generates a `/etc/hosts` file and bind-mounts it to the
container when Nomad owns the network, so that the container's hostname has an
IP in the file as expected. The hosts file will include the entries added by
the Docker driver's `extra_hosts` field.

In this changeset, only the Docker task driver will take advantage of this
option, as the `exec`/`java` drivers currently copy the host's `/etc/hosts`
file and this can't be changed without breaking backwards compatibility. But
the fields are available in the task driver protobuf for community task
drivers to use if they'd like.
2021-06-16 14:55:22 -04:00
Seth Hoenig 8f493cfa89 client/fingerprint/java: improve java version string regex matching
This PR improves the regular expression used for matching the java
version string, which varies a lot depending on the java vendor and
version.

These are the example strings we now test for:

java version "1.7.0_80"
openjdk version "11.0.1" 2018-10-16
openjdk version "11.0.1" 2018-10-16
java version "1.6.0_36"
openjdk version "1.8.0_192"
openjdk 11.0.11 2021-04-20 LTS

The last one is a new test added on behalf of #6081, which is
still broken on today's CentOS 7 default JDK package.

openjdk 11.0.11 2021-04-20 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.11+9-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.11+9-LTS, mixed mode, sharing)

==> Evaluation "21c6caf7" finished with status "complete" but failed to place all allocations:
    Task Group "example" (failed to place 1 allocation):
      * Constraint "${driver.java.version} >= 11.0.0": 1 nodes excluded by filter
    Evaluation "2b737d48" waiting for additional capacity to place remainder

Fixes #6081
2021-06-15 14:15:01 -05:00
James Rasell 939b23936a
Merge pull request #10744 from hashicorp/b-remove-duplicate-imports
chore: remove duplicate import statements
2021-06-11 16:42:34 +02:00
James Rasell 050b5408c7
drivers: remove duplicate import statements. 2021-06-11 09:38:09 +02:00
Mahmood Ali 0976af471c
driver/docker: ignore cpuset errors for short-lived tasks follow up (#10730)
minor refactor and changelog
2021-06-09 11:00:39 -04:00
Mahmood Ali c2026dfa28
Merge pull request #10416 from hashicorp/b-cores-docker
driver/docker: ignore error if container exists before cgroup can be written
2021-06-09 10:34:02 -04:00
Mahmood Ali 0ac126fa78
drivers/exec: Don't inherit Nomad oom_score_adj value (#10698)
Explicitly set the `oom_score_adj` value for `exec` and `java` tasks.

We recommend that the Nomad service to have oom_score_adj of a low value
(e.g. -1000) to avoid having nomad agent OOM Killed if the node is
oversubscriped.

However, Nomad's workloads should not inherit Nomad's process, which is
the default behavior.

Fixes #10663
2021-06-03 14:15:50 -04:00
Seth Hoenig fe9258b754 drivers/exec: pass capabilities through executor RPC
Add capabilities to the LaunchRequest proto so that the
capabilities set actually gets plumbed all the way through
to task launch.
2021-05-17 12:37:40 -06:00
Seth Hoenig e365652e81 drivers: fixup linux version dependent test cases
The error output being checked depends on the linux caps supported
by the particular operating system. Fix these test cases to just
check that an error did occur.
2021-05-17 12:37:40 -06:00
Seth Hoenig f64baec276 docs: update docs for linux capabilities in exec/java/docker drivers
Update docs for allow_caps, cap_add, cap_drop in exec/java/docker driver
pages. Also update upgrade guide with guidance on new default linux
capabilities for exec and java drivers.
2021-05-17 12:37:40 -06:00
Seth Hoenig 87c96eed11 drivers/docker: reuse capabilities plumbing in docker driver
This changeset does not introduce any functional change for the
docker driver, but rather cleans up the implementation around
computing configured capabilities by re-using code written for
the exec/java task drivers.
2021-05-17 12:37:40 -06:00
Seth Hoenig 2361a91938 drivers/java: enable setting allow_caps on java driver
Enable setting allow_caps on the java task driver plugin, along
with the associated cap_add and cap_drop options in java task
configuration.
2021-05-17 12:37:40 -06:00
Seth Hoenig 5b8a32f23d drivers/exec: enable setting allow_caps on exec driver
This PR enables setting allow_caps on the exec driver
plugin configuration, as well as cap_add and cap_drop in
exec task configuration. These options replicate the
functionality already present in the docker task driver.

Important: this change also reduces the default set of
capabilities enabled by the exec driver to match the
default set enabled by the docker driver. Until v1.0.5
the exec task driver would enable all capabilities supported
by the operating system. v1.0.5 removed NET_RAW from that
list of default capabilities, but left may others which
could potentially also be leveraged by compromised tasks.

Important: the "root" user is still special cased when
used with the exec driver. Older versions of Nomad enabled
enabled all capabilities supported by the operating system
for tasks set with the root user. To maintain compatibility
with existing clusters we continue supporting this "feature",
however we maintain support for the legacy set of capabilities
rather than enabling all capabilities now supported on modern
operating systems.
2021-05-17 12:37:40 -06:00
Seth Hoenig 1e75f99839 drivers/docker+exec+java: disable net_raw capability by default
The default Linux Capabilities set enabled by the docker, exec, and
java task drivers includes CAP_NET_RAW (for making ping just work),
which has the side affect of opening an ARP DoS/MiTM attack between
tasks using bridge networking on the same host network.

https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities

This PR disables CAP_NET_RAW for the docker, exec, and java task
drivers. The previous behavior can be restored for docker using the
allow_caps docker plugin configuration option.

A future version of nomad will enable similar configurability for the
exec and java task drivers.
2021-05-12 13:22:09 -07:00
Isabel Suchanek ed9e12cdc7
Clean up docker driver test to make it less flaky (#10559)
Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>
2021-05-10 14:58:19 -07:00
Isabel Suchanek b5a2f48c78 Fix test panic in docker driver test 2021-05-07 12:12:33 -07:00
Isabel Suchanek cb4fc53353 drivers/docker: add support for STOPSIGNAL
This fixes a bug where Nomad overrides a Dockerfile's STOPSIGNAL with
the default kill_signal (SIGTERM).

This adds a check for kill_signal. If it's not set, it calls
StopContainer instead of Signal, which uses STOPSIGNAL if it's
specified. If both kill_signal and STOPSIGNAL are set, Nomad tries to
stop the container with kill_signal first, before then calling
StopContainer.

Fixes #9989
2021-05-05 10:27:58 -07:00
Tim Gross cf838f49e1 docker: improve error message for auth helper
The error returned from the stdlib's `exec` package is always a message with
the exit code of the exec'd process, not any error message that process might
have given us. This results in opaque failures for the Nomad user. Cast to an
`ExitError` so that we can access the output from stderr.
2021-05-03 11:30:12 -04:00
Nick Ethier 9d194bb2d9 driver/docker: ignore error if container exists before cgroup can be written 2021-04-19 23:38:35 -04:00
Nick Ethier b34db8b3b6 nit: code cleanup/organization 2021-04-16 15:14:29 -04:00
Nick Ethier 07dca26f0d qemu: set the number of cores equal to the number of reserved cores if set 2021-04-15 13:32:33 -04:00
Nick Ethier c9216ba7d9 drivers/docker: move cgroups logic to linux build file 2021-04-15 10:39:11 -04:00
Nick Ethier 390c4c5119 docker: add support for cpuset cgroup management 2021-04-15 10:24:31 -04:00
Nick Ethier fe283c5a8f executor: add support for cpuset cgroup 2021-04-15 10:24:31 -04:00
Nick Ethier b6b74a98a9 client/fingerprint: move existing cgroup concerns to cgutil 2021-04-13 13:28:36 -04:00
Yoan Blanc ac0d5d8bd3
chore: bump golangci-lint from v1.24 to v1.39
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2021-04-03 09:50:23 +02:00
Mahmood Ali 9ff7220588 reuse existing function and typo fix 2021-04-02 11:56:27 -04:00
Mahmood Ali 565496e6ba drivers/docker: account for cgroup-v2 memory stats
If the docker engine is running on cgroup-v2 host, then RSS and Max
Usage doesn't get reported.

Using a heauristic here to avoid adding more API calls to the Docker
Engine to infer cgroups version. Also, opted to avoid coordinating stats
collection with fingerprinting, which adds concurrency complexities.
2021-04-01 12:23:57 -04:00
Mahmood Ali edec658e50 drivers/exec: Account for cgroup-v2 memory stats
If the host is running with cgroup-v2, RSS and Max Usage doesn't get
reported anymore.
2021-04-01 12:13:21 -04:00
Tim Gross e76eeeb848 drivers/docker: fix flaky image coordinator test
The test assertion that we don't have a delete future remaining races with the
code its testing, because the removal of the image and the removal of the
future are not atomic. Move this assertion into a `WaitForResult` to avoid
test flakes which we're seeing on CI on Windows in particular.
2021-03-31 15:59:01 -04:00
zhsj 5a182e1d03
deps: update runc to v1.0.0-rc93
includes updates for breaking changes in runc v1.0.0-rc93
2021-03-31 10:57:02 -04:00
Mahmood Ali bf1c0dcf17 driver/exec: set soft memory limit
Linux offers soft memory limit:
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/memory.html#soft-limits
, and
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html?highlight=memory.low
.

We can set soft memory limits through libcontainer
`Resources.MemoryReservation`: https://pkg.go.dev/github.com/opencontainers/runc@v0.1.1/libcontainer/configs#Resources
2021-03-30 16:55:58 -04:00
Mahmood Ali f44a04454d oversubscription: driver/exec to honor MemoryMaxMB 2021-03-30 16:55:58 -04:00
Mahmood Ali 275feb5bec oversubscription: docker to honor MemoryMaxMB values 2021-03-30 16:55:58 -04:00