Explicitly set the `oom_score_adj` value for `exec` and `java` tasks.
We recommend that the Nomad service to have oom_score_adj of a low value
(e.g. -1000) to avoid having nomad agent OOM Killed if the node is
oversubscriped.
However, Nomad's workloads should not inherit Nomad's process, which is
the default behavior.
Fixes#10663
The error output being checked depends on the linux caps supported
by the particular operating system. Fix these test cases to just
check that an error did occur.
Update docs for allow_caps, cap_add, cap_drop in exec/java/docker driver
pages. Also update upgrade guide with guidance on new default linux
capabilities for exec and java drivers.
This changeset does not introduce any functional change for the
docker driver, but rather cleans up the implementation around
computing configured capabilities by re-using code written for
the exec/java task drivers.
This PR enables setting allow_caps on the exec driver
plugin configuration, as well as cap_add and cap_drop in
exec task configuration. These options replicate the
functionality already present in the docker task driver.
Important: this change also reduces the default set of
capabilities enabled by the exec driver to match the
default set enabled by the docker driver. Until v1.0.5
the exec task driver would enable all capabilities supported
by the operating system. v1.0.5 removed NET_RAW from that
list of default capabilities, but left may others which
could potentially also be leveraged by compromised tasks.
Important: the "root" user is still special cased when
used with the exec driver. Older versions of Nomad enabled
enabled all capabilities supported by the operating system
for tasks set with the root user. To maintain compatibility
with existing clusters we continue supporting this "feature",
however we maintain support for the legacy set of capabilities
rather than enabling all capabilities now supported on modern
operating systems.
The default Linux Capabilities set enabled by the docker, exec, and
java task drivers includes CAP_NET_RAW (for making ping just work),
which has the side affect of opening an ARP DoS/MiTM attack between
tasks using bridge networking on the same host network.
https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities
This PR disables CAP_NET_RAW for the docker, exec, and java task
drivers. The previous behavior can be restored for docker using the
allow_caps docker plugin configuration option.
A future version of nomad will enable similar configurability for the
exec and java task drivers.
This fixes a bug where Nomad overrides a Dockerfile's STOPSIGNAL with
the default kill_signal (SIGTERM).
This adds a check for kill_signal. If it's not set, it calls
StopContainer instead of Signal, which uses STOPSIGNAL if it's
specified. If both kill_signal and STOPSIGNAL are set, Nomad tries to
stop the container with kill_signal first, before then calling
StopContainer.
Fixes#9989
The error returned from the stdlib's `exec` package is always a message with
the exit code of the exec'd process, not any error message that process might
have given us. This results in opaque failures for the Nomad user. Cast to an
`ExitError` so that we can access the output from stderr.
If the docker engine is running on cgroup-v2 host, then RSS and Max
Usage doesn't get reported.
Using a heauristic here to avoid adding more API calls to the Docker
Engine to infer cgroups version. Also, opted to avoid coordinating stats
collection with fingerprinting, which adds concurrency complexities.
The test assertion that we don't have a delete future remaining races with the
code its testing, because the removal of the image and the removal of the
future are not atomic. Move this assertion into a `WaitForResult` to avoid
test flakes which we're seeing on CI on Windows in particular.
* Fixup uses of `sanity`
* Remove unnecessary comments.
These checks are better explained by earlier comments about
the context of the test. Per @tgross, moved the tests together
to better reinforce the overall shared context.
* Update nomad/fsm_test.go
This PR adds pid_mode and ipc_mode options to the exec and java task
driver config options. By default these will defer to the default_pid_mode
and default_ipc_mode agent plugin options created in #9969. Setting
these values to "host" mode disables isolation for the task. Doing so
is not recommended, but may be necessary to support legacy job configurations.
Closes#9970
This PR adds default_pid_mode and default_ipc_mode options to the exec and java
task drivers. By default these will default to "private" mode, enabling PID and
IPC isolation for tasks. Setting them to "host" mode disables isolation. Doing
so is not recommended, but may be necessary to support legacy job configurations.
Closes#9969
This has to have been unused because the HasPrefix operation is
backwards, meaning a Command.Env that includes PATH= never would have
worked; the default path was always used.
Introduce a new more-block friendly syntax for specifying mounts with a new `mount` block type with the target as label:
```hcl
config {
image = "..."
mount {
type = "..."
target = "target-path"
volume_options { ... }
}
}
```
The main benefit here is that by `mount` being a block, it can nest blocks and avoids the compatibility problems noted in https://github.com/hashicorp/nomad/pull/9634/files#diff-2161d829655a3a36ba2d916023e4eec125b9bd22873493c1c2e5e3f7ba92c691R128-R155 .
The intention is for us to promote this `mount` blocks and quietly deprecate the `mounts` type, while still honoring to preserve compatibility as much as we could.
This addresses the issue in https://github.com/hashicorp/nomad/issues/9604 .
When the Docker driver kills as task, we send a request via the Docker API for
dockerd to fire the signal. We send that signal and then block for the
`kill_timeout` waiting for the container to exit. But if the Docker API
blocks, we will block indefinitely because we haven't configured the API call
with the same timeout.
This changeset is a minimal intervention to add the timeout to the Docker API
call _only_ when we have the `kill_timeout` set. Future work should examine
whether we should be threading contexts through other `go-dockerclient` API
calls.
Use targetted ignore comments for the cases where we are bound by
backward compatibility.
I've left some file based linters, especially when the file is riddled
with linter voilations (e.g. enum names), or if it's a property of the
file (e.g. package and file names).
I encountered an odd behavior related to RPC_REQUEST_RESPONSE_UNIQUE and
RPC_REQUEST_STANDARD_NAME. Apparently, if they target a `stream` type,
we must separate them into separate lines so that the ignore comment
targets the type specifically.
Fix#9210 .
This update the executor so it honors the User when using nomad alloc exec. The bug was that the exec task didn't honor the init command when execing.
When raw_exec is configured with [`no_cgroups`](https://www.nomadproject.io/docs/drivers/raw_exec#no_cgroups), raw_exec shouldn't attempt to create a cgroup.
Prior to this change, we accidentally always required freezer cgroup to do stats PID tracking. We already have the proper fallback in place for metrics, so only need to ensure that we don't create a cgroup for the task.
Fixes https://github.com/hashicorp/nomad/issues/8565
The default behavior for `docker.volumes.enabled` is intended to be `false`,
but the HCL schema defaults to `true` if the value is unset. Set the default
literal value to `true`.
Additionally, Docker driver mounts of type "volume" (but not "bind") are not
being properly sandboxed with that setting. Disable Docker mounts with type
"volume" entirely whenever the `docker.volumes.enabled` flag is set to
false. Note this is unrelated to the `volume_mount` feature, which is
constrained to preconfigured host volumes or whatever is mounted by a CSI
plugin.
This changeset includes updates to unit tests that should have been failing
under the documented behavior but were not.
Previously, it was required that you `go get github.com/hashicorp/nomad` to be
able to build protos, as the protoc invocation added an include directive that
pointed to `$GOPATH/src`, which is how dependent protos were discovered. As
Nomad now uses Go modules, it won't necessarily be cloned to `$GOPATH`.
(Additionally, if you _had_ go-gotten Nomad at some point, protoc compilation
would have possibly used the _wrong_ protos, as those wouldn't necessarily be
the most up-to-date ones.)
This change modifies the proto files and the `protoc` invocation to handle
discovering dependent protos via protoc plugin modifier statements that are
specific to the protoc plugin being used.
In this change, `make proto` was run to recompile the protos, which results in
changes only to the gzipped `FileDescriptorProto`.
This fixes a bug where pre-0.9 executors fail to recover after an
upgrade.
The bug is that legacyExecutorWrappers didn't get updated with
ExecStreaming function, and thus failed to implement the Executor
function. Sadly, this meant that all recovery attempts fail, as the
runtime check in
b312aacbc9/drivers/shared/executor/utils.go (L103-L110)
.
Dockerhub is going to rate limit unauthenticated pulls.
Use our HashiCorp internal mirror for builds run through CircleCI.
Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>
In the Docker driver plugin config for garbage collection, the `image_delay`
field was missing from the default we set if the entire `gc` stanza is
missing. This results in a default of 0s and immediate GC of Docker images.
Expanded docker gc config test fields.
This PR adds a version specific upgrade note about the docker stop
signal behavior. Also adds test for the signal logic in docker driver.
Closes#8932 which was fixed in #8933
My latest Vagrant box contains an empty cgroup name that isn't used for
isolation:
```
$ cat /proc/self/cgroup | grep ::
0::/user.slice/user-1000.slice/session-17.scope
```
Symlinking busybox may fail when the test code and the test temporary
directory live on different volumes/partitions; so we should copy
instead. This situation arises in the Vagrant setup, where the code
repository live on special file sharing volume.
Somewhat unrelated, remove `f.Sync()` invocation from a test copyFile
helper function. Sync is useful only for crash recovery, and isn't
necessary in our test setup. The sync invocation is a significant
overhead as it requires the OS to flush any cached writes to disk.
The 'docker.config.infra_image' would default to an amd64 container.
It is possible to reference the correct image for a platform using
the `runtime.GOARCH` variable, eliminating the need to explicitly set
the `infra_image` on non-amd64 platforms.
Also upgrade to Google's pause container version 3.1 from 3.0, which
includes some enhancements around process management.
Fixes#8926
Pulling large docker containers can take longer than the default
context timeout. Without a way to change this it is very hard for
users to utilise Nomad properly without hacky work arounds.
This change adds an optional pull_timeout config parameter which
gives operators the possibility to account for increase pull times
where needed. The infra docker image also has the option to set a
custom timeout to keep consistency.
* docker: support group allocated ports
* docker: add new ports driver config to specify which group ports are mapped
* docker: update port mapping docs
Fixes#2093
Enable configuring `memory_hard_limit` in the docker config stanza for tasks.
If set, this field will be passed to the container runtime as `--memory`, and
the `memory` configuration from the task resource configuration will be passed
as `--memory_reservation`, creating hard and soft memory limits for tasks using
the docker task driver.