This fixes a bug where docker images may not be GCed. The cause of the
bug is that we track the task using `task.ID+task.Name` on task start
but remove on plain `task.ID`.
This haromize the two paths by using `task.ID`, as it's unique enough
and it's also used in the `loadImage` path (path when loading an image
from a local tarball instead of dockerhub).
Makes it possible to run Linux Containers On Windows with Nomad alongside Windows Containers. Fingerprint prevents only to run Nomad in Windows 10 with Linux Containers
Protect against a panic when we attempt to start a container with a name
that conflicts with an existing one. If the existing one is being
deleted while nomad first attempts to create the container, the
createContainer will fail with `container already exists`, but we get
nil container reference from the `containerByName` lookup, and cause a
crash.
I'm not certain how we get into the state, except for being very
unlucky. I suspect that this case may be the result of a concurrent
restart or the docker engine API not being fully consistent (e.g. an
earlier call purged the container, but docker didn't free up resources
yet to create a new container with the same name immediately yet).
If that's the case, then re-attempting creation will hopefully succeed,
or we'd at least fail enough times for the alloc to be rescheduled to
another node.
Looks like the latest `github.com/docker/docker/registry.ResolveAuthConfig` expect
`github.com/docker/docker/api/types.AuthConfig` rather than
`github.com/docker/cli/cli/config/types.AuthConfig`. The two types are
identical but live in different packages.
Here, we embed `registry.ResolveAuthConfig` from upstream repo, but with
the signature we need.
* Making pull activity timeout configurable in Docker plugin config, first pass
* Fixing broken function call
* Fixing broken tests
* Fixing linter suggestion
* Adding documentation on new parameter in Docker plugin config
* Adding unit test
* Setting min value for pull_activity_timeout, making pull activity duration a private var
Copy the updated version of freeport (sdk/freeport), and tweak it for use
in Nomad tests. This means staying below port 10000 to avoid conflicts with
the lib/freeport that is still transitively used by the old version of
consul that we vendor. Also provide implementations to find ephemeral ports
of macOS and Windows environments.
Ports acquired through freeport are supposed to be returned to freeport,
which this change now also introduces. Many tests are modified to include
calls to a cleanup function for Server objects.
This should help quite a bit with some flakey tests, but not all of them.
Our port problems will not go away completely until we upgrade our vendor
version of consul. With Go modules, we'll probably do a 'replace' to swap
out other copies of freeport with the one now in 'nomad/helper/freeport'.
Operators commonly have docker logs aggregated using various tools and
don't need nomad to manage their docker logs. Worse, Nomad uses a
somewhat heavy docker api call to collect them and it seems to cause
problems when a client runs hundreds of log collections.
Here we add a knob to disable log aggregation completely for nomad.
When log collection is disabled, we avoid running logmon and
docker_logger for the docker tasks in this implementation.
The downside here is once disabled, `nomad logs ...` commands and API
no longer return logs and operators must corrolate alloc-ids with their
aggregated log info.
This is meant as a stop gap measure. Ideally, we'd follow up with at
least two changes:
First, we should optimize behavior when we can such that operators don't
need to disable docker log collection. Potentially by reverting to
using pre-0.9 syslog aggregation in linux environments, though with
different trade-offs.
Second, when/if logs are disabled, nomad logs endpoints should lookup
docker logs api on demand. This ensures that the cost of log collection
is paid sparingly.
When a job has a task group network, this log line ends up being
misleading if you're trying to debug networking issues. We really only
care about this when there's no port map set, in which case we get the
error returned anyways.
driver.SetConfig is not appropriate for starting up reconciler
goroutine. Some ephemeral driver instances are created for validating
config and we ought not to side-effecting goroutines for those.
We currently lack a lifecycle hook to inject these, so I picked the
`Fingerprinter` function for now, and reconciler should only run after
fingerprinter started.
Use `sync.Once` to ensure that we only start reconciler loop once.
When running at scale, it's possible that Docker Engine starts
containers successfully but gets wedged in a way where API call fails.
The Docker Engine may remain unavailable for arbitrary long time.
Here, we introduce a periodic reconcilation process that ensures that any
container started by nomad is tracked, and killed if is running
unexpectedly.
Basically, the periodic job inspects any container that isn't tracked in
its handlers. A creation grace period is used to prevent killing newly
created containers that aren't registered yet.
Also, we aim to avoid killing unrelated containters started by host or
through raw_exec drivers. The logic is to pattern against containers
environment variables and mounts to infer if they are an alloc docker
container.
Lastly, the periodic job can be disabled to avoid any interference if
need be.
This commit introduces support for configuring mount propagation when
mounting volumes with the `volume_mount` stanza on Linux targets.
Similar to Kubernetes, we expose 3 options for configuring mount
propagation:
- private, which is equivalent to `rprivate` on Linux, which does not allow the
container to see any new nested mounts after the chroot was created.
- host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts
that have been created _outside of the container_ to be visible
inside the container after the chroot is created.
- bidirectional, which is equivalent to `rshared` on Linux, which allows both
the container to see new mounts created on the host, but
importantly _allows the container to create mounts that are
visible in other containers an don the host_
private and host-to-task are safe, but bidirectional mounts can be
dangerous, as if the code inside a container creates a mount, and does
not clean it up before tearing down the container, it can cause bad
things to happen inside the kernel.
To add a layer of safety here, we require that the user has ReadWrite
permissions on the volume before allowing bidirectional mounts, as a
defense in depth / validation case, although creating mounts should also require
a priviliged execution environment inside the container.
The docker creation API calls may fail with http errors (e.g. timeout)
even if container was successfully created.
Here, we force remove container if we got unexpected failure. We
already do this in some error handlers, and this commit updates all
paths.
I stopped short from a more aggressive refactoring, as the code is ripe
for refactoring and would rather do that in another PR.
This handles a bug where we may start a container successfully, yet we
fail due to retries and startContainer not being idempotent call.
Here, we ensure that when starting a container fails with 500 error,
the retry succeeds if container was started successfully.
hclspec.NewLiteral does not quote its values, which caused `3m` to be
parsed as a nonsensical literal which broke the plugin loader during
initialization. By quoting the value here, it starts correctly.
Using the API as provided from the `mounts` package imposes validation
on the `src:dest` which shouldn't be performed at this time. To workaround
that lift the internal code from that library required to only perform
the split.
Currently, nomad "plugin" processes (e.g. executor, logmon, docker_logger) are started as CLI
commands to be handled by command CLI framework. Plugin launchers use
`discover.NomadBinary()` to identify the binary and start it.
This has few downsides: The trivial one is that when running tests, one
must re-compile the nomad binary as the tests need to invoke the nomad
executable to start plugin. This is frequently overlooked, resulting in
puzzlement.
The more significant issue with `executor` in particular is in relation
to external driver:
* Plugin must identify the path of invoking nomad binary, which is not
trivial; `discvoer.NomadBinary()` now returns the path to the plugin
rather than to nomad, preventing external drivers from launching
executors.
* The external driver may get a different version of executor than it
expects (specially if we make a binary incompatible change in future).
This commit addresses both downside by having the plugin invocation
handling through an `init()` call, similar to how libcontainer init
handler is done in [1] and recommened by libcontainer [2]. `init()`
will be invoked and handled properly in tests and external drivers.
For external drivers, this change will cause external drivers to launch
the executor that's compiled against.
There a are a couple of downsides to this approach:
* These specific packages (i.e executor, logmon, and dockerlog) need to
be careful in use of `init()`, package initializers. Must avoid having
command execution rely on any other init in the package. I prefixed
files with `z_` (golang processes files in lexical order), but ensured
we don't depend on order.
* The command handling is spread in multiple packages making it a bit
less obvious how plugin starts are handled.
[1] drivers/shared/executor/libcontainer_nsenter_linux.go
[2] eb4aeed24f/libcontainer (using-libcontainer)
Support Docker `volumes` field in Windows. Previously, volumes parser
assumed some Unix-ism (e.g. didn't expect `:` in mount paths).
Here, we use the Docker parser to identify host and container paths.
Docker parsers use different validation logic from our previous unix
implementation: Docker parser accepts single path as a volume entry
(parsing it as a container path with auto-created volume) and enforces
additional checks (e.g. validity of mode). Thereforce, I opted to use
Docker parser only for Windows, and keep Nomad's linux parser to
preserve current behavior.
In Nomad 0.9, we made volume driver handling the same for `""`, and
`"local"` volumes. Prior to Nomad 0.9 however these had slightly different
behaviour for relative paths and named volumes.
Prior to 0.9 the empty string would expand relative paths within the task
dir, and `"local"` volumes that are not absolute paths would be treated
as docker named volumes.
This commit reverts to the previous behaviour as follows:
| Nomad Version | Driver | Volume Spec | Behaviour |
|-------------------------------------------------------------------------
| all | "" | testing:/testing | allocdir/testing |
| 0.8.7 | "local" | testing:/testing | "testing" as named volume |
| 0.9.0 | "local" | testing:/testing | allocdir/testing |
| 0.9.1 | "local" | testing:/testing | "testing" as named volume |
nvidia library use of dynamic library seems to conflict with alpine and
musl based OSes. This adds a `nonvidia` tag to allow compiling nomad
for alpine images.
The nomad releases currently only support glibc based OS environments,
so we default to compiling with nvidia.
Fix AppVeyor failing builds, by moving docker image url test to run on unix
systems only. The used paused image is a linux image only, not
available on Windows.
destCh was being written to by one goroutine and closed by another
goroutine. This panic occurred in Travis:
```
=== FAIL: drivers/docker TestDockerCoordinator_ConcurrentPulls (117.66s)
=== PAUSE TestDockerCoordinator_ConcurrentPulls
=== CONT TestDockerCoordinator_ConcurrentPulls
panic: send on closed channel
goroutine 5358 [running]:
github.com/hashicorp/nomad/drivers/docker.dockerStatsCollector(0xc0003a4a20, 0xc0003a49c0, 0x3b9aca00)
/home/travis/gopath/src/github.com/hashicorp/nomad/drivers/docker/stats.go:108 +0x167
created by
github.com/hashicorp/nomad/drivers/docker.TestDriver_DockerStatsCollector
/home/travis/gopath/src/github.com/hashicorp/nomad/drivers/docker/stats_test.go:33 +0x1ab
```
The 2 ways to fix this kind of error are to either (1) add extra
coordination around multiple goroutines writing to a chan or (2) make it
so only one goroutines writes to a chan.
I implemented (2) first as it's simpler, but @notnoop pointed out since
the same destCh in reused in the stats loop there's now a double close
panic possible!
So this implements (1) by adding a *usageSender struct for handling
concurrent senders and closing.
Fix#5418
When using a docker logger that doesn't support log streaming through
API, currently docker logger runs a tight loop of Docker API calls
unexpectedly. This change ensures we stop fetching logs early.
Also, this adds some basic backoff strategy when Docker API logging
fails unexpectedly, to avoid accidentally DoSing the docker daemon.
Noticed that the protobuf files are out of sync with ones generated by 1.2.0 protoc go plugin.
The cause for these files seem to be related to release processes, e.g. [0.9.0-beta1 preperation](ecec3d38de (diff-da4da188ee496377d456025c2eab4e87)), and [0.9.0-beta3 preperation](b849d84f2f).
This restores the changes to that of the pinned protoc version and fails build if protobuf files are out of sync. Sample failing Travis job is that of the first commit change: https://travis-ci.org/hashicorp/nomad/jobs/506285085
This commit causes the docker driver to return undetected before it
first establishes a connection to the docker daemon.
This fixes a bug where hosts without docker installed would return as
unhealthy, rather than undetected.
Currently if a docker_logger cannot be reattached to, we will leak the
container that was being used. This is problematic if e.g using static
ports as it means you can never recover your task, or if a service is
expensive to run and will then be running without supervision.
Sometimes the nomad docker_logger may be killed by a service manager
when restarting the client for upgrades or reliability reasons.
Currently if this happens, we leak the users container and try to
reschedule over it.
This commit adds a new step to the recovery process that will spawn a
new docker logger process that will fetch logs from _the current
timestamp_. This is to avoid restarting users tasks because our logging
sidecar has failed.
This commit adds some extra resiliency to the docker logger in the case
of API failure from the docker daemon, by restarting the stream from the
current point in time if the stream returns and the container is still
running.
This ensures that `port_map` along with other block like attribute
declarations (e.g. ulimit, labels, etc) can handle various hcl and json
syntax that was supported in 0.8.
In 0.8.7, the following declarations are effectively equivalent:
```
// hcl block
port_map {
http = 80
https = 443
}
// hcl assignment
port_map = {
http = 80
https = 443
}
// json single element array of map (default in API response)
{"port_map": [{"http": 80, "https": 443}]}
// json array of individual maps (supported accidentally iiuc)
{"port_map: [{"http": 80}, {"https": 443}]}
```
We achieve compatbility by using `NewAttr("...", "list(map(string))",
false)` to be serialized to a `map[string]string` wrapper, instead of using
`BlockAttrs` declaration. The wrapper merges the list of maps
automatically, to ease driver development.
This approach is closer to how v0.8.7 implemented the fields [1][2], and
despite its verbosity, seems to perserve 0.8.7 behavior in hcl2.
This is only required for built-in types that have backward
compatibility constraints. External drivers should use `BlockAttrs`
instead, as they see fit.
[1] https://github.com/hashicorp/nomad/blob/v0.8.7/client/driver/docker.go#L216
[2] https://github.com/hashicorp/nomad/blob/v0.8.7/client/driver/docker.go#L698-L700
Windows Docker daemon does not support SIGINT, SIGTERM is the semantic
equivalent that allows for graceful shutdown before being followed up by
a SIGKILL.