Operators commonly have docker logs aggregated using various tools and
don't need nomad to manage their docker logs. Worse, Nomad uses a
somewhat heavy docker api call to collect them and it seems to cause
problems when a client runs hundreds of log collections.
Here we add a knob to disable log aggregation completely for nomad.
When log collection is disabled, we avoid running logmon and
docker_logger for the docker tasks in this implementation.
The downside here is once disabled, `nomad logs ...` commands and API
no longer return logs and operators must corrolate alloc-ids with their
aggregated log info.
This is meant as a stop gap measure. Ideally, we'd follow up with at
least two changes:
First, we should optimize behavior when we can such that operators don't
need to disable docker log collection. Potentially by reverting to
using pre-0.9 syslog aggregation in linux environments, though with
different trade-offs.
Second, when/if logs are disabled, nomad logs endpoints should lookup
docker logs api on demand. This ensures that the cost of log collection
is paid sparingly.
This commit introduces support for configuring mount propagation when
mounting volumes with the `volume_mount` stanza on Linux targets.
Similar to Kubernetes, we expose 3 options for configuring mount
propagation:
- private, which is equivalent to `rprivate` on Linux, which does not allow the
container to see any new nested mounts after the chroot was created.
- host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts
that have been created _outside of the container_ to be visible
inside the container after the chroot is created.
- bidirectional, which is equivalent to `rshared` on Linux, which allows both
the container to see new mounts created on the host, but
importantly _allows the container to create mounts that are
visible in other containers an don the host_
private and host-to-task are safe, but bidirectional mounts can be
dangerous, as if the code inside a container creates a mount, and does
not clean it up before tearing down the container, it can cause bad
things to happen inside the kernel.
To add a layer of safety here, we require that the user has ReadWrite
permissions on the volume before allowing bidirectional mounts, as a
defense in depth / validation case, although creating mounts should also require
a priviliged execution environment inside the container.
NetIsolationModes and MustInitiateNetwork were left out of the
driver Capabilities when using an external task driver plugin
Signed-off-by: Lucas BEE <pouulet@gmail.com>
Adds a new Prerun and Postrun hooks to manage set up of network namespaces
on linux. Work still needs to be done to make the code platform agnostic and
support Docker style network initalization.
The driver plugin stub client must call `grpcutils.HandleGrpcErr` to handle plugin
shutdown similar to other functions. This ensures that TaskStats returns
`ErrPluginShutdown` when plugin shutdown.
In this commit, we add two driver interfaces for supporting `nomad exec`
invocation:
* A high level `ExecTaskStreamingDriver`, that operates on io reader/writers.
Drivers should prefer using this interface
* A low level `ExecTaskStreamingRawDriver` that operates on the raw stream of
input structs; useful when a driver delegates handling to driver backend (e.g.
across RPC/grpc).
The interfaces are optional for a driver, as `nomad exec` support is opt-in.
Existing drivers continue to compile without exec support, until their
maintainer add such support.
Furthermore, we create protobuf structures to represent exec stream entities:
`ExecTaskStreamingRequest` and `ExecTaskStreamingResponse`. We aim to reuse the
protobuf generated code as much as possible, without translation to avoid
conversion overhead.
`ExecTaskStream` abstract fetching and sending stream entities. It's influenced
by the grpc bi-directional stream interface, to avoid needing any adapter. I
considered using channels, but the asynchronisity and concurrency makes buffer
reuse too complicated, which would put more pressure on GC and slows exec operation.
Fix a case where TotalTicks doesn't get serialized across executor grpc
calls.
Here, I opted to implicit add field, rather than explicitly mark it as a
measured field, because it's a derived field and to preserve 0.8
behavior where total ticks aren't explicitly marked as a measured field.
Noticed that the protobuf files are out of sync with ones generated by 1.2.0 protoc go plugin.
The cause for these files seem to be related to release processes, e.g. [0.9.0-beta1 preperation](ecec3d38de (diff-da4da188ee496377d456025c2eab4e87)), and [0.9.0-beta3 preperation](b849d84f2f).
This restores the changes to that of the pinned protoc version and fails build if protobuf files are out of sync. Sample failing Travis job is that of the first commit change: https://travis-ci.org/hashicorp/nomad/jobs/506285085
As far as I can tell this is the most straightforward and resilient way
to skip error logging on context cancellation with grpc streams. You
cannot compare the error against context.Canceled directly as it is of
type `*status.statusError`. The next best solution I found was:
```go
resp, err := stream.Recv()
if code, ok := err.(interface{ Code() code.Code }); ok {
if code.Code == code.Canceled {
return
}
}
```
However I think checking ctx.Err() directly makes the code much easier
to read and is resilient against grpc API changes.
Track current memory usage, `memory.usage_in_bytes`, in addition to
`memory.max_memory_usage_in_bytes` and friends. This number is closer
what Docker reports.
Related to https://github.com/hashicorp/nomad/issues/5165 .
plugins/driver: update driver interface to support streaming stats
client/tr: use streaming stats api
TODO:
* how to handle errors and closed channel during stats streaming
* prevent tight loop if Stats(ctx) returns an error
drivers: update drivers TaskStats RPC to handle streaming results
executor: better error handling in stats rpc
docker: better control and error handling of stats rpc
driver: allow stats to return a recoverable error