Commit graph

155 commits

Author SHA1 Message Date
Mahmood Ali 567597e108 Compare to the correct host network setting
In systemd-resolved hosts with no DNS customizations, the docker driver
DNS setting should be compared to /run/systemd/resolve/resolv.conf while
exec/java drivers should be compared to /etc/resolv.conf.

When system-resolved is enabled, /etc/resolv.conf is a stub that points
to 127.0.0.53. Docker avoids this stub because this address isn't
accessible from the container.  The exec/java drivers that don't create
network isolations use the stub though in the default configuration.
2020-10-01 10:23:14 -04:00
Nick Ethier e39574be59
docker: support group allocated ports and host_networks (#8623)
* docker: support group allocated ports

* docker: add new ports driver config to specify which group ports are mapped

* docker: update port mapping docs
2020-08-11 18:30:22 -04:00
Nick Ethier 0bc0403cc3 Task DNS Options (#7661)
Co-Authored-By: Tim Gross <tgross@hashicorp.com>
Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>
2020-06-18 11:01:31 -07:00
Mahmood Ali 2588b3bc98 cleanup driver eventor goroutines
This fixes few cases where driver eventor goroutines are leaked during
normal operations, but especially so in tests.

This change makes few modifications:

First, it switches drivers to use `Context`s to manage shutdown events.
Previously, it relied on callers invoking `.Shutdown()` function that is
specific to internal drivers only and require casting.  Using `Contexts`
provide a consistent idiomatic way to manage lifecycle for both internal
and external drivers.

Also, I discovered few places where we don't clean up a temporary driver
instance in the plugin catalog code, where we dispense a driver to
inspect and validate the schema config without properly cleaning it up.
2020-05-26 11:04:04 -04:00
Tim Gross aa8927abb4
volumes: return better error messages for unsupported task drivers (#8030)
When an allocation runs for a task driver that can't support volume mounts,
the mounting will fail in a way that can be hard to understand. With host
volumes this usually means failing silently, whereas with CSI the operator
gets inscrutable internals exposed in the `nomad alloc status`.

This changeset adds a MountConfig field to the task driver Capabilities
response. We validate this when the `csi_hook` or `volume_hook` fires and
return a user-friendly error.

Note that we don't currently have a way to get driver capabilities up to the
server, except through attributes. Validating this when the user initially
submits the jobspec would be even better than what we're doing here (and could
be useful for all our other capabilities), but that's out of scope for this
changeset.

Also note that the MountConfig enum starts with "supports all" in order to
support community plugins in a backwards compatible way, rather than cutting
them off from volume mounting unexpectedly.
2020-05-21 09:18:02 -04:00
Anthony Scalisi 9664c6b270
fix spelling errors (#6985) 2020-04-20 09:28:19 -04:00
Yoan Blanc 225c9c1215 fixup! vendor: explicit use of hashicorp/go-msgpack
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-31 09:48:07 -04:00
Yoan Blanc 761d014071 vendor: explicit use of hashicorp/go-msgpack
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-03-31 09:45:21 -04:00
Chris Baker d6364e13bc
fix typo in comment 2020-03-13 09:09:46 -05:00
Mahmood Ali 88cfe504a0 update grpc
Upgrade grpc to v1.27.1 and protobuf plugins to v1.3.4.
2020-03-03 08:39:54 -05:00
Mahmood Ali 0b7085ba3a driver: allow disabling log collection
Operators commonly have docker logs aggregated using various tools and
don't need nomad to manage their docker logs.  Worse, Nomad uses a
somewhat heavy docker api call to collect them and it seems to cause
problems when a client runs hundreds of log collections.

Here we add a knob to disable log aggregation completely for nomad.
When log collection is disabled, we avoid running logmon and
docker_logger for the docker tasks in this implementation.

The downside here is once disabled, `nomad logs ...` commands and API
no longer return logs and operators must corrolate alloc-ids with their
aggregated log info.

This is meant as a stop gap measure.  Ideally, we'd follow up with at
least two changes:

First, we should optimize behavior when we can such that operators don't
need to disable docker log collection.  Potentially by reverting to
using pre-0.9 syslog aggregation in linux environments, though with
different trade-offs.

Second, when/if logs are disabled, nomad logs endpoints should lookup
docker logs api on demand.  This ensures that the cost of log collection
is paid sparingly.
2019-12-08 14:15:03 -05:00
Danielle Lancashire 4fbcc668d0
volumes: Add support for mount propagation
This commit introduces support for configuring mount propagation when
mounting volumes with the `volume_mount` stanza on Linux targets.

Similar to Kubernetes, we expose 3 options for configuring mount
propagation:

- private, which is equivalent to `rprivate` on Linux, which does not allow the
           container to see any new nested mounts after the chroot was created.

- host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts
                that have been created _outside of the container_ to be visible
                inside the container after the chroot is created.

- bidirectional, which is equivalent to `rshared` on Linux, which allows both
                 the container to see new mounts created on the host, but
                 importantly _allows the container to create mounts that are
                 visible in other containers an don the host_

private and host-to-task are safe, but bidirectional mounts can be
dangerous, as if the code inside a container creates a mount, and does
not clean it up before tearing down the container, it can cause bad
things to happen inside the kernel.

To add a layer of safety here, we require that the user has ReadWrite
permissions on the volume before allowing bidirectional mounts, as a
defense in depth / validation case, although creating mounts should also require
a priviliged execution environment inside the container.
2019-10-14 14:09:58 +02:00
Tim Gross d965a15490 driver/networking: don't recreate existing network namespaces 2019-09-25 14:58:17 -04:00
Grégoire Delattre c6ac788258 Fix the ExecTask function in DriverExecTaskNotSupported (#6145)
This fixes the ExecTask definition to match with the DriverPlugin
interface.
2019-08-29 11:36:29 -04:00
Lucas BEE 35a1b72962 Add NetworkIsolation in TaskConfig (#6135)
NetworkIsolation was left out of the task config when using an
external task driver plugin
2019-08-15 13:05:55 -04:00
Lucas BEE 406642f34a Fix missing plugin driver capabilities (#6128)
NetIsolationModes and MustInitiateNetwork were left out of the
driver Capabilities when using an external task driver plugin

Signed-off-by: Lucas BEE <pouulet@gmail.com>
2019-08-14 09:10:10 -04:00
Danielle Lancashire 6ef8d5233e
client: Add volume_hook for mounting volumes 2019-08-12 15:39:08 +02:00
Nick Ethier 971c8c9c2b
Driver networking support
Adds support for passing network isolation config into drivers and
implements support in the rawexec driver as a proof of concept
2019-07-31 01:03:20 -04:00
Nick Ethier 63c5504d56
ar: fix lint errors 2019-07-31 01:03:19 -04:00
Nick Ethier 2d60ef64d9
plugins/driver: make DriverNetworkManager interface optional 2019-07-31 01:03:19 -04:00
Nick Ethier ab84630132
plugin/driver: fix tests and add new dep to vendor 2019-07-31 01:03:17 -04:00
Nick Ethier 548f78ef15
ar: initial driver based network management 2019-07-31 01:03:17 -04:00
Nick Ethier 66c514a388
Add network lifecycle management
Adds a new Prerun and Postrun hooks to manage set up of network namespaces
on linux. Work still needs to be done to make the code platform agnostic and
support Docker style network initalization.
2019-07-31 01:03:17 -04:00
Mahmood Ali 72d81da4e0 Signal plugin shutdown for driver.TaskStats
The driver plugin stub client must call `grpcutils.HandleGrpcErr` to handle plugin
shutdown similar to other functions.  This ensures that TaskStats returns
`ErrPluginShutdown` when plugin shutdown.
2019-07-11 13:57:35 +08:00
Chris Baker 9442c26cff docker: DestroyTask was not cleaning up Docker images because it was erroring early due to an attempt to inspect an image that had already been removed 2019-06-03 19:04:27 +00:00
Mahmood Ali 57c18fec4e Add basic drivers conformance tests
Add consolidated testing package to serve as conformance tests for all drivers.
2019-05-09 16:49:08 -04:00
Mahmood Ali 13de875750 implemment streaming exec handling in driver grpc handlers
Also add a helper that converts the adapts the high level interface to the
low-level interface of nomad exec interfaces.
2019-05-09 16:49:08 -04:00
Mahmood Ali f47d3d5f8a add nomad streaming exec core data structures and interfaces
In this commit, we add two driver interfaces for supporting `nomad exec`
invocation:

* A high level `ExecTaskStreamingDriver`, that operates on io reader/writers.
  Drivers should prefer using this interface
* A low level `ExecTaskStreamingRawDriver` that operates on the raw stream of
  input structs; useful when a driver delegates handling to driver backend (e.g.
  across RPC/grpc).

The interfaces are optional for a driver, as `nomad exec` support is opt-in.
Existing drivers continue to compile without exec support, until their
maintainer add such support.

Furthermore, we create protobuf structures to represent exec stream entities:
`ExecTaskStreamingRequest` and `ExecTaskStreamingResponse`.  We aim to reuse the
protobuf generated code as much as possible, without translation to avoid
conversion overhead.

`ExecTaskStream` abstract fetching and sending stream entities.  It's influenced
by the grpc bi-directional stream interface, to avoid needing any adapter.  I
considered using channels, but the asynchronisity and concurrency makes buffer
reuse too complicated, which would put more pressure on GC and slows exec operation.
2019-04-30 14:02:29 -04:00
Mahmood Ali 0c8ee8c404 Simplify proto conversion and handle swap
Convert all cpu and memory usage fields regardless of stated measured
fields, and handle swap fields
2019-03-30 15:18:28 -04:00
Mahmood Ali f4a68f556f deserialize total ticks 2019-03-30 07:14:57 -04:00
Mahmood Ali 9656d79eba Always report TotalTicks when percent is measured
Fix a case where TotalTicks doesn't get serialized across executor grpc
calls.

Here, I opted to implicit add field, rather than explicitly mark it as a
measured field, because it's a derived field and to preserve 0.8
behavior where total ticks aren't explicitly marked as a measured field.
2019-03-29 22:34:28 -04:00
Mahmood Ali b08a2744f8
Merge pull request #5428 from hashicorp/b-dropped-logs-on-task-restart
client/logmon: restart log collection correctly when a task is restarted
2019-03-21 14:02:08 -04:00
Nick Ethier 83936bea3c
logmon: fix logmon handling in driver test harness 2019-03-20 21:14:08 -04:00
Mahmood Ali fb55717b0c
Regenerate Proto files (#5421)
Noticed that the protobuf files are out of sync with ones generated by 1.2.0 protoc go plugin.

The cause for these files seem to be related to release processes, e.g. [0.9.0-beta1 preperation](ecec3d38de (diff-da4da188ee496377d456025c2eab4e87)), and [0.9.0-beta3 preperation](b849d84f2f).

This restores the changes to that of the pinned protoc version and fails build if protobuf files are out of sync.  Sample failing Travis job is that of the first commit change: https://travis-ci.org/hashicorp/nomad/jobs/506285085
2019-03-14 10:56:27 -04:00
Michael Schurter d74755900e Generate files for 0.9.0-beta3 release 2019-02-26 09:44:49 -08:00
Michael Schurter 38821954b7 plugins: squelch context Canceled error logs
As far as I can tell this is the most straightforward and resilient way
to skip error logging on context cancellation with grpc streams. You
cannot compare the error against context.Canceled directly as it is of
type `*status.statusError`. The next best solution I found was:

```go
resp, err := stream.Recv()
if code, ok := err.(interface{ Code() code.Code }); ok {
	if code.Code == code.Canceled {
		return
	}
}
```

However I think checking ctx.Err() directly makes the code much easier
to read and is resilient against grpc API changes.
2019-02-21 15:32:18 -08:00
Alex Dadgar bc804dda2e Nomad 0.9.0-beta1 generated code 2019-01-30 10:49:44 -08:00
Nick Ethier bb9a8afe9b
executor: fix bug and add tests for incorrect stats timestamp reporting 2019-01-28 21:57:45 -05:00
Mahmood Ali 389e043129 drivers: pass logger through driver plugin client
This fixes a panic whenever driver plugin attempts to log a message.
2019-01-25 09:38:41 -05:00
Nick Ethier c41f96943d
plugins/drivers: change stats interval to duration type in proto 2019-01-24 22:19:18 -05:00
Michael Schurter 32daa7b47b goimports until make check is happy 2019-01-23 06:27:14 -08:00
Michael Schurter be0bab7c3f move pluginutils -> helper/pluginutils
I wanted a different color bikeshed, so I get to paint it
2019-01-22 15:50:08 -08:00
Alex Dadgar 4bdccab550 goimports 2019-01-22 15:44:31 -08:00
Alex Dadgar b7a65676fe gofmt 2019-01-22 15:43:34 -08:00
Alex Dadgar 6c2782f037 move catalog + grpcutils 2019-01-22 15:11:57 -08:00
Nick Ethier 7da9f5eac7
drivers: regen proto 2019-01-18 18:53:45 -05:00
Nick Ethier e3c6f89b9a
drivers: use consts for task handle version 2019-01-18 18:31:01 -05:00
Nick Ethier 04772aac7c
drivers: fix test 2019-01-18 18:31:01 -05:00
Nick Ethier e5a6fc9271
executor: add pre 0.9 client and wrapper 2019-01-18 18:30:58 -05:00
Mahmood Ali 9909d98bee Track Basic Memory Usage as reported by cgroups
Track current memory usage, `memory.usage_in_bytes`, in addition to
`memory.max_memory_usage_in_bytes` and friends.  This number is closer
what Docker reports.

Related to https://github.com/hashicorp/nomad/issues/5165 .
2019-01-14 18:47:52 -05:00