open-nomad

Commit Graph

Author	SHA1	Message	Date
Mahmood Ali	2588b3bc98	cleanup driver eventer goroutines This fixes a few cases where driver eventer goroutines are leaked during normal operations, but especially so in tests. This change makes a few modifications: First, it switches drivers to use `Context`s to manage shutdown events. Previously, it relied on callers invoking a `.Shutdown()` function that is specific to internal drivers only and requires casting. Using `Context`s provides a consistent, idiomatic way to manage lifecycle for both internal and external drivers. Also, I discovered a few places where we don't clean up a temporary driver instance in the plugin catalog code, where we dispense a driver to inspect and validate the schema config without properly cleaning it up.	2020-05-26 11:04:04 -04:00
Tim Gross	aa8927abb4	volumes: return better error messages for unsupported task drivers (#8030) When an allocation runs for a task driver that can't support volume mounts, the mounting will fail in a way that can be hard to understand. With host volumes this usually means failing silently, whereas with CSI the operator gets inscrutable internals exposed in the `nomad alloc status`. This changeset adds a MountConfig field to the task driver Capabilities response. We validate this when the `csi_hook` or `volume_hook` fires and return a user-friendly error. Note that we don't currently have a way to get driver capabilities up to the server, except through attributes. Validating this when the user initially submits the jobspec would be even better than what we're doing here (and could be useful for all our other capabilities), but that's out of scope for this changeset. Also note that the MountConfig enum starts with "supports all" in order to support community plugins in a backwards compatible way, rather than cutting them off from volume mounting unexpectedly.	2020-05-21 09:18:02 -04:00
Chris Baker	d6364e13bc	fix typo in comment	2020-03-13 09:09:46 -05:00
Mahmood Ali	0b7085ba3a	driver: allow disabling log collection Operators commonly have docker logs aggregated using various tools and don't need nomad to manage their docker logs. Worse, Nomad uses a somewhat heavy docker api call to collect them and it seems to cause problems when a client runs hundreds of log collections. Here we add a knob to disable log aggregation completely for nomad. When log collection is disabled, we avoid running logmon and docker_logger for the docker tasks in this implementation. The downside here is that once disabled, `nomad logs ...` commands and API no longer return logs and operators must correlate alloc-ids with their aggregated log info. This is meant as a stopgap measure. Ideally, we'd follow up with at least two changes: First, we should optimize behavior when we can such that operators don't need to disable docker log collection, potentially by reverting to using pre-0.9 syslog aggregation in linux environments, though with different trade-offs. Second, when/if logs are disabled, nomad logs endpoints should look up the docker logs api on demand. This ensures that the cost of log collection is paid sparingly.	2019-12-08 14:15:03 -05:00
Danielle Lancashire	4fbcc668d0	volumes: Add support for mount propagation This commit introduces support for configuring mount propagation when mounting volumes with the `volume_mount` stanza on Linux targets. Similar to Kubernetes, we expose 3 options for configuring mount propagation: - private, which is equivalent to `rprivate` on Linux, which does not allow the container to see any new nested mounts after the chroot was created. - host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts that have been created _outside of the container_ to be visible inside the container after the chroot is created. - bidirectional, which is equivalent to `rshared` on Linux, which allows the container to see new mounts created on the host, but importantly also _allows the container to create mounts that are visible in other containers and on the host_. private and host-to-task are safe, but bidirectional mounts can be dangerous: if the code inside a container creates a mount and does not clean it up before tearing down the container, it can cause bad things to happen inside the kernel. To add a layer of safety here, we require that the user has ReadWrite permissions on the volume before allowing bidirectional mounts, as a defense in depth / validation case, although creating mounts should also require a privileged execution environment inside the container.	2019-10-14 14:09:58 +02:00
Tim Gross	d965a15490	driver/networking: don't recreate existing network namespaces	2019-09-25 14:58:17 -04:00
Grégoire Delattre	c6ac788258	Fix the ExecTask function in DriverExecTaskNotSupported (#6145) This fixes the ExecTask definition to match the DriverPlugin interface.	2019-08-29 11:36:29 -04:00
Danielle Lancashire	6ef8d5233e	client: Add volume_hook for mounting volumes	2019-08-12 15:39:08 +02:00
Nick Ethier	2d60ef64d9	plugins/driver: make DriverNetworkManager interface optional	2019-07-31 01:03:19 -04:00
Nick Ethier	ab84630132	plugin/driver: fix tests and add new dep to vendor	2019-07-31 01:03:17 -04:00
Nick Ethier	548f78ef15	ar: initial driver based network management	2019-07-31 01:03:17 -04:00
Nick Ethier	66c514a388	Add network lifecycle management Adds new Prerun and Postrun hooks to manage set up of network namespaces on linux. Work still needs to be done to make the code platform agnostic and support Docker style network initialization.	2019-07-31 01:03:17 -04:00
Mahmood Ali	f47d3d5f8a	add nomad streaming exec core data structures and interfaces In this commit, we add two driver interfaces for supporting `nomad exec` invocation: * A high level `ExecTaskStreamingDriver`, that operates on io reader/writers. Drivers should prefer using this interface * A low level `ExecTaskStreamingRawDriver` that operates on the raw stream of input structs; useful when a driver delegates handling to driver backend (e.g. across RPC/grpc). The interfaces are optional for a driver, as `nomad exec` support is opt-in. Existing drivers continue to compile without exec support, until their maintainers add such support. Furthermore, we create protobuf structures to represent exec stream entities: `ExecTaskStreamingRequest` and `ExecTaskStreamingResponse`. We aim to reuse the protobuf generated code as much as possible, without translation to avoid conversion overhead. `ExecTaskStream` abstracts fetching and sending stream entities. It's influenced by the grpc bi-directional stream interface, to avoid needing any adapter. I considered using channels, but the asynchronicity and concurrency make buffer reuse too complicated, which would put more pressure on GC and slow exec operations.	2019-04-30 14:02:29 -04:00
Nick Ethier	e3c6f89b9a	drivers: use consts for task handle version	2019-01-18 18:31:01 -05:00
Nick Ethier	9fea54e0dc	executor: implement streaming stats API plugins/driver: update driver interface to support streaming stats client/tr: use streaming stats api TODO: * how to handle errors and closed channel during stats streaming * prevent tight loop if Stats(ctx) returns an error drivers: update drivers TaskStats RPC to handle streaming results executor: better error handling in stats rpc docker: better control and error handling of stats rpc driver: allow stats to return a recoverable error	2019-01-12 12:18:22 -05:00
Mahmood Ali	64f80343fc	drivers: re-export ResourceUsage structs Re-export the ResourceUsage structs in drivers package to avoid drivers depending directly on the internal client/structs package. I attempted moving the structs to drivers, but that caused some import cycles that were a bit hard to disentangle. Alternatively, I added an alias here that's sufficient for our purposes of avoiding external drivers depending on internal packages, while allowing us to restructure packages in future without breaking source compatibility.	2019-01-08 09:11:47 -05:00
Mahmood Ali	916a40bb9e	move cstructs.DeviceNetwork to drivers pkg	2019-01-08 09:11:47 -05:00
Mahmood Ali	9369b123de	use drivers.FSIsolation	2019-01-08 09:11:47 -05:00
Danielle Tomlinson	8df20f49f7	drivers: Add internal interface for Shutdown This allows us to correctly terminate internal state during runs of the nomad test suite, e.g. closing eventer contexts correctly.	2019-01-08 13:48:49 +01:00
Preetha Appan	2fb2de3cef	Standardize driver health description messages for all drivers	2019-01-06 22:06:38 -06:00
Alex Dadgar	9d34802f7a	Store device envs separately and pass to drivers	2018-12-19 14:23:09 -08:00
Nick Ethier	ce1a5cba0e	drivermanager: use allocID and task name to route task events	2018-12-18 23:01:51 -05:00
Alex Dadgar	b653ae2af7	utilities	2018-12-18 15:48:52 -08:00
Danielle Tomlinson	d582ea1d8b	drivers: Create drivers/shared/structs This creates a drivers/shared/structs package and moves the buffer size checks into it.	2018-11-30 10:46:13 +01:00
Preetha	1f526db414	Merge pull request #4919 from hashicorp/f-fingerprint-attribute-type Modify fingerprint interface to use typed attribute struct	2018-11-28 14:18:28 -06:00
Preetha Appan	f89dbcd9cc	modify fingerprint interface to use typed attribute struct	2018-11-28 10:01:03 -06:00
Alex Dadgar	4ee603c382	Device hook and devices affect computed node class This PR introduces a device hook that retrieves the device mount information for an allocation. It also updates the computed node class computation to take into account devices. TODO Fix the task runner unit test. The environment variable is being lost even though it is being properly set in the prestart hook.	2018-11-27 17:25:33 -08:00
Chris Baker	a1fb1f3830	Merge pull request #4891 from hashicorp/b-1150-rkt-volume-names drivers/rkt: fix invalid volumes	2018-11-27 18:55:00 -05:00
Preetha Appan	125869686b	Fix nil dereference in copy method	2018-11-26 15:53:15 -06:00
Chris Baker	9bd4317139	modified TaskConfig to include AllocID use this for volume names in drivers/rkt to address #1150	2018-11-26 18:54:26 +00:00
Nick Ethier	1f3fe02e62	docker: sync access to exit result within a handle	2018-11-20 20:41:32 -05:00
Nick Ethier	4be8a86ef9	plugins/driver: remove NodeResources from task Resources and use PercentTicks field for docker driver	2018-11-19 22:59:17 -05:00
Nick Ethier	ced5d5c445	docker: move recoverable error proto to shared structs	2018-11-19 22:59:16 -05:00
Nick Ethier	69049d37f5	drivers: added NodeResources to drivers.TaskConfig	2018-11-19 22:59:16 -05:00
Nick Ethier	8f8698b3e1	docker: started work on porting docker driver to new plugin framework	2018-11-19 22:59:15 -05:00
Alex Dadgar	693f244cce	Plugin clients handle plugin dying This PR plumbs the plugin's done ctx through the base and driver plugin clients (device already had it). Further, it adds generic handling of gRPC stream errors.	2018-11-12 17:09:27 -08:00
Michael Schurter	2bbd88888c	client: first pass at implementing task restoring Task restoring works but dead tasks may be restarted	2018-11-05 12:32:05 -08:00
Preetha Appan	4f4777d6a6	Review comments	2018-10-16 16:56:56 -07:00
Preetha Appan	678072ecd1	RKT driver plugin and unit tests	2018-10-16 16:56:56 -07:00
Nick Ethier	ed3cdaf3d1	plugin/driver: add Copy funcs	2018-10-16 16:56:56 -07:00
Nick Ethier	4a4c7dbbfc	client: begin driver plugin integration client: fingerprint driver plugins	2018-10-16 16:56:56 -07:00
Alex Dadgar	7946a14aa8	Fix lints	2018-10-16 16:56:56 -07:00
Nick Ethier	352c05cdf4	plugin/drivers: plumb in stdout/stderr paths	2018-10-16 16:53:31 -07:00
Nick Ethier	0e3f85222a	driver/raw_exec: port existing raw_exec tests and add some testing utilities	2018-10-16 16:53:31 -07:00
Nick Ethier	bcc5c4a8bd	clientv2: base driver plugin (#4671) Driver plugin framework to facilitate development of driver plugins. Implementing plugins only need to implement the DriverPlugin interface. The framework proxies this interface to the go-plugin GRPC interface generated from the driver.proto spec. A testing harness is provided to allow implementing drivers to test the full lifecycle of the driver plugin. An example use: func TestMyDriver(t *testing.T) { harness := NewDriverHarness(t, &MyDiverPlugin{}) /* The harness implements the DriverPlugin interface and can be used as such */ taskHandle, err := harness.StartTask(...) }	2018-10-16 16:53:31 -07:00
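
The mount propagation commit (4fbcc668d0) describes a three-way mapping from `volume_mount` propagation modes to Linux propagation flags, plus a ReadWrite check for bidirectional mounts. A minimal sketch of that mapping follows. The function name `linuxPropagation` and the `readWrite` parameter are hypothetical, chosen only for illustration, and do not reflect Nomad's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// linuxPropagation maps a propagation mode named in the commit message to
// the Linux flag it is described as equivalent to. The name and signature
// are assumptions for this sketch, not Nomad's API.
func linuxPropagation(mode string, readWrite bool) (string, error) {
	switch mode {
	case "private":
		return "rprivate", nil
	case "host-to-task":
		return "rslave", nil
	case "bidirectional":
		// The commit requires ReadWrite permission on the volume
		// before allowing bidirectional mounts.
		if !readWrite {
			return "", errors.New("bidirectional propagation requires ReadWrite permission on the volume")
		}
		return "rshared", nil
	default:
		return "", fmt.Errorf("unknown propagation mode %q", mode)
	}
}

func main() {
	// Exercise each mode without ReadWrite permission.
	for _, mode := range []string{"private", "host-to-task", "bidirectional"} {
		flag, err := linuxPropagation(mode, false)
		if err != nil {
			fmt.Printf("%s: error: %v\n", mode, err)
			continue
		}
		fmt.Printf("%s -> %s\n", mode, flag)
	}
}
```

Returning an error for the bidirectional case, rather than silently downgrading to another mode, mirrors the "defense in depth / validation" intent stated in the commit message.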

45 Commits