open-nomad

Commit Graph

Author	SHA1	Message	Date
Yoan Blanc	891accb89a	use allow/deny instead of the colored alternatives (#9019) Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-10-12 08:47:05 -04:00
Mahmood Ali	567597e108	Compare to the correct host network setting In systemd-resolved hosts with no DNS customizations, the docker driver DNS setting should be compared to /run/systemd/resolve/resolv.conf while exec/java drivers should be compared to /etc/resolv.conf. When system-resolved is enabled, /etc/resolv.conf is a stub that points to 127.0.0.53. Docker avoids this stub because this address isn't accessible from the container. The exec/java drivers that don't create network isolations use the stub though in the default configuration.	2020-10-01 10:23:14 -04:00
Nick Ethier	e39574be59	docker: support group allocated ports and host_networks (#8623) * docker: support group allocated ports * docker: add new ports driver config to specify which group ports are mapped * docker: update port mapping docs	2020-08-11 18:30:22 -04:00
Nick Ethier	0bc0403cc3	Task DNS Options (#7661) Co-Authored-By: Tim Gross <tgross@hashicorp.com> Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>	2020-06-18 11:01:31 -07:00
Mahmood Ali	2588b3bc98	cleanup driver eventor goroutines This fixes few cases where driver eventor goroutines are leaked during normal operations, but especially so in tests. This change makes few modifications: First, it switches drivers to use `Context`s to manage shutdown events. Previously, it relied on callers invoking `.Shutdown()` function that is specific to internal drivers only and require casting. Using `Contexts` provide a consistent idiomatic way to manage lifecycle for both internal and external drivers. Also, I discovered few places where we don't clean up a temporary driver instance in the plugin catalog code, where we dispense a driver to inspect and validate the schema config without properly cleaning it up.	2020-05-26 11:04:04 -04:00
Tim Gross	aa8927abb4	volumes: return better error messages for unsupported task drivers (#8030) When an allocation runs for a task driver that can't support volume mounts, the mounting will fail in a way that can be hard to understand. With host volumes this usually means failing silently, whereas with CSI the operator gets inscrutable internals exposed in the `nomad alloc status`. This changeset adds a MountConfig field to the task driver Capabilities response. We validate this when the `csi_hook` or `volume_hook` fires and return a user-friendly error. Note that we don't currently have a way to get driver capabilities up to the server, except through attributes. Validating this when the user initially submits the jobspec would be even better than what we're doing here (and could be useful for all our other capabilities), but that's out of scope for this changeset. Also note that the MountConfig enum starts with "supports all" in order to support community plugins in a backwards compatible way, rather than cutting them off from volume mounting unexpectedly.	2020-05-21 09:18:02 -04:00
Anthony Scalisi	9664c6b270	fix spelling errors (#6985)	2020-04-20 09:28:19 -04:00
Yoan Blanc	225c9c1215	fixup! vendor: explicit use of hashicorp/go-msgpack Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-31 09:48:07 -04:00
Yoan Blanc	761d014071	vendor: explicit use of hashicorp/go-msgpack Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-31 09:45:21 -04:00
Chris Baker	d6364e13bc	fix typo in comment	2020-03-13 09:09:46 -05:00
Mahmood Ali	88cfe504a0	update grpc Upgrade grpc to v1.27.1 and protobuf plugins to v1.3.4.	2020-03-03 08:39:54 -05:00
Mahmood Ali	0b7085ba3a	driver: allow disabling log collection Operators commonly have docker logs aggregated using various tools and don't need nomad to manage their docker logs. Worse, Nomad uses a somewhat heavy docker api call to collect them and it seems to cause problems when a client runs hundreds of log collections. Here we add a knob to disable log aggregation completely for nomad. When log collection is disabled, we avoid running logmon and docker_logger for the docker tasks in this implementation. The downside here is once disabled, `nomad logs ...` commands and API no longer return logs and operators must corrolate alloc-ids with their aggregated log info. This is meant as a stop gap measure. Ideally, we'd follow up with at least two changes: First, we should optimize behavior when we can such that operators don't need to disable docker log collection. Potentially by reverting to using pre-0.9 syslog aggregation in linux environments, though with different trade-offs. Second, when/if logs are disabled, nomad logs endpoints should lookup docker logs api on demand. This ensures that the cost of log collection is paid sparingly.	2019-12-08 14:15:03 -05:00
Danielle Lancashire	4fbcc668d0	volumes: Add support for mount propagation This commit introduces support for configuring mount propagation when mounting volumes with the `volume_mount` stanza on Linux targets. Similar to Kubernetes, we expose 3 options for configuring mount propagation: - private, which is equivalent to `rprivate` on Linux, which does not allow the container to see any new nested mounts after the chroot was created. - host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts that have been created _outside of the container_ to be visible inside the container after the chroot is created. - bidirectional, which is equivalent to `rshared` on Linux, which allows both the container to see new mounts created on the host, but importantly _allows the container to create mounts that are visible in other containers an don the host_ private and host-to-task are safe, but bidirectional mounts can be dangerous, as if the code inside a container creates a mount, and does not clean it up before tearing down the container, it can cause bad things to happen inside the kernel. To add a layer of safety here, we require that the user has ReadWrite permissions on the volume before allowing bidirectional mounts, as a defense in depth / validation case, although creating mounts should also require a priviliged execution environment inside the container.	2019-10-14 14:09:58 +02:00
Tim Gross	d965a15490	driver/networking: don't recreate existing network namespaces	2019-09-25 14:58:17 -04:00
Grégoire Delattre	c6ac788258	Fix the ExecTask function in DriverExecTaskNotSupported (#6145) This fixes the ExecTask definition to match with the DriverPlugin interface.	2019-08-29 11:36:29 -04:00
Lucas BEE	35a1b72962	Add NetworkIsolation in TaskConfig (#6135) NetworkIsolation was left out of the task config when using an external task driver plugin	2019-08-15 13:05:55 -04:00
Lucas BEE	406642f34a	Fix missing plugin driver capabilities (#6128) NetIsolationModes and MustInitiateNetwork were left out of the driver Capabilities when using an external task driver plugin Signed-off-by: Lucas BEE <pouulet@gmail.com>	2019-08-14 09:10:10 -04:00
Danielle Lancashire	6ef8d5233e	client: Add volume_hook for mounting volumes	2019-08-12 15:39:08 +02:00
Nick Ethier	971c8c9c2b	Driver networking support Adds support for passing network isolation config into drivers and implements support in the rawexec driver as a proof of concept	2019-07-31 01:03:20 -04:00
Nick Ethier	63c5504d56	ar: fix lint errors	2019-07-31 01:03:19 -04:00
Nick Ethier	2d60ef64d9	plugins/driver: make DriverNetworkManager interface optional	2019-07-31 01:03:19 -04:00
Nick Ethier	ab84630132	plugin/driver: fix tests and add new dep to vendor	2019-07-31 01:03:17 -04:00
Nick Ethier	548f78ef15	ar: initial driver based network management	2019-07-31 01:03:17 -04:00
Nick Ethier	66c514a388	Add network lifecycle management Adds a new Prerun and Postrun hooks to manage set up of network namespaces on linux. Work still needs to be done to make the code platform agnostic and support Docker style network initalization.	2019-07-31 01:03:17 -04:00
Mahmood Ali	72d81da4e0	Signal plugin shutdown for driver.TaskStats The driver plugin stub client must call `grpcutils.HandleGrpcErr` to handle plugin shutdown similar to other functions. This ensures that TaskStats returns `ErrPluginShutdown` when plugin shutdown.	2019-07-11 13:57:35 +08:00
Chris Baker	9442c26cff	docker: DestroyTask was not cleaning up Docker images because it was erroring early due to an attempt to inspect an image that had already been removed	2019-06-03 19:04:27 +00:00
Mahmood Ali	57c18fec4e	Add basic drivers conformance tests Add consolidated testing package to serve as conformance tests for all drivers.	2019-05-09 16:49:08 -04:00
Mahmood Ali	13de875750	implemment streaming exec handling in driver grpc handlers Also add a helper that converts the adapts the high level interface to the low-level interface of nomad exec interfaces.	2019-05-09 16:49:08 -04:00
Mahmood Ali	f47d3d5f8a	add nomad streaming exec core data structures and interfaces In this commit, we add two driver interfaces for supporting `nomad exec` invocation: * A high level `ExecTaskStreamingDriver`, that operates on io reader/writers. Drivers should prefer using this interface * A low level `ExecTaskStreamingRawDriver` that operates on the raw stream of input structs; useful when a driver delegates handling to driver backend (e.g. across RPC/grpc). The interfaces are optional for a driver, as `nomad exec` support is opt-in. Existing drivers continue to compile without exec support, until their maintainer add such support. Furthermore, we create protobuf structures to represent exec stream entities: `ExecTaskStreamingRequest` and `ExecTaskStreamingResponse`. We aim to reuse the protobuf generated code as much as possible, without translation to avoid conversion overhead. `ExecTaskStream` abstract fetching and sending stream entities. It's influenced by the grpc bi-directional stream interface, to avoid needing any adapter. I considered using channels, but the asynchronisity and concurrency makes buffer reuse too complicated, which would put more pressure on GC and slows exec operation.	2019-04-30 14:02:29 -04:00
Mahmood Ali	0c8ee8c404	Simplify proto conversion and handle swap Convert all cpu and memory usage fields regardless of stated measured fields, and handle swap fields	2019-03-30 15:18:28 -04:00
Mahmood Ali	f4a68f556f	deserialize total ticks	2019-03-30 07:14:57 -04:00
Mahmood Ali	9656d79eba	Always report TotalTicks when percent is measured Fix a case where TotalTicks doesn't get serialized across executor grpc calls. Here, I opted to implicit add field, rather than explicitly mark it as a measured field, because it's a derived field and to preserve 0.8 behavior where total ticks aren't explicitly marked as a measured field.	2019-03-29 22:34:28 -04:00
Mahmood Ali	b08a2744f8	Merge pull request #5428 from hashicorp/b-dropped-logs-on-task-restart client/logmon: restart log collection correctly when a task is restarted	2019-03-21 14:02:08 -04:00
Nick Ethier	83936bea3c	logmon: fix logmon handling in driver test harness	2019-03-20 21:14:08 -04:00
Mahmood Ali	fb55717b0c	Regenerate Proto files (#5421) Noticed that the protobuf files are out of sync with ones generated by 1.2.0 protoc go plugin. The cause for these files seem to be related to release processes, e.g. [0.9.0-beta1 preperation](`ecec3d38de (diff-da4da188ee496377d456025c2eab4e87)`), and [0.9.0-beta3 preperation](`b849d84f2f`). This restores the changes to that of the pinned protoc version and fails build if protobuf files are out of sync. Sample failing Travis job is that of the first commit change: https://travis-ci.org/hashicorp/nomad/jobs/506285085	2019-03-14 10:56:27 -04:00
Michael Schurter	d74755900e	Generate files for 0.9.0-beta3 release	2019-02-26 09:44:49 -08:00
Michael Schurter	38821954b7	plugins: squelch context Canceled error logs As far as I can tell this is the most straightforward and resilient way to skip error logging on context cancellation with grpc streams. You cannot compare the error against context.Canceled directly as it is of type `*status.statusError`. The next best solution I found was: ```go resp, err := stream.Recv() if code, ok := err.(interface{ Code() code.Code }); ok { if code.Code == code.Canceled { return } } ``` However I think checking ctx.Err() directly makes the code much easier to read and is resilient against grpc API changes.	2019-02-21 15:32:18 -08:00
Alex Dadgar	bc804dda2e	Nomad 0.9.0-beta1 generated code	2019-01-30 10:49:44 -08:00
Nick Ethier	bb9a8afe9b	executor: fix bug and add tests for incorrect stats timestamp reporting	2019-01-28 21:57:45 -05:00
Mahmood Ali	389e043129	drivers: pass logger through driver plugin client This fixes a panic whenever driver plugin attempts to log a message.	2019-01-25 09:38:41 -05:00
Nick Ethier	c41f96943d	plugins/drivers: change stats interval to duration type in proto	2019-01-24 22:19:18 -05:00
Michael Schurter	32daa7b47b	goimports until make check is happy	2019-01-23 06:27:14 -08:00
Michael Schurter	be0bab7c3f	move pluginutils -> helper/pluginutils I wanted a different color bikeshed, so I get to paint it	2019-01-22 15:50:08 -08:00
Alex Dadgar	4bdccab550	goimports	2019-01-22 15:44:31 -08:00
Alex Dadgar	b7a65676fe	gofmt	2019-01-22 15:43:34 -08:00
Alex Dadgar	6c2782f037	move catalog + grpcutils	2019-01-22 15:11:57 -08:00
Nick Ethier	7da9f5eac7	drivers: regen proto	2019-01-18 18:53:45 -05:00
Nick Ethier	e3c6f89b9a	drivers: use consts for task handle version	2019-01-18 18:31:01 -05:00
Nick Ethier	04772aac7c	drivers: fix test	2019-01-18 18:31:01 -05:00
Nick Ethier	e5a6fc9271	executor: add pre 0.9 client and wrapper	2019-01-18 18:30:58 -05:00
Mahmood Ali	9909d98bee	Track Basic Memory Usage as reported by cgroups Track current memory usage, `memory.usage_in_bytes`, in addition to `memory.max_memory_usage_in_bytes` and friends. This number is closer what Docker reports. Related to https://github.com/hashicorp/nomad/issues/5165.	2019-01-14 18:47:52 -05:00
Nick Ethier	3b395d7100	drivers: plumb grpc client logger	2019-01-12 12:18:23 -05:00
Nick Ethier	7e306afde3	executor: fix failing stats related test	2019-01-12 12:18:23 -05:00
Nick Ethier	b0d9440474	docker: add test for stats collection	2019-01-12 12:18:22 -05:00
Nick Ethier	9fea54e0dc	executor: implement streaming stats API plugins/driver: update driver interface to support streaming stats client/tr: use streaming stats api TODO: * how to handle errors and closed channel during stats streaming * prevent tight loop if Stats(ctx) returns an error drivers: update drivers TaskStats RPC to handle streaming results executor: better error handling in stats rpc docker: better control and error handling of stats rpc driver: allow stats to return a recoverable error	2019-01-12 12:18:22 -05:00
Mahmood Ali	90f3cea187	Merge pull request #5157 from hashicorp/r-drivers-no-cstructs drivers: avoid referencing client/structs package	2019-01-09 13:06:46 -05:00
Mahmood Ali	f679975956	fixup! remove unused field	2019-01-08 12:58:12 -05:00
Mahmood Ali	f015b88ea7	remove unused field	2019-01-08 12:19:44 -05:00
Mahmood Ali	426c981c34	Remove some dead code	2019-01-08 09:11:48 -05:00
Mahmood Ali	64f80343fc	drivers: re-export ResourceUsage structs Re-export the ResourceUsage structs in drivers package to avoid drivers directly depending on the internal client/structs package directly. I attempted moving the structs to drivers, but that caused some import cycles that was a bit hard to disentagle. Alternatively, I added an alias here that's sufficient for our purposes of avoiding external drivers depend on internal packages, while allowing us to restructure packages in future without breaking source compatibility.	2019-01-08 09:11:47 -05:00
Mahmood Ali	916a40bb9e	move cstructs.DeviceNetwork to drivers pkg	2019-01-08 09:11:47 -05:00
Mahmood Ali	9369b123de	use drivers.FSIsolation	2019-01-08 09:11:47 -05:00
Danielle Tomlinson	8df20f49f7	drivers: Add internal interface for Shutdown This allows us to correctly terminate internal state during runs of the nomad test suite, e.g closing eventer contexts correctly.	2019-01-08 13:48:49 +01:00
Alex Dadgar	fb5dc9058e	regenerate protos	2019-01-07 14:49:40 -08:00
Alex Dadgar	c9825a9c36	recover	2019-01-07 14:49:40 -08:00
Alex Dadgar	a6b36df4de	remove nil logger	2019-01-07 14:48:01 -08:00
Preetha Appan	2fb2de3cef	Standardize driver health description messages for all drivers	2019-01-06 22:06:38 -06:00
Danielle Tomlinson	43f2dc0c36	chore: Fix environement->environment typo	2019-01-03 13:31:26 +01:00
Danielle Tomlinson	45174ac3e9	Merge pull request #5041 from hashicorp/dani/b-driver-healt drivers: Cleanup root user fingerprinting	2019-01-03 13:16:28 +01:00
Danielle Tomlinson	63b5e1a9e9	plugins: Add consistent message for requires root	2018-12-20 12:54:01 +01:00
Alex Dadgar	9d34802f7a	Store device envs separately and pass to drivers	2018-12-19 14:23:09 -08:00
Alex Dadgar	fff09162aa	proto	2018-12-19 13:54:19 -08:00
Nick Ethier	ce1a5cba0e	drivermanager: use allocID and task name to route task events	2018-12-18 23:01:51 -05:00
Alex Dadgar	4c57d2ec4d	Add plugin API versioning to plugin loader and plugins	2018-12-18 16:48:00 -08:00
Alex Dadgar	b653ae2af7	utilities	2018-12-18 15:48:52 -08:00
Alex Dadgar	b9ee03b2c1	protos	2018-12-18 15:48:52 -08:00
Nick Ethier	0c50a51c19	executor: encode mounts and devices correctly when using grpc	2018-12-15 00:08:23 -05:00
Nick Ethier	09dadf0a23	Merge branch 'master' into f-grpc-executor * master: (71 commits) Fix output of 'nomad deployment fail' with no arg Always create a running allocation when testing task state tests: ensure exec tests pass valid task resources (#4992) some changes for more idiomatic code fix iops related tests fixed bug in loop delay gofmt improved code for readability client: updateAlloc release lock after read fixup! device attributes in `nomad node status -verbose` drivers/exec: support device binds and mounts fix iops bug and increase test matrix coverage tests: tag image explicitly changelog ci: install lxc-templates explicitly tests: skip checking rdma cgroup ci: use Ubuntu 16.04 (Xenial) in TravisCI client: update driver info on new fingerprint drivers/docker: enforce volumes.enabled (#4983) client: Style: use fluent style for building loggers ...	2018-12-13 14:41:09 -05:00
Nick Ethier	86e9c11ec2	executor: don't drop errors when configuring libcontainer cfg, add nil check on resources	2018-12-07 14:03:42 -05:00
Nick Ethier	2283cb2c39	executor: use drivers.Resources as resource model	2018-12-06 21:22:02 -05:00
Nick Ethier	29ef54c0ee	executor: merge plugin shim with executor package	2018-12-06 21:13:45 -05:00
Nick Ethier	71353a88d4	executor: remove structs package	2018-12-06 20:54:14 -05:00
Alex Dadgar	1e3c3cb287	Deprecate IOPS IOPS have been modelled as a resource since Nomad 0.1 but has never actually been detected and there is no plan in the short term to add detection. This is because IOPS is a bit simplistic of a unit to define the performance requirements from the underlying storage system. In its current state it adds unnecessary confusion and can be removed without impacting any users. This PR leaves IOPS defined at the jobspec parsing level and in the api/ resources since these are the two public uses of the field. These should be considered deprecated and only exist to allow users to stop using them during the Nomad 0.9.x release. In the future, there should be no expectation that the field will exist.	2018-12-06 15:09:26 -08:00
Nick Ethier	8b20de4801	executor: use grpc instead of netrpc as plugin protocol * Added protobuf spec for executor * Seperated executor structs into their own package	2018-12-05 11:03:56 -05:00
Danielle Tomlinson	8ba0a816f3	plugins: Add support for serving driver plugins	2018-12-01 17:30:54 +01:00
Danielle Tomlinson	393b76ed7f	plugins: Move driver testing support to subpackage this allows us to drop a cyclical import, but is subobptimal as it requires BaseDriver tests to move. This falls firmly into the realm of being a hack. Alternatives welcome.	2018-12-01 17:29:39 +01:00
Danielle Tomlinson	2db5ae38d8	client: Rename drivers/shared/env => client/taskenv	2018-11-30 12:18:39 +01:00
Danielle Tomlinson	ffc5e5d56b	executors: Unify go-plugin handshake	2018-11-30 10:59:23 +01:00
Danielle Tomlinson	fdfe93aa25	fixup: executorplugin: fix rkt build	2018-11-30 10:47:08 +01:00
Danielle Tomlinson	d26a310db0	client: Move executor plugins into own package	2018-11-30 10:46:13 +01:00
Danielle Tomlinson	d582ea1d8b	drivers: Create drivers/shared/structs This creates a drivers/shared/structs package and moves the buffer size checks into it.	2018-11-30 10:46:13 +01:00
Danielle Tomlinson	0544a57abe	drivers: Move client/drivers/executor to drivers/shared/executor	2018-11-30 10:46:13 +01:00
Danielle Tomlinson	1a29811169	drivers: Move client/drivers/env to drivers/shared/env As part of deprecating legacy drivers, we're moving the env package to a new drivers/shared tree, as it is used by the modern docker and rkt driver packages, and is useful for 3rd party plugins.	2018-11-30 10:46:13 +01:00
Chris Baker	b43090a267	Merge pull request #4932 from hashicorp/b-1172-rkt-env-vars change to testing utilities to fix rkt tests	2018-11-29 09:18:10 -05:00
Chris Baker	da35fda145	testing: in MkAllocDir, do not update TaskConfig with All() from the task builder, just with Env() (because it pollutes environment variables with node attributes and fails the rkt tests)	2018-11-28 22:19:48 +00:00
Preetha	1f526db414	Merge pull request #4919 from hashicorp/f-fingerprint-attribute-type Modify fingerprint interface to use typed attribute struct	2018-11-28 14:18:28 -06:00
Preetha Appan	f89dbcd9cc	modify fingerprint interface to use typed attribute struct	2018-11-28 10:01:03 -06:00
Alex Dadgar	4ee603c382	Device hook and devices affect computed node class This PR introduces a device hook that retrieves the device mount information for an allocation. It also updates the computed node class computation to take into account devices. TODO Fix the task runner unit test. The environment variable is being lost even though it is being properly set in the prestart hook.	2018-11-27 17:25:33 -08:00
Chris Baker	a1fb1f3830	Merge pull request #4891 from hashicorp/b-1150-rkt-volume-names drivers/rkt: fix invalid volumes	2018-11-27 18:55:00 -05:00
Chris Baker	c0bc9d069d	change to docs in the driver proto to reflect standard pattern	2018-11-27 23:52:24 +00:00


206 Commits