open-nomad

Author	SHA1	Message	Date
Tim Gross	7bd61bbf43	docker: generate /etc/hosts file for bridge network mode (#10766 ) When `network.mode = "bridge"`, we create a pause container in Docker with no networking so that we have a process to hold the network namespace we create in Nomad. The default `/etc/hosts` file of that pause container is then used for all the Docker tasks that share that network namespace. Some applications rely on this file being populated. This changeset generates a `/etc/hosts` file and bind-mounts it to the container when Nomad owns the network, so that the container's hostname has an IP in the file as expected. The hosts file will include the entries added by the Docker driver's `extra_hosts` field. In this changeset, only the Docker task driver will take advantage of this option, as the `exec`/`java` drivers currently copy the host's `/etc/hosts` file and this can't be changed without breaking backwards compatibility. But the fields are available in the task driver protobuf for community task drivers to use if they'd like.	2021-06-16 14:55:22 -04:00
James Rasell	87413ff0cd	plugins: fix test data race.	2021-06-15 09:31:08 +02:00
Tim Gross	8b2ecde5b4	csi: accept list of caps during validation in volume register When `nomad volume create` was introduced in Nomad 1.1.0, we changed the volume spec to take a list of capabilities rather than a single capability, to meet the requirements of the CSI spec. When a volume is registered via `nomad volume register`, we should be using the same fields to validate the volume with the controller plugin.	2021-06-04 07:57:26 -04:00
Ryan Sundberg	d43c5f98a5	CSI: Include MountOptions in capabilities sent to CSI for all RPCs Include the VolumeCapability.MountVolume data in ControllerPublishVolume, CreateVolume, and ValidateVolumeCapabilities RPCs sent to the CSI controller. The previous behavior was to only include the MountVolume capability in the NodeStageVolume request, which on some CSI implementations would be rejected since the Volume was not originally provisioned with the specific mount capabilities requested.	2021-05-24 10:59:54 -04:00
Michael Schurter	e62795798d	core: propagate remote task handles Add a new driver capability: RemoteTasks. When a task is run by a driver with RemoteTasks set, its TaskHandle will be propagated to the server in its allocation's TaskState. If the task is replaced due to a down node or draining, its TaskHandle will be propagated to its replacement allocation. This allows tasks to be scheduled in remote systems whose lifecycles are disconnected from the Nomad node's lifecycle. See https://github.com/hashicorp/nomad-driver-ecs for an example ECS remote task driver.	2021-04-27 15:07:03 -07:00
Nick Ethier	110f982eb3	plugins/drivers: fix deprecated fields	2021-04-16 14:13:29 -04:00
Nick Ethier	db1f697fc0	plugins/driver: add cpuset_cpus back and mark cpuset_mems as reserved	2021-04-15 13:31:18 -04:00
Nick Ethier	fe283c5a8f	executor: add support for cpuset cgroup	2021-04-15 10:24:31 -04:00
Nick Ethier	155a2ca5fb	client/ar: thread through cpuset manager	2021-04-13 13:28:36 -04:00
Tim Gross	cba09a5bcf	CSI: listing from plugins can return EOF The AWS EBS CSI plugin was observed to return an EOF when we get to the end of the paging for `ListSnapshots`, counter to specification. Handle this case gracefully, including for `ListVolumes` (which EBS doesn't support but has similar semantics). Also fixes a timestamp formatting bug on `ListSnapshots`.	2021-04-08 13:32:19 -04:00
Drew Bailey	82dbf08e9f	remove flakey exec test that tests non-deterministic docker behavior (#10291 )	2021-04-02 11:15:22 -04:00
Tim Gross	0856483115	CSI: fingerprint detailed node capabilities In order to support new node RPCs, we need to fingerprint plugin capabilities in more detail. This changeset mirrors recent work to fingerprint controller capabilities, but is not yet in use by any Nomad RPC.	2021-04-01 16:00:58 -04:00
Tim Gross	466b620fa4	CSI: volume snapshot	2021-04-01 11:16:52 -04:00
Tim Gross	9fc4cf1419	CSI: fingerprint detailed controller capabilities In order to support new controller RPCs, we need to fingerprint volume capabilities in more detail and perform controller RPCs only when the specific capability is present. This fixes a bug in Ceph support where the plugin can only support create/delete but we assume that it also supports attach/detach.	2021-03-31 16:37:09 -04:00
Tim Gross	d38008176e	CSI: create/delete/list volume RPCs This commit implements the RPC handlers on the client that talk to the CSI plugins on that client for the Create/Delete/List RPC.	2021-03-31 16:37:09 -04:00
Tim Gross	d97401f60e	CSI: protobuffer mappings for Create/Delete/List volume RPCs Note that unset proto fields for volume create should be nil. The CSI spec handles empty fields and nil fields in the protobuf differently, which may result in validation failures for creating volumes with no prior source (and does in testing with the AWS EBS plugin). Refactor the `CreateVolumeRequest` mapping to the protobuf in the plugin client to avoid this bug.	2021-03-31 16:37:09 -04:00
Mahmood Ali	f44a04454d	oversubscription: driver/exec to honor MemoryMaxMB	2021-03-30 16:55:58 -04:00
Mahmood Ali	275feb5bec	oversubscription: docker to honor MemoryMaxMB values	2021-03-30 16:55:58 -04:00
Adrian Todorov	47e1cb11df	driver/docker: add extra labels ( job name, task and task group name)	2021-03-08 08:59:52 -05:00
Kris Hicks	d71a90c8a4	Fix some errcheck errors (#9811 ) * Throw away result of multierror.Append When given a multierror.Error, it is mutated, therefore the return value is not needed. Simplify MergeMultierrorWarnings, use StringBuilder * Hash.Write() never returns an error * Remove error that was always nil * Remove error from Resources.Add signature When this was originally written it could return an error, but that was refactored away, and callers of it as of today never handle the error. * Throw away results of io.Copy during Bridge * Handle errors when computing node class in test	2021-01-14 12:46:35 -08:00
Chris Baker	9b125b8837	update template and artifact interpolation to use client-relative paths resolves #9839 resolves #6929 resolves #6910 e2e: template env interpolation path testing	2021-01-04 22:25:34 +00:00
Kris Hicks	0cf9cae656	Apply some suggested fixes from staticcheck (#9598 )	2020-12-10 07:29:18 -08:00
Kris Hicks	0a3a748053	Add gosimple linter (#9590 )	2020-12-09 11:05:18 -08:00
Mahmood Ali	98c02851c8	use comment ignores (#9448 ) Use targeted ignore comments for the cases where we are bound by backward compatibility. I've left some file based linters, especially when the file is riddled with linter violations (e.g. enum names), or if it's a property of the file (e.g. package and file names). I encountered an odd behavior related to RPC_REQUEST_RESPONSE_UNIQUE and RPC_REQUEST_STANDARD_NAME. Apparently, if they target a `stream` type, we must separate them into separate lines so that the ignore comment targets the type specifically.	2020-11-25 16:03:01 -05:00
Mahmood Ali	b2a8752c5f	honor task user when execing into raw_exec task (#9439 ) Fix #9210 . This updates the executor so it honors the User when using nomad alloc exec. The bug was that the exec task didn't honor the init command when execing.	2020-11-25 09:34:10 -05:00
Nick Ethier	c9bd7e89ca	command: use correct port mapping syntax in examples	2020-11-23 10:25:30 -06:00
Kris Hicks	511c2e9db2	proto: Switch to using buf (#9308 ) This replaces all usage of `protoc` with `buf`. See `tools/buf/README.md` for more.	2020-11-17 07:01:48 -08:00
Kris Hicks	9d03cf4c5f	protos: Update .proto files not to use Go package name (#9301 ) Previously, it was required that you `go get github.com/hashicorp/nomad` to be able to build protos, as the protoc invocation added an include directive that pointed to `$GOPATH/src`, which is how dependent protos were discovered. As Nomad now uses Go modules, it won't necessarily be cloned to `$GOPATH`. (Additionally, if you _had_ go-gotten Nomad at some point, protoc compilation would have possibly used the _wrong_ protos, as those wouldn't necessarily be the most up-to-date ones.) This change modifies the proto files and the `protoc` invocation to handle discovering dependent protos via protoc plugin modifier statements that are specific to the protoc plugin being used. In this change, `make proto` was run to recompile the protos, which results in changes only to the gzipped `FileDescriptorProto`.	2020-11-10 08:42:35 -08:00
Michael Schurter	9c3972937b	s/0.13/1.0/g 1.0 here we come!	2020-10-14 15:17:47 -07:00
Yoan Blanc	891accb89a	use allow/deny instead of the colored alternatives (#9019 ) Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-10-12 08:47:05 -04:00
Tim Gross	29a5454894	csi: loosen ValidateVolumeCapability requirements (#9049 ) The CSI specification for `ValidateVolumeCapability` says that we shall "reconcile successful capability-validation responses by comparing the validated capabilities with those that it had originally requested" but leaves the details of that reconciliation unspecified. This API is not implemented in Kubernetes, so controller plugins don't have a real-world implementation to verify their behavior against. We have found that CSI plugins in the wild may return "successful" but incomplete `VolumeCapability` responses, so we can't require that all capabilities we expect have been validated, only that the ones that have been validated match. This appears to violate the CSI specification but until that's been resolved in upstream we have to loosen our validation requirements. The tradeoff is that we're more likely to have runtime errors during `NodeStageVolume` instead of at the time of volume registration.	2020-10-08 12:53:24 -04:00
Mahmood Ali	567597e108	Compare to the correct host network setting In systemd-resolved hosts with no DNS customizations, the docker driver DNS setting should be compared to /run/systemd/resolve/resolv.conf while exec/java drivers should be compared to /etc/resolv.conf. When systemd-resolved is enabled, /etc/resolv.conf is a stub that points to 127.0.0.53. Docker avoids this stub because this address isn't accessible from the container. The exec/java drivers that don't create network isolation use the stub though in the default configuration.	2020-10-01 10:23:14 -04:00
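The loosened rule (every capability the plugin reports as validated must be one that was requested, but not every requested capability has to come back) might look like the following. The `Capability` type and `checkValidated` function are hypothetical stand-ins, not Nomad's or the CSI library's types.

```go
package main

import "fmt"

// Capability is a simplified, hypothetical stand-in for a CSI volume capability.
type Capability struct {
	AccessMode     string
	AttachmentMode string
}

// checkValidated accepts an incomplete response: it only rejects capabilities
// that the plugin claims to have validated but that were never requested.
func checkValidated(requested, validated []Capability) error {
	want := make(map[Capability]struct{}, len(requested))
	for _, c := range requested {
		want[c] = struct{}{}
	}
	for _, c := range validated {
		if _, ok := want[c]; !ok {
			return fmt.Errorf("plugin validated unrequested capability %+v", c)
		}
	}
	return nil
}

func main() {
	requested := []Capability{
		{AccessMode: "single-node-writer", AttachmentMode: "file-system"},
		{AccessMode: "single-node-reader-only", AttachmentMode: "file-system"},
	}
	// The plugin only echoes back one of the two requested capabilities.
	validated := requested[:1]
	fmt.Println(checkValidated(requested, validated))
}
```

As the commit message notes, the cost of accepting partial responses is that a genuinely unsupported capability may only surface later, at `NodeStageVolume` time.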
Nick Ethier	e39574be59	docker: support group allocated ports and host_networks (#8623 ) * docker: support group allocated ports * docker: add new ports driver config to specify which group ports are mapped * docker: update port mapping docs	2020-08-11 18:30:22 -04:00
Tim Gross	7d53ed88d6	csi: client RPCs should return wrapped errors for checking (#8605 ) When the client-side actions of a CSI client RPC succeed but we get disconnected during the RPC or we fail to checkpoint the claim state, we want to be able to retry the client RPC without getting blocked by the client-side state (ex. mount points) already having been cleaned up in previous calls.	2020-08-07 11:01:36 -04:00
Mahmood Ali	c2d3c3e431	nvidia: support disabling the nvidia plugin (#8353 )	2020-07-21 10:11:16 -04:00
Tim Gross	3d38592fbb	csi: add VolumeContext to NodeStage/Publish RPCs (#8239 ) In #7957 we added support for passing a volume context to the controller RPCs. This is an opaque map that's created by `CreateVolume` or, in Nomad's case, in the volume registration spec. However, we missed passing this field to the `NodeStage` and `NodePublish` RPC, which prevents certain plugins (such as MooseFS) from making node RPCs.	2020-06-22 13:54:32 -04:00
Nick Ethier	0bc0403cc3	Task DNS Options (#7661 ) Co-Authored-By: Tim Gross <tgross@hashicorp.com> Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>	2020-06-18 11:01:31 -07:00
Mahmood Ali	2588b3bc98	cleanup driver eventor goroutines This fixes a few cases where driver eventor goroutines are leaked during normal operations, but especially so in tests. This change makes a few modifications: First, it switches drivers to use `Context`s to manage shutdown events. Previously, it relied on callers invoking a `.Shutdown()` function that is specific to internal drivers only and requires casting. Using `Context`s provides a consistent, idiomatic way to manage lifecycle for both internal and external drivers. Also, I discovered a few places where we don't clean up a temporary driver instance in the plugin catalog code, where we dispense a driver to inspect and validate the schema config without properly cleaning it up.	2020-05-26 11:04:04 -04:00
Tim Gross	aa8927abb4	volumes: return better error messages for unsupported task drivers (#8030 ) When an allocation runs for a task driver that can't support volume mounts, the mounting will fail in a way that can be hard to understand. With host volumes this usually means failing silently, whereas with CSI the operator gets inscrutable internals exposed in the `nomad alloc status`. This changeset adds a MountConfig field to the task driver Capabilities response. We validate this when the `csi_hook` or `volume_hook` fires and return a user-friendly error. Note that we don't currently have a way to get driver capabilities up to the server, except through attributes. Validating this when the user initially submits the jobspec would be even better than what we're doing here (and could be useful for all our other capabilities), but that's out of scope for this changeset. Also note that the MountConfig enum starts with "supports all" in order to support community plugins in a backwards compatible way, rather than cutting them off from volume mounting unexpectedly.	2020-05-21 09:18:02 -04:00
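The backward-compatibility choice in this commit relies on the enum's zero value meaning "supports all", so a plugin that never sets the field keeps working. A minimal sketch of that idea follows; the type, constant, and function names here are illustrative assumptions and may not match the identifiers in the driver plugin package.

```go
package main

import "fmt"

// MountConfigSupport is a hypothetical enum mirroring the idea in the commit.
type MountConfigSupport int32

const (
	// MountConfigSupportAll is deliberately the zero value, so drivers that
	// predate the field are treated as supporting volume mounts.
	MountConfigSupportAll MountConfigSupport = iota
	MountConfigSupportNone
)

// Capabilities is a trimmed-down stand-in for a driver capabilities response.
type Capabilities struct {
	MountConfigs MountConfigSupport
}

// checkMountSupport returns a user-friendly error when a task asks for
// volumes but its driver has declared that it cannot mount them.
func checkMountSupport(driver string, caps Capabilities, wantsVolumes bool) error {
	if wantsVolumes && caps.MountConfigs == MountConfigSupportNone {
		return fmt.Errorf("task driver %q does not support volume mounts", driver)
	}
	return nil
}

func main() {
	legacy := Capabilities{} // community plugin that never set the field
	fmt.Println(checkMountSupport("community-driver", legacy, true))

	none := Capabilities{MountConfigs: MountConfigSupportNone}
	fmt.Println(checkMountSupport("no-mounts-driver", none, true))
}
```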
Mahmood Ali	751f337f1c	Update hcl2 vendoring The hcl2 library has moved from http://github.com/hashicorp/hcl2 to https://github.com/hashicorp/hcl/tree/hcl2. This updates Nomad's vendoring to start using hcl2 library. Also updates some related libraries (e.g. `github.com/zclconf/go-cty/cty` and `github.com/apparentlymart/go-textseg`).	2020-05-19 15:00:03 -04:00
Tim Gross	0f1946d395	csi: improve plugin error messages and volume validation (#7984 ) Some CSI plugins don't return much for errors over the gRPC socket above and beyond the bare minimum error codes. This changeset improves the operator experience by unpacking the error codes when available and wrapping the error with some user-friendly direction. Improving these errors also revealed a bad comparison with `require.Error` when `require.EqualError` should be used in the test code for plugin errors. This defect in turn was hiding a bug in volume validation where we're being overly permissive in allowing mount flags, which is now fixed.	2020-05-18 08:23:17 -04:00
Tim Gross	6a463dc13a	csi: use a blocking initial connection with timeout (#7965 ) The plugin supervisor lazily connects to plugins, but this means we only get "Unavailable" back from the gRPC call in cases where the plugin can never be reached (for example, if the Nomad client has the wrong permissions for the socket). This changeset improves the operator experience by switching to a blocking `DialWithContext`. It eagerly connects so that we can validate the connection is real and get a "failed to open" error in case where Nomad can't establish the initial connection.	2020-05-15 08:17:11 -04:00
Tim Gross	2082cf738a	csi: support for VolumeContext and VolumeParameters (#7957 ) The MVP for CSI in the 0.11.0 release of Nomad did not include support for opaque volume parameters or volume context. This changeset adds support for both. This also moves args for ControllerValidateCapabilities into a struct. The CSI plugin `ControllerValidateCapabilities` struct that we turn into a CSI RPC is accumulating arguments, so moving it into a request struct will reduce the churn of this internal API, make the plugin code more readable, and make this method consistent with the other plugin methods in that package.	2020-05-15 08:16:01 -04:00
Tim Gross	24aa32c503	csi: use a blocking initial connection with timeout The plugin supervisor lazily connects to plugins, but this means we only get "Unavailable" back from the gRPC call in cases where the plugin can never be reached (for example, if the Nomad client has the wrong permissions for the socket). This changeset improves the operator experience by switching to a blocking `DialWithContext`. It eagerly connects so that we can validate the connection is real and get a "failed to open" error in case where Nomad can't establish the initial connection.	2020-05-14 15:59:19 -04:00
Tim Gross	4f54a633a2	csi: refactor internal client field name to ExternalID (#7958 ) The CSI plugins RPCs require the use of the storage provider's volume ID, rather than the user-defined volume ID. Although changing the RPCs to use the field name `ExternalID` risks breaking backwards compatibility, we can use the `ExternalID` name internally for the client and only use `VolumeID` at the RPC boundaries.	2020-05-14 11:56:07 -04:00
Tim Gross	4374c1a837	csi: support Secrets parameter in CSI RPCs (#7923 ) CSI plugins can require credentials for some publishing and unpublishing workflow RPCs. Secrets are configured at the time of volume registration, stored in the volume struct, and then passed around as an opaque map by Nomad to the plugins.	2020-05-11 17:12:51 -04:00
Mahmood Ali	938e916d9c	When serializing msgpack, only consider codec tag When serializing structs with msgpack, only consider type tags of `codec`. Hashicorp/go-msgpack (based on ugorji/go) defaults to interpreting the `codec` tag if it's available, but falls back to using `json` if `codec` isn't present. This behavior is surprising in cases where we want to serialize json differently from msgpack, e.g. serializing `ConsulExposeConfig`.	2020-05-11 14:14:10 -04:00
Tim Gross	3cca738478	csi: fix mount validation (#7869 ) Several of the CSI `VolumeCapability` methods return pointers, which we were then comparing to pointers in the request rather than dereferencing them and comparing their contents. This changeset does a more fine-grained comparison of the request vs the capabilities, and adds better error messaging.	2020-05-05 15:13:07 -04:00
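The class of bug fixed here, comparing pointers instead of the values they point to, is easy to reproduce. The sketch below uses a hypothetical `MountVolume` struct rather than the generated CSI types, and `sameMount` is an illustrative helper, not the function added in the changeset.

```go
package main

import "fmt"

// MountVolume is a hypothetical stand-in for the CSI mount capability message.
type MountVolume struct {
	FsType     string
	MountFlags []string
}

// sameMount compares the contents behind the two pointers.
func sameMount(a, b *MountVolume) bool {
	if a == nil || b == nil {
		return a == b // equal only when both are nil
	}
	if a.FsType != b.FsType || len(a.MountFlags) != len(b.MountFlags) {
		return false
	}
	for i := range a.MountFlags {
		if a.MountFlags[i] != b.MountFlags[i] {
			return false
		}
	}
	return true
}

func main() {
	requested := &MountVolume{FsType: "ext4", MountFlags: []string{"noatime"}}
	reported := &MountVolume{FsType: "ext4", MountFlags: []string{"noatime"}}

	// Pointer comparison checks identity, so distinct allocations never match.
	fmt.Println(requested == reported) // false
	// Comparing the dereferenced contents gives the intended answer.
	fmt.Println(sameMount(requested, reported)) // true
}
```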
Tim Gross	cbae10333c	csi: check returned volume capability validation (#7831 ) This changeset corrects handling of the `ValidateVolumeCapabilities` response: * The CSI spec for `ValidateVolumeCapabilities` requires that plugins only set the `Confirmed` field if they've validated all capabilities. The Nomad client improperly assumes that the lack of a `Confirmed` field should be treated as a failure. This breaks the Azure and Linode block storage plugins, which don't set this optional field. * The CSI spec also requires that the orchestrator check the validation responses to guard against older versions of a plugin reporting "valid" for newer fields it doesn't understand.	2020-04-30 17:12:32 -04:00
Anthony Scalisi	9664c6b270	fix spelling errors (#6985 )	2020-04-20 09:28:19 -04:00


330 commits