open-nomad

Author	SHA1	Message	Date
Nick Ethier	88793e92b6	ar: isolate network actions performed by client	2021-02-02 23:24:57 -05:00
Seth Hoenig	8b05efcf88	consul/connect: Add support for Connect terminating gateways This PR implements Nomad built-in support for running Consul Connect terminating gateways. Such a gateway can be used by services running inside the service mesh to access "legacy" services running outside the service mesh while still making use of Consul's service identity based networking and ACL policies. https://www.consul.io/docs/connect/gateways/terminating-gateway These gateways are declared as part of a task group level service definition within the connect stanza. service { connect { gateway { proxy { // envoy proxy configuration } terminating { // terminating-gateway configuration entry } } } } Currently Envoy is the only supported gateway implementation in Consul. The gateay task can be customized by configuring the connect.sidecar_task block. When the gateway.terminating field is set, Nomad will write/update the Configuration Entry into Consul on job submission. Because CEs are global in scope and there may be more than one Nomad cluster communicating with Consul, there is an assumption that any terminating gateway defined in Nomad for a particular service will be the same among Nomad clusters. Gateways require Consul 1.8.0+, checked by a node constraint. Closes #9445	2021-01-25 10:36:04 -06:00
Tim Gross	64449cddc1	implement alloc runner task restart hook Most allocation hooks don't need to know when a single task within the allocation is restarted. The check watcher for group services triggers the alloc runner to restart all tasks, but the alloc runner's `Restart` method doesn't trigger any of the alloc hooks, including the group service hook. The result is that after the first time a check triggers a restart, we'll never restart the tasks of an allocation again. This commit adds a `RunnerTaskRestartHook` interface so that alloc runner hooks can act if a task within the alloc is restarted. The only implementation is in the group service hook, which will force a re-registration of the alloc's services and fix check restarts.	2021-01-22 10:55:40 -05:00
Seth Hoenig	5abaf1b86d	consul/connect: ensure proxyID in test case	2021-01-20 09:48:12 -06:00
Seth Hoenig	a18e63ed55	client: use closed variable in append	2021-01-20 09:20:50 -06:00
Seth Hoenig	991884e715	consul/connect: Enable running multiple ingress gateways per Nomad agent Connect ingress gateway services were being registered into Consul without an explicit deterministic service ID. Consul would generate one automatically, but then Nomad would have no way to register a second gateway on the same agent as it would not supply 'proxy-id' during envoy bootstrap. Set the ServiceID for gateways, and supply 'proxy-id' when doing envoy bootstrap. Fixes #9834	2021-01-19 12:58:36 -06:00
Kris Hicks	d71a90c8a4	Fix some errcheck errors (#9811 ) * Throw away result of multierror.Append When given a multierror.Error, it is mutated, therefore the return value is not needed. Simplify MergeMultierrorWarnings, use StringBuilder * Hash.Write() never returns an error * Remove error that was always nil * Remove error from Resources.Add signature When this was originally written it could return an error, but that was refactored away, and callers of it as of today never handle the error. * Throw away results of io.Copy during Bridge * Handle errors when computing node class in test	2021-01-14 12:46:35 -08:00
Tim Gross	d78b4fc1a1	safely handle existing net namespace in default network manager When a client restarts, the network_hook's prerun will call `CreateNetwork`. Drivers that don't implement their own network manager will fall back to the default network manager, which doesn't handle the case where the network namespace is being recreated safely. This results in an error and the task being restarted for `exec` tasks with `network` blocks (this also impacts the community `containerd` and probably other community task drivers). If we get an error when attempting to create the namespace and that error is because the file already exists and is locked by its process, then we'll return a `nil` error with the `created` flag set to false, just as we do with the `docker` driver.	2021-01-11 11:31:03 -05:00
Mahmood Ali	00be4fc63c	tests: deflake TestTaskRunner_StatsHook_Periodic (#9734 ) This PR deflakes TestTaskRunner_StatsHook_Periodic tests and adds backoff when the driver closes the channel. TestTaskRunner_StatsHook_Periodic is currently the most flaky test - failing ~4% of the time (20 out of 486 workflows). A sample failure: https://app.circleci.com/pipelines/github/hashicorp/nomad/14028/workflows/957b674f-cbcc-4228-96d9-1094fdee5b9c/jobs/128563 . This change has two components: First, it updates the StatsHook so that it backs off when stats channel is closed. In the context of the test where the mock driver emits a single stats update and closes the channel, the test may make tens of thousands update during the period. In real context, if a driver doesn't implement the stats handler properly or when a task finishes, we may generate way too many Stats queries in a tight loop. Here, the backoff reduces these queries. I've added a failing test that shows 154,458 stats updates within 500ms in https://app.circleci.com/pipelines/github/hashicorp/nomad/14092/workflows/50672445-392d-4661-b19e-e3561ed32746/jobs/129423 . Second, the test ignores the first stats update after a task exit. Due to the asynchronicity of updates and channel/context use, it's possible that an update is enqueued while the test marks the task as exited, resulting into a spurious update.	2021-01-06 16:03:00 -05:00
Seth Hoenig	b4eafe6f2d	consul: always include task services hook Previously, Nomad would optimize out the services task runner hook for tasks which were initially submitted with no services defined. This causes a problem when the job is later updated to include service(s) on that task, which will result in nothing happening because the hook is not present to handle the service registration in the .Update. Instead, always enable the services hook. The group services alloc runner hook is already always enabled. Fixes #9707	2021-01-05 08:47:19 -06:00
Chris Baker	c7072258af	enabled broken test that is no longer broken	2021-01-04 22:25:35 +00:00
Chris Baker	9b125b8837	update template and artifact interpolation to use client-relative paths resolves #9839 resolves #6929 resolves #6910 e2e: template env interpolation path testing	2021-01-04 22:25:34 +00:00
Tim Gross	1785822386	template: trigger change_mode for dynamic secrets on restore (#9636 ) When a task is restored after a client restart, the template runner will create a new lease for any dynamic secret (ex. Consul or PKI secrets engines). But because this lease is being created in the prestart hook, we don't trigger the `change_mode`. This changeset uses the the existence of the task handle to detect a previously running task that's been restored, so that we can trigger the template `change_mode` if the template is changed, as it will be only with dynamic secrets.	2020-12-16 13:36:19 -05:00
Tim Gross	782c05f8c0	cni: prevent NPE if no interface has sandbox field set When we iterate over the interfaces returned from CNI setup, we filter for one with the `Sandbox` field set. Ensure that if none of the interfaces has that field set that we still return an available interface.	2020-12-16 10:36:03 -05:00
Seth Hoenig	e531e90b1b	build: set linux build tag on CNI networking CNI network configuration is currently only supported on Linux. For now, add the linux build tag so that the deadcode linter does not trip over unused CNI stuff on macOS.	2020-12-14 12:05:16 -06:00
Seth Hoenig	beaa6359d5	consul/connect: fix regression where client connect images ignored Nomad v1.0.0 introduced a regression where the client configurations for `connect.sidecar_image` and `connect.gateway_image` would be ignored despite being set. This PR restores that functionality. There was a missing layer of interpolation that needs to occur for these parameters. Since Nomad 1.0 now supports dynamic envoy versioning through the ${NOMAD_envoy_version} psuedo variable, we basically need to first interpolate ${connect.sidecar_image} => envoyproxy/envoy:v${NOMAD_envoy_version} then use Consul at runtime to resolve to a real image, e.g. envoyproxy/envoy:v${NOMAD_envoy_version} => envoyproxy/envoy:v1.16.0 Of course, if the version of Consul is too old to provide an envoy version preference, we then need to know to fallback to the old version of envoy that we used before. envoyproxy/envoy:v${NOMAD_envoy_version} => envoyproxy/envoy:v1.11.2@sha256:a7769160c9c1a55bb8d07a3b71ce5d64f72b1f665f10d81aa1581bc3cf850d09 Beyond that, we also need to continue to support jobs that set the sidecar task themselves, e.g. sidecar_task { config { image: "custom/envoy" } } which itself could include teh pseudo envoy version variable.	2020-12-14 09:47:55 -06:00
Kris Hicks	0cf9cae656	Apply some suggested fixes from staticcheck (#9598 )	2020-12-10 07:29:18 -08:00
Kris Hicks	0a3a748053	Add gosimple linter (#9590 )	2020-12-09 11:05:18 -08:00
Michael Schurter	15f2b8fe7c	client: skip broken test and fix assertion	2020-11-18 10:01:02 -08:00
Michael Schurter	ff91bba70e	client: fix interpolation in template source While Nomad v0.12.8 fixed `NOMAD_{ALLOC,TASK,SECRETS}_DIR` use in `template.destination`, interpolating these variables in `template.source` caused a path escape error. Why not apply the destination fix to source? The destination fix forces destination to always be relative to the task directory. This makes sense for the destination as a destination outside the task directory would be unreachable by the task. There's no reason to ever render a template outside the task directory. (Using `..` does allow destinations to escape the task directory if `template.disable_file_sandbox = true`. That's just awkward and unsafe enough I hope no one uses it.) There is a reason to source a template outside a task directory. At least if there weren't then I can't think of why we implemented `template.disable_file_sandbox`. So v0.12.8 left the behavior of `template.source` the more straightforward "Interpolate and validate." However, since outside of `raw_exec` every other driver uses absolute paths for `NOMAD__DIR` interpolation, this means those variables are unusable unless `disable_file_sandbox` is set. The Fix* The variables are now interpolated as relative paths only for the purpose of rendering templates. This is an unfortunate special case, but reflects the fact that the templates view of the filesystem is completely different (unconstrainted) vs the task's view (chrooted). Arguably the values of these variables should be context-specific. I think it's more reasonable to think of the "hack" as templating running uncontainerized than that giving templates different paths is a hack. TODO - [ ] E2E tests - [ ] Job validation may still be broken and prevent my fix from working? raw_exec `raw_exec` is actually broken _a different way_ as exercised by tests in this commit. I think we should probably remove these tests and fix that in a followup PR/release, but I wanted to leave them in for the initial review and discussion. Since non-containerized source paths are broken anyway, perhaps there's another solution to this entire problem I'm overlooking?	2020-11-17 22:03:04 -08:00
Wim	4e37897dd9	Use correct interface for netStatus CNI plugins can return multiple interfaces, eg the bridge plugin. We need the interface with the sandbox.	2020-11-14 22:29:30 +01:00
Seth Hoenig	4cc3c01d5b	Merge pull request #9352 from hashicorp/f-artifact-headers jobspec: add support for headers in artifact stanza	2020-11-13 14:04:27 -06:00
Seth Hoenig	bb8a5816a0	jobspec: add support for headers in artifact stanza This PR adds the ability to set HTTP headers when downloading an artifact from an `http` or `https` resource. The implementation in `go-getter` is such that a new `HTTPGetter` must be created for each artifact that sets headers (as opposed to conveniently setting headers per-request). This PR maintains the memoization of the default Getter objects, creating new ones only for artifacts where headers are set. Closes #9306	2020-11-13 12:03:54 -06:00
Jasmine Dahilig	d6110cbed4	lifecycle: add poststop hook (#8194 )	2020-11-12 08:01:42 -08:00
Tim Gross	60874ebe25	csi: Postrun hook should not change mode (#9323 ) The unpublish workflow requires that we know the mode (RW vs RO) if we want to unpublish the node. Update the hook and the Unpublish RPC so that we mark the claim for release in a new state but leave the mode alone. This fixes a bug where RO claims were failing node unpublish. The core job GC doesn't know the mode, but we don't need it for that workflow, so add a mode specifically for GC; the volumewatcher uses this as a sentinel to check whether claims (with their specific RW vs RO modes) need to be claimed.	2020-11-11 13:06:30 -05:00
Nick Ethier	04f5c4ee5f	ar/groupservice: remove drivernetwork (#9233 ) * ar/groupservice: remove drivernetwork * consul: allow host address_mode to accept raw port numbers * consul: fix logic for blank address	2020-11-05 15:00:22 -05:00
Tim Gross	1fb1c9c5d4	artifact/template: make destination path absolute inside taskdir (#9149 ) Prior to Nomad 0.12.5, you could use `${NOMAD_SECRETS_DIR}/mysecret.txt` as the `artifact.destination` and `template.destination` because we would always append the destination to the task working directory. In the recent security patch we treated the `destination` absolute path as valid if it didn't escape the working directory, but this breaks backwards compatibility and interpolation of `destination` fields. This changeset partially reverts the behavior so that we always append the destination, but we also perform the escape check on that new destination after interpolation so the security hole is closed. Also, ConsulTemplate test should exercise interpolation	2020-10-22 15:47:49 -04:00
Tim Gross	6df36e4cdb	artifact/template: prevent file sandbox escapes Ensure that the client honors the client configuration for the `template.disable_file_sandbox` field when validating the jobspec's `template.source` parameter, and not just with consul-template's own `file` function. Prevent interpolated `template.source`, `template.destination`, and `artifact.destination` fields from escaping file sandbox.	2020-10-21 14:34:12 -04:00
Alexander Shtuchkin	90fd8bb85f	Implement 'batch mode' for persisting allocations on the client. (#9093 ) Fixes #9047, see problem details there. As a solution, we use BoltDB's 'Batch' mode that combines multiple parallel writes into small number of transactions. See https://github.com/boltdb/bolt#batch-read-write-transactions for more information.	2020-10-20 16:15:37 -04:00
Nick Ethier	4903e5b114	Consul with CNI and host_network addresses (#9095 ) * consul: advertise cni and multi host interface addresses * structs: add service/check address_mode validation * ar/groupservices: fetch networkstatus at hook runtime * ar/groupservice: nil check network status getter before calling * consul: comment network status can be nil	2020-10-15 15:32:21 -04:00
Michael Schurter	9c3972937b	s/0.13/1.0/g 1.0 here we come!	2020-10-14 15:17:47 -07:00
Chris Baker	1d35578bed	removed backwards-compatible/untagged metrics deprecated in 0.7	2020-10-13 20:18:39 +00:00
Seth Hoenig	ed13e5723f	consul/connect: dynamically select envoy sidecar at runtime As newer versions of Consul are released, the minimum version of Envoy it supports as a sidecar proxy also gets bumped. Starting with the upcoming Consul v1.9.X series, Envoy v1.11.X will no longer be supported. Current versions of Nomad hardcode a version of Envoy v1.11.2 to be used as the default implementation of Connect sidecar proxy. This PR introduces a change such that each Nomad Client will query its local Consul for a list of Envoy proxies that it supports (https://github.com/hashicorp/consul/pull/8545) and then launch the Connect sidecar proxy task using the latest supported version of Envoy. If the `SupportedProxies` API component is not available from Consul, Nomad will fallback to the old version of Envoy supported by old versions of Consul. Setting the meta configuration option `meta.connect.sidecar_image` or setting the `connect.sidecar_task` stanza will take precedence as is the current behavior for sidecar proxies. Setting the meta configuration option `meta.connect.gateway_image` will take precedence as is the current behavior for connect gateways. `meta.connect.sidecar_image` and `meta.connect.gateway_image` may make use of the special `${NOMAD_envoy_version}` variable interpolation, which resolves to the newest version of Envoy supported by the Consul agent. Addresses #8585 #7665	2020-10-13 09:14:12 -05:00
Nick Ethier	d45be0b5a6	client: add NetworkStatus to Allocation (#8657 )	2020-10-12 13:43:04 -04:00
Yoan Blanc	891accb89a	use allow/deny instead of the colored alternatives (#9019 ) Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-10-12 08:47:05 -04:00
Tim Gross	b5abf4ec9d	csi: fix incorrect comment on csi_hook context lifetime	2020-10-09 11:03:51 -04:00
Fredrik Hoem Grelland	a015c52846	configure nomad cluster to use a Consul Namespace [Consul Enterprise] (#8849 )	2020-10-02 14:46:36 -04:00
Fredrik Hoem Grelland	953d4de8dd	update consul-template to v0.25.1 (#8988 )	2020-10-01 14:08:49 -04:00
Seth Hoenig	af9543c997	consul: fix validation of task in group-level script-checks When defining a script-check in a group-level service, Nomad needs to know which task is associated with the check so that it can use the correct task driver to execute the check. This PR fixes two bugs: 1) validate service.task or service.check.task is configured 2) make service.check.task inherit service.task if it is itself unset Fixes #8952	2020-09-28 15:02:59 -05:00
Lars Lehtonen	55f0302c46	client/allocrunner/taskrunner: client.Close after err check (#8825 )	2020-09-04 08:12:08 -04:00
Jasmine Dahilig	71a694f39c	Merge pull request #8390 from hashicorp/lifecycle-poststart-hook task lifecycle poststart hook	2020-08-31 13:53:24 -07:00
Jasmine Dahilig	fbe0c89ab1	task lifecycle poststart: code review fixes	2020-08-31 13:22:41 -07:00
Seth Hoenig	dfe179abc5	consul/connect: fixup some comments and context timeout	2020-08-26 13:17:16 -05:00
Seth Hoenig	26e77623e5	consul/connect: fixup tests to use new consul sdk	2020-08-24 12:02:41 -05:00
Seth Hoenig	5b072029f2	consul/connect: add initial support for ingress gateways This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza. ```hcl service { connect { gateway { proxy { // envoy proxy configuration } ingress { // ingress-gateway configuration entry } } } } ``` A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value. Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver. Aims to address #8294 and tangentially #8647	2020-08-21 16:21:54 -05:00
Michael Schurter	de08ae8083	test: add allocrunner test for poststart hooks	2020-08-12 09:54:14 -07:00
Nick Ethier	e39574be59	docker: support group allocated ports and host_networks (#8623 ) * docker: support group allocated ports * docker: add new ports driver config to specify which group ports are mapped * docker: update port mapping docs	2020-08-11 18:30:22 -04:00
Lang Martin	a27913e699	CSI RPC Token (#8626 ) * client/allocrunner/csi_hook: use the Node SecretID * client/allocrunner/csi_hook: include the namespace for Claim	2020-08-11 13:08:39 -04:00
Michael Schurter	e1946b66ce	client: remove shortcircuit preventing poststart hooks from running	2020-08-11 09:48:24 -07:00
Michael Schurter	04a135b57d	client: don't restart poststart sidecars on success	2020-08-11 09:47:18 -07:00

1 2 3 4 5 ...

514 commits