Commit Graph

4228 Commits

Author SHA1 Message Date
Pierre Cauchois e4b739cafd
RPC Timeout/Retries account for blocking requests (#8921)
The current implementation measures RPC request timeout only against
config.RPCHoldTimeout, which is fine for non-blocking requests but will
almost surely be exceeded by long-poll requests that block for minutes
at a time.

This adds an HasTimedOut method on the RPCInfo interface that takes into
account whether the request is blocking, its maximum wait time, and the
RPCHoldTimeout.
2020-09-18 08:58:41 -04:00
Joel May 2adc5bdec7
fingerprinting: add AWS MAC and public-ipv6 (#8887) 2020-09-17 09:03:01 -04:00
Lars Lehtonen 55f0302c46
client/allocrunner/taskrunner: client.Close after err check (#8825) 2020-09-04 08:12:08 -04:00
Tim Gross 8ad90b4253
fix params for Agent.Host client RPC (#8795)
The parameters for the receiving side of the Agent.Host client RPC did not
take the arguments serialized at the server side. This results in a panic.
2020-08-31 17:14:26 -04:00
Jasmine Dahilig 71a694f39c
Merge pull request #8390 from hashicorp/lifecycle-poststart-hook
task lifecycle poststart hook
2020-08-31 13:53:24 -07:00
Jasmine Dahilig fbe0c89ab1 task lifecycle poststart: code review fixes 2020-08-31 13:22:41 -07:00
Seth Hoenig 9f1f2a5673 Merge branch 'master' into f-cc-ingress 2020-08-26 15:31:05 -05:00
Seth Hoenig dfe179abc5 consul/connect: fixup some comments and context timeout 2020-08-26 13:17:16 -05:00
Mahmood Ali 10954bf717 close file when done reading 2020-08-24 20:22:42 -04:00
Mahmood Ali 0be632debf don't lock if ref is nil
Ensure that d.mu is only dereferenced if d is not-nil, to avoid a null
dereference panic.
2020-08-24 20:19:40 -04:00
Seth Hoenig 26e77623e5 consul/connect: fixup tests to use new consul sdk 2020-08-24 12:02:41 -05:00
Seth Hoenig a09d1746bf
Merge branch 'master' into consul-v1.7.7 2020-08-24 10:43:00 -05:00
Yoan Blanc 327d17e0dc
fixup! vendor: consul/api, consul/sdk v1.6.0
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-08-24 08:59:03 +02:00
Mark Lee cd23fd7ca2 refactor lookup code 2020-08-24 12:24:16 +09:00
Mark Lee cd7aabca72 lookup kernel builtin modules too 2020-08-24 11:09:13 +09:00
Seth Hoenig 5b072029f2 consul/connect: add initial support for ingress gateways
This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza.

```hcl
service {
  connect {
    gateway {
      proxy {
        // envoy proxy configuration
      }
      ingress {
        // ingress-gateway configuration entry
      }
    }
  }
}
```

A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value.

Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver.

Aims to address #8294 and tangentially #8647
2020-08-21 16:21:54 -05:00
Shengjing Zhu 7a4f48795d Adjust cgroup change in libcontainer 2020-08-20 00:31:07 +08:00
Michael Schurter de08ae8083 test: add allocrunner test for poststart hooks 2020-08-12 09:54:14 -07:00
Nick Ethier e39574be59
docker: support group allocated ports and host_networks (#8623)
* docker: support group allocated ports

* docker: add new ports driver config to specify which group ports are mapped

* docker: update port mapping docs
2020-08-11 18:30:22 -04:00
Lang Martin a27913e699
CSI RPC Token (#8626)
* client/allocrunner/csi_hook: use the Node SecretID
* client/allocrunner/csi_hook: include the namespace for Claim
2020-08-11 13:08:39 -04:00
Michael Schurter e1946b66ce client: remove shortcircuit preventing poststart hooks from running 2020-08-11 09:48:24 -07:00
Michael Schurter 04a135b57d client: don't restart poststart sidecars on success 2020-08-11 09:47:18 -07:00
Tim Gross 7d53ed88d6
csi: client RPCs should return wrapped errors for checking (#8605)
When the client-side actions of a CSI client RPC succeed but we get
disconnected during the RPC or we fail to checkpoint the claim state, we want
to be able to retry the client RPC without getting blocked by the client-side
state (ex. mount points) already having been cleaned up in previous calls.
2020-08-07 11:01:36 -04:00
Tim Gross 2854298089
csi: release claims via csi_hook postrun unpublish RPC (#8580)
Add a Postrun hook to send the `CSIVolume.Unpublish` RPC to the server. This
may forward client RPCs to the node plugins or to the controller plugins,
depending on whether other allocations on this node have claims on this
volume.

By making clients responsible for running the `CSIVolume.Unpublish` RPC (and
making the RPC available to a `nomad volume detach` command), the
volumewatcher becomes only used by the core GC job and we no longer need
async volume GC from job deregister and node update.
2020-08-06 14:51:46 -04:00
Jasmine Dahilig e8ed6851e2 lifecycle: add allocrunner and task hook coordinator unit tests 2020-07-29 12:39:42 -07:00
Seth Hoenig a392b19b6a consul/connect: fixup some spelling, comments, consts 2020-07-29 09:26:01 -05:00
Seth Hoenig 04bb6c416f consul/connect: organize lock & fields in http/grpc socket hooks 2020-07-29 09:26:01 -05:00
Seth Hoenig dbee956c05 consul/connect: optimze grpc socket hook check for bridge network first 2020-07-29 09:26:01 -05:00
Seth Hoenig 2511f48351 consul/connect: add support for bridge networks with connect native tasks
Before, Connect Native Tasks needed one of these to work:

- To be run in host networking mode
- To have the Consul agent configured to listen to a unix socket
- To have the Consul agent configured to listen to a public interface

None of these are a great experience, though running in host networking is
still the best solution for non-Linux hosts. This PR establishes a connection
proxy between the Consul HTTP listener and a unix socket inside the alloc fs,
bypassing the network namespace for any Connect Native task. Similar to and
re-uses a bunch of code from the gRPC listener version for envoy sidecar proxies.

Proxy is established only if the alloc is configured for bridge networking and
there is at least one Connect Native task in the Task Group.

Fixes #8290
2020-07-29 09:26:01 -05:00
Drew Bailey bd421b6197
Merge pull request #8453 from hashicorp/oss-multi-vault-ns
oss compoments for multi-vault namespaces
2020-07-27 08:45:22 -04:00
Mahmood Ali 2d0b80a0ed
Merge pull request #6517 from hashicorp/b-fingerprint-shutdown-race
client: don't retry fingerprinting on shutdown
2020-07-24 11:56:32 -04:00
Drew Bailey b296558b8e
oss compoments for multi-vault namespaces
adds in oss components to support enterprise multi-vault namespace feature

upgrade specific doc on vault multi-namespaces

vault docs

update test to reflect new error
2020-07-24 10:14:59 -04:00
Michael Schurter 1400e0480d
Merge pull request #8521 from hashicorp/docs-hearbeat
docs: s/hearbeat/heartbeat and fix link
2020-07-23 14:07:24 -07:00
Tim Gross 56c6dacd38
csi: NodePublish should not create target_path, only its parent dir (#8505)
The NodePublish workflow currently creates the target path and its parent
directory. However, the CSI specification says that the CO shall ensure the
parent directory of the target path exists, and that the SP shall place the
block device or mounted directory at the target path. Much of our testing has
been with CSI plugins that are more forgiving, but our behavior breaks
spec-compliant CSI plugins.

This changeset ensures we only create the parent directory.
2020-07-23 15:52:22 -04:00
Michael Schurter 8340ad4da8 docs: s/hearbeat/heartbeat and fix link
Also fixed the same typo in a test. Fixing the typo fixes the link, but
the link was still broken when running the website locally due to the
trailing slash. It would have worked in prod thanks to redirects, but
using the canonical URL seems ideal.
2020-07-23 11:33:34 -07:00
Mahmood Ali 91ba9ccefe honor config.NetworkInterface in NodeNetworks 2020-07-21 15:43:45 -04:00
Mahmood Ali c2d3c3e431
nvidia: support disabling the nvidia plugin (#8353) 2020-07-21 10:11:16 -04:00
Jasmine Dahilig 44c21bd3c7 fix panic, but poststart is still stalled 2020-07-10 09:03:10 -07:00
Mahmood Ali e9bf3a42f5
Merge pull request #8333 from hashicorp/b-test-tweak-20200701
tests: avoid os.Exit in a test
2020-07-10 11:18:28 -04:00
Jasmine Dahilig 9e27231953 add poststart hook to task hook coordinator & structs 2020-07-08 11:01:35 -07:00
Nick Ethier e0fb634309
ar: support opting into binding host ports to default network IP (#8321)
* ar: support opting into binding host ports to default network IP

* fix config plumbing

* plumb node address into network resource

* struct: only handle network resource upgrade path once
2020-07-06 18:51:46 -04:00
Lang Martin 6c22cd587d
api: `nomad debug` new /agent/host (#8325)
* command/agent/host: collect host data, multi platform

* nomad/structs/structs: new HostDataRequest/Response

* client/agent_endpoint: add RPC endpoint

* command/agent/agent_endpoint: add Host

* api/agent: add the Host endpoint

* nomad/client_agent_endpoint: add Agent Host with forwarding

* nomad/client_agent_endpoint: use findClientConn

This changes forwardMonitorClient and forwardProfileClient to use
findClientConn, which was cribbed from the common parts of those
funcs.

* command/debug: call agent hosts

* command/agent/host: eliminate calling external programs
2020-07-02 09:51:25 -04:00
Mahmood Ali 026d8c6eed tests: avoid os.Exit in a test 2020-07-01 15:25:13 -04:00
Mahmood Ali 7f460d2706 allocrunner: terminate sidecars in the end
This fixes a bug where a batch allocation fails to complete if it has
sidecars.

If the only remaining running tasks in an allocations are sidecars - we
must kill them and mark the allocation as complete.
2020-06-29 15:12:15 -04:00
Seth Hoenig 011c6b027f connect/native: doc and comment tweaks from PR 2020-06-24 10:13:22 -05:00
Seth Hoenig 03a5706919 connect/native: check for pre-existing consul token 2020-06-24 09:16:28 -05:00
Seth Hoenig 6154181a64 connect/native: update connect native hook tests 2020-06-23 12:07:35 -05:00
Seth Hoenig c5d3f58bee connect/native: give tls files an extension 2020-06-23 12:06:28 -05:00
Seth Hoenig 4d71f22a11 consul/connect: add support for running connect native tasks
This PR adds the capability of running Connect Native Tasks on Nomad,
particularly when TLS and ACLs are enabled on Consul.

The `connect` stanza now includes a `native` parameter, which can be
set to the name of task that backs the Connect Native Consul service.

There is a new Client configuration parameter for the `consul` stanza
called `share_ssl`. Like `allow_unauthenticated` the default value is
true, but recommended to be disabled in production environments. When
enabled, the Nomad Client's Consul TLS information is shared with
Connect Native tasks through the normal Consul environment variables.
This does NOT include auth or token information.

If Consul ACLs are enabled, Service Identity Tokens are automatically
and injected into the Connect Native task through the CONSUL_HTTP_TOKEN
environment variable.

Any of the automatically set environment variables can be overridden by
the Connect Native task using the `env` stanza.

Fixes #6083
2020-06-22 14:07:44 -05:00
Tim Gross 3d38592fbb
csi: add VolumeContext to NodeStage/Publish RPCs (#8239)
In #7957 we added support for passing a volume context to the controller RPCs.
This is an opaque map that's created by `CreateVolume` or, in Nomad's case,
in the volume registration spec.

However, we missed passing this field to the `NodeStage` and `NodePublish` RPC,
which prevents certain plugins (such as MooseFS) from making node RPCs.
2020-06-22 13:54:32 -04:00