---
layout: docs
page_title: Consul Service Mesh
description: >-
  Learn how to use Nomad with Consul service mesh to enable secure
  service-to-service communication.
---

# Consul Service Mesh

~> **Note:** Nomad's service mesh integration requires Linux network namespaces.
Consul service mesh will not run on Windows or macOS.

[Consul service mesh](/consul/docs/connect) provides
service-to-service connection authorization and encryption using mutual
Transport Layer Security (TLS). Applications can use sidecar proxies in a
service mesh configuration to automatically establish TLS connections for
inbound and outbound connections without being aware of the service mesh at all.

# Nomad with Consul Service Mesh Integration

Nomad integrates with Consul to provide secure service-to-service communication
between Nomad jobs and task groups. To support Consul service mesh, Nomad adds a
new networking mode for jobs that enables tasks in the same task group to share
their networking stack. With a few changes to the job specification, job authors
can opt into service mesh integration. When service mesh is enabled, Nomad will
launch a proxy alongside the application in the job file. The proxy (Envoy)
provides secure communication with other applications in the cluster.

Nomad job specification authors can use Nomad's Consul service mesh integration
to implement [service segmentation](https://www.consul.io/use-cases/multi-platform-service-mesh)
in a microservice architecture running in public clouds without having to
directly manage TLS certificates. This is transparent to job specification
authors, as the security features of the service mesh continue to work even as
the application scales up or down or is rescheduled by Nomad.

To use the Consul service mesh integration with Consul ACLs enabled, see the
[Secure Nomad Jobs with Consul Service Mesh](/nomad/tutorials/integrate-consul/consul-service-mesh)
guide.

# Nomad Consul Service Mesh Example

The following section walks through an example that enables secure communication
between a web dashboard and a backend counting service. The web dashboard and
the counting service are managed by Nomad. Nomad additionally configures Envoy
proxies to run alongside these applications. The dashboard is configured to
connect to the counting service via localhost on port 9001. The proxy is managed
by Nomad and handles mTLS communication to the counting service.

## Prerequisites

### Consul

The Consul service mesh integration with Nomad requires [Consul 1.8 or
later](https://releases.hashicorp.com/consul/1.8.0/). The Consul agent can be
run in dev mode with the following command:

~> **Note:** Nomad's Consul service mesh integration requires Consul in your `$PATH`.

```shell-session
$ consul agent -dev
```

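In dev mode, the agent enables Connect by default. As a quick sanity check (a
sketch, assuming the default HTTP API port and `jq` installed), you can confirm
that Connect and the gRPC port are active by querying the agent's self endpoint:

```shell-session
$ curl -s http://127.0.0.1:8500/v1/agent/self | \
    jq '.DebugConfig | {ConnectEnabled, GRPCPort}'
```
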
To use service mesh on a non-dev Consul agent, you will minimally need to
enable the gRPC port and set `connect` to enabled by adding some additional
information to your Consul client configurations, depending on format. Consul
agents running TLS and a version greater than
[1.14.0](https://releases.hashicorp.com/consul/1.14.0) should set the
`grpc_tls` configuration parameter instead of `grpc`. Please see the Consul
[port documentation][consul_ports] for further reference material.

For HCL configurations:

```hcl
# ...
ports {
  grpc = 8502
}

connect {
  enabled = true
}
```

For JSON configurations:

```javascript
{
  // ...
  "ports": {
    "grpc": 8502
  },
  "connect": {
    "enabled": true
  }
}
```

#### Consul TLS

~> **Note:** Consul 1.14+ made a [backwards incompatible change][consul_grpc_tls]
in how TLS-enabled gRPC listeners work. When using Consul 1.14 with TLS enabled,
users will need to specify additional Nomad agent configuration to work with
Connect. The `consul.grpc_ca_file` value must now be configured (introduced in
Nomad 1.4.4), and `consul.grpc_address` will most likely need to be set to use
the new standard `grpc_tls` port of `8503`.

```hcl
consul {
  grpc_ca_file = "/etc/tls/consul-agent-ca.pem"
  grpc_address = "127.0.0.1:8503"
  ca_file      = "/etc/tls/consul-agent-ca.pem"
  cert_file    = "/etc/tls/dc1-client-consul-0.pem"
  key_file     = "/etc/tls/dc1-client-consul-0-key.pem"
  ssl          = true
  address      = "127.0.0.1:8501"
}
```

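For reference, the corresponding Consul agent (1.14+) configuration exposes the
TLS gRPC listener instead of the plain-text one. The following is a minimal
sketch, not a complete agent configuration; the certificate paths are
assumptions that should match your own TLS material:

```hcl
# Consul agent configuration sketch (Consul 1.14+).
ports {
  grpc     = -1   # disable the plain-text gRPC listener
  grpc_tls = 8503 # TLS gRPC listener targeted by Nomad's consul.grpc_address
}

tls {
  defaults {
    ca_file   = "/etc/tls/consul-agent-ca.pem"
    cert_file = "/etc/tls/dc1-server-consul-0.pem"
    key_file  = "/etc/tls/dc1-server-consul-0-key.pem"
  }
}
```
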
#### Consul ACLs

~> **Note:** Starting in Nomad v1.3.0, Consul Service Identity ACL tokens
automatically generated by Nomad on behalf of Connect-enabled services are now
created in [`Local`] rather than global scope, and are no longer replicated
globally.

To facilitate cross-Consul-datacenter requests of Connect services registered
by Nomad, Consul agents will need to be configured with [default anonymous][anon_token]
ACL tokens with ACL policies of sufficient permissions to read service and node
metadata pertaining to those requests. This mechanism is described in Consul
[#7414][consul_acl].

A typical Consul agent anonymous token may contain an ACL policy such as:

```hcl
service_prefix "" { policy = "read" }
node_prefix    "" { policy = "read" }
```

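As a sketch of how such a policy might be attached (assuming Consul 1.4+ style
ACL commands, a management token in `CONSUL_HTTP_TOKEN`, and the rules above
saved as `anonymous-policy.hcl`; note that `-policy-name` replaces the token's
existing policy list):

```shell-session
$ consul acl policy create -name anonymous-read -rules @anonymous-policy.hcl
$ consul acl token update \
    -id 00000000-0000-0000-0000-000000000002 \
    -policy-name anonymous-read
```
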
### Nomad

Nomad must schedule onto a routable interface in order for the proxies to
connect to each other. The following steps show how to start a Nomad dev agent
configured for Consul service mesh.

```shell-session
$ sudo nomad agent -dev-connect
```

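Once the agent is running, you can confirm that the Nomad client fingerprinted
Consul and its Connect support. This is a sketch; look for attributes such as
`consul.connect`, `consul.grpc`, and `consul.version` in the output:

```shell-session
$ nomad node status -self -verbose | grep consul
```
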
### CNI Plugins

Nomad uses CNI reference plugins to configure the network namespace used to
secure the Consul service mesh sidecar proxy. All Nomad client nodes using
network namespaces must have these CNI plugins [installed][cni_install].

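A typical installation looks like the following sketch, assuming a Linux amd64
host; pin `CNI_PLUGIN_VERSION` to whatever release your environment has
qualified:

```shell-session
$ export CNI_PLUGIN_VERSION=v1.0.1
$ curl -L -o cni-plugins.tgz \
    "https://github.com/containernetworking/plugins/releases/download/${CNI_PLUGIN_VERSION}/cni-plugins-linux-amd64-${CNI_PLUGIN_VERSION}.tgz"
$ sudo mkdir -p /opt/cni/bin
$ sudo tar -C /opt/cni/bin -xzf cni-plugins.tgz
```
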
## Run the Service Mesh-enabled Services

Once Nomad and Consul are running, submit the following service mesh-enabled
services to Nomad by copying the HCL into a file named `servicemesh.nomad.hcl`
and running: `nomad job run servicemesh.nomad.hcl`

```hcl
job "countdash" {
  datacenters = ["dc1"]

  group "api" {
    network {
      mode = "bridge"
    }

    service {
      name = "count-api"
      port = "9001"

      connect {
        sidecar_service {}
      }
    }

    task "web" {
      driver = "docker"

      config {
        image = "hashicorpdev/counter-api:v3"
      }
    }
  }

  group "dashboard" {
    network {
      mode = "bridge"

      port "http" {
        static = 9002
        to     = 9002
      }
    }

    service {
      name = "count-dashboard"
      port = "http"

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "count-api"
              local_bind_port  = 8080
            }
          }
        }
      }
    }

    task "dashboard" {
      driver = "docker"

      env {
        COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
      }

      config {
        image = "hashicorpdev/counter-dashboard:v3"
      }
    }
  }
}
```

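After submitting the job, you can check that the allocations are running and
that Consul has registered both services along with their sidecar proxies. This
is a sketch of the expected checks; Consul registers sidecars under its default
`<service>-sidecar-proxy` naming convention:

```shell-session
$ nomad job status countdash
$ consul catalog services
consul
count-api
count-api-sidecar-proxy
count-dashboard
count-dashboard-sidecar-proxy
```
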
The job contains two task groups: an API service and a web frontend.

### API Service

The API service is defined as a task group with a bridge network:

```hcl
group "api" {
  network {
    mode = "bridge"
  }

  # ...
}
```

Since the API service is only accessible via Consul service mesh, it does not
define any ports in its network. The `service` block enables service mesh.

```hcl
group "api" {
  # ...

  service {
    name = "count-api"
    port = "9001"

    connect {
      sidecar_service {}
    }
  }

  # ...
}
```

The `port` in the service block is the port the API service listens on. The
Envoy proxy will automatically route traffic to that port inside the network
namespace. Note that currently this cannot be a named port; it must be a
hard-coded port value. See [GH-9907].

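In other words, a sketch of what is and is not accepted for the mesh service
port in this position:

```hcl
service {
  name = "count-api"

  # port = "api"  # a named network port will not work here yet; see GH-9907
  port = "9001"   # must be the numeric port the task listens on

  connect {
    sidecar_service {}
  }
}
```
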
### Web Frontend

The web frontend is defined as a task group with a bridge network and a static
forwarded port:

```hcl
group "dashboard" {
  network {
    mode = "bridge"

    port "http" {
      static = 9002
      to     = 9002
    }
  }

  # ...
}
```

The `static = 9002` parameter requests the Nomad scheduler reserve port 9002 on
a host network interface. The `to = 9002` parameter forwards that host port to
port 9002 inside the network namespace.

This allows you to connect to the web frontend in a browser by visiting
`http://<host_ip>:9002` as shown below:

[![Count Dashboard][count-dashboard]][count-dashboard]

The web frontend connects to the API service via Consul service mesh.

```hcl
service {
  name = "count-dashboard"
  port = "http"

  connect {
    sidecar_service {
      proxy {
        upstreams {
          destination_name = "count-api"
          local_bind_port  = 8080
        }
      }
    }
  }
}
```

The `upstreams` block defines the remote service to access (`count-api`) and
what port to expose that service on inside the network namespace (`8080`).

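The `upstreams` block may be repeated to reach additional upstream services,
each bound to its own local port. A sketch, with a hypothetical `count-cache`
service added purely for illustration:

```hcl
proxy {
  upstreams {
    destination_name = "count-api"
    local_bind_port  = 8080
  }

  upstreams {
    destination_name = "count-cache" # hypothetical second upstream
    local_bind_port  = 8081
  }
}
```
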
The web frontend is configured to communicate with the API service with an
environment variable:

```hcl
env {
  COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
}
```

The web frontend is configured via `$COUNTING_SERVICE_URL`, so you must
interpolate the upstream's address into that environment variable. Note that
dashes (`-`) are converted to underscores (`_`) in environment variable names,
so `count-api` becomes `count_api`.

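You can inspect the interpolated values from inside a running allocation. A
sketch, where `<alloc-id>` is an allocation ID taken from `nomad job status
countdash`; upstreams bind to localhost by default:

```shell-session
$ nomad alloc exec -task dashboard <alloc-id> env | grep NOMAD_UPSTREAM
NOMAD_UPSTREAM_ADDR_count_api=127.0.0.1:8080
NOMAD_UPSTREAM_IP_count_api=127.0.0.1
NOMAD_UPSTREAM_PORT_count_api=8080
```
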
## Limitations

- The minimum Consul version to use Connect with Nomad is Consul v1.8.0.
- The `consul` binary must be present in Nomad's `$PATH` to run the Envoy
  proxy sidecar on client nodes.
- Consul service mesh using network namespaces is only supported on Linux.
- Prior to Consul 1.9, the Envoy sidecar proxy will drop and stop accepting
  connections while the Nomad agent is restarting.

## Troubleshooting

If the sidecar service is not running correctly, you can investigate
potential `envoy` failures in the following ways:

* Task logs in the associated `connect-*` task
* Task secrets (may contain sensitive information):
  * envoy CLI command: `secrets/.envoy_bootstrap.cmd`
  * environment variables: `secrets/.envoy_bootstrap.env`
* An extra allocation log file: `alloc/logs/envoy_bootstrap.stderr.0`

For example, with an allocation ID starting with `b36a`:

```shell-session
$ nomad alloc status -short b36a # to get the connect-* task name
$ nomad alloc logs -task connect-proxy-count-api -stderr b36a
$ nomad alloc exec -task connect-proxy-count-api b36a cat secrets/.envoy_bootstrap.cmd
$ nomad alloc exec -task connect-proxy-count-api b36a cat secrets/.envoy_bootstrap.env
$ nomad alloc fs b36a alloc/logs/envoy_bootstrap.stderr.0
```

Note: If the alloc is unable to start successfully, debugging files may
only be accessible from the host filesystem. However, the sidecar task secrets
directory may not be available on systems where it is mounted in a temporary
filesystem.

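For example, to read the Envoy bootstrap log directly from the client host, a
sketch assuming the Nomad client's `data_dir` is `/var/lib/nomad`; adjust the
path for your configuration:

```shell-session
$ sudo cat /var/lib/nomad/alloc/<alloc-id>/alloc/logs/envoy_bootstrap.stderr.0
```
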
[count-dashboard]: /img/count-dashboard.png
[consul_acl]: https://github.com/hashicorp/consul/issues/7414
[gh-9907]: https://github.com/hashicorp/nomad/issues/9907
[`Local`]: /consul/docs/security/acl/acl-tokens#token-attributes
[anon_token]: /consul/docs/security/acl/acl-tokens#special-purpose-tokens
[consul_ports]: /consul/docs/agent/config/config-files#ports
[consul_grpc_tls]: /consul/docs/upgrading/upgrade-specific#changes-to-grpc-tls-configuration
[cni_install]: /nomad/docs/install#post-installation-steps