Merge pull request #7823 from hashicorp/docs-wanfed-mesh
Redo PR #7430 for new website (docs for WAN federation over mesh gateways)
commit
272733a5c1
|
@ -155,6 +155,7 @@ export default [
|
||||||
content: ['envoy', 'built-in', 'integrate'],
|
content: ['envoy', 'built-in', 'integrate'],
|
||||||
},
|
},
|
||||||
'mesh_gateway',
|
'mesh_gateway',
|
||||||
|
'wan-federation-via-mesh-gateways',
|
||||||
{
|
{
|
||||||
category: 'registration',
|
category: 'registration',
|
||||||
content: ['service-registration', 'sidecar-service'],
|
content: ['service-registration', 'sidecar-service'],
|
||||||
|
|
|
@ -0,0 +1,185 @@
|
||||||
|
---
|
||||||
|
layout: docs
|
||||||
|
page_title: Connect - WAN Federation via Mesh Gateways
|
||||||
|
sidebar_title: WAN Federation via Mesh Gateways <sup> Beta </sup>
|
||||||
|
description: |-
|
||||||
|
WAN federation via mesh gateways allows for Consul servers in different datacenters to be federated exclusively through mesh gateways.
|
||||||
|
---
|
||||||
|
|
||||||
|
# WAN Federation via Mesh Gateways <sup>Beta</sup>
|
||||||
|
|
||||||
|
-> **1.8.0+:** This feature is available in Consul versions 1.8.0 and higher
|
||||||
|
|
||||||
|
~> This topic requires familiarity with [mesh gateways](/docs/connect/mesh_gateway).
|
||||||
|
|
||||||
|
WAN federation via mesh gateways allows for Consul servers in different datacenters
|
||||||
|
to be federated exclusively through mesh gateways.
|
||||||
|
|
||||||
|
When setting up a
|
||||||
|
[multi-datacenter](https://learn.hashicorp.com/consul/security-networking/datacenters)
|
||||||
|
Consul cluster, operators must ensure that all Consul servers in every
|
||||||
|
datacenter are directly connectable over their WAN-advertised network
|
||||||
|
address from each other.
|
||||||
|
|
||||||
|
If you are using Kubernetes, refer to our [Kubernetes Multi Cluster](/docs/k8s/installation/multi-cluster) documentation.
|
||||||
|
|
||||||
|
This requires that operators setting up the virtual machines or containers
|
||||||
|
hosting the servers take additional steps to ensure the necessary routing and
|
||||||
|
firewall rules are in place to allow the servers to speak to each other over
|
||||||
|
the WAN.
|
||||||
|
|
||||||
|
Sometimes this prerequisite is difficult or undesirable to meet:
|
||||||
|
|
||||||
|
* **Difficult:** The datacenters may exist in multiple Kubernetes clusters that
|
||||||
|
unfortunately have overlapping pod IP subnets, or may exist in different
|
||||||
|
cloud provider VPCs that have overlapping subnets.
|
||||||
|
|
||||||
|
* **Undesirable:** Network security teams may not approve of granting so many
|
||||||
|
firewall rules. When using platform autoscaling, keeping rules up to date becomes untenable.
|
||||||
|
|
||||||
|
Operators looking to simplify their WAN deployment and minimize the exposed
|
||||||
|
security surface area can elect to join these datacenters together using [mesh
|
||||||
|
gateways](/docs/connect/mesh_gateway).
|
||||||
|
|
||||||
|
## Architecture
|
||||||
|
|
||||||
|
There are two main kinds of communication that occur over the WAN link spanning
|
||||||
|
the gulf between disparate Consul datacenters:
|
||||||
|
|
||||||
|
* **WAN gossip:** We use the serf and memberlist libraries to gossip
failure-detection state about Consul servers in each datacenter.
|
||||||
|
By default this operates point to point between servers over `8302/udp` with
|
||||||
|
a fallback to `8302/tcp` (which logs a warning indicating the network is
|
||||||
|
misconfigured).
|
||||||
|
|
||||||
|
* **Cross-datacenter RPCs:** Consul servers expose a special multiplexed port
|
||||||
|
over `8300/tcp`. Several distinct kinds of messages can be received on this
|
||||||
|
port, such as RPC requests forwarded from servers in other datacenters.
|
||||||
|
|
||||||
|
|
||||||
|
In this network topology, individual Consul client agents on a LAN in one
|
||||||
|
datacenter never need to directly dial servers in other datacenters. This
|
||||||
|
means you could introduce a set of firewall rules prohibiting `10.0.0.0/24`
|
||||||
|
from sending any traffic at all to `10.1.2.0/24` for security isolation.
|
||||||
|
|
||||||
|
You may already have configured [mesh
|
||||||
|
gateways](https://learn.hashicorp.com/consul/developer-mesh/connect-gateways)
|
||||||
|
to allow for services in the service mesh to freely connect between datacenters
|
||||||
|
regardless of the lateral connectivity of the nodes hosting the Consul client
|
||||||
|
agents.
|
||||||
|
|
||||||
|
By activating WAN federation via mesh gateways, the servers
|
||||||
|
can similarly use the existing mesh gateways to reach each other without
|
||||||
|
themselves being directly reachable.
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
### TLS
|
||||||
|
|
||||||
|
All Consul servers in all datacenters should have TLS configured with certificates containing
|
||||||
|
these SAN fields:
|
||||||
|
|
||||||
|
    server.<this_datacenter>.<domain> (normal)
|
||||||
|
    <node_name>.server.<this_datacenter>.<domain> (needed for wan federation)
|
||||||
|
|
||||||
|
This can be achieved using any number of tools, including `consul tls cert
|
||||||
|
create` with the `-node` flag.
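
As a rough sketch, a server's TLS settings might then point at such a certificate as shown below. The `/etc/consul.d/tls` directory and the `dc1` datacenter name are assumptions made for illustration, and the file names are what `consul tls ca create` and `consul tls cert create -server` are expected to produce by default; adjust all of them to match your environment.

```hcl
# Hypothetical paths. The server certificate referenced here must carry
# both SAN entries listed above, including the <node_name> variant.
verify_incoming        = true
verify_outgoing        = true
verify_server_hostname = true

ca_file   = "/etc/consul.d/tls/consul-agent-ca.pem"
cert_file = "/etc/consul.d/tls/dc1-server-consul-0.pem"
key_file  = "/etc/consul.d/tls/dc1-server-consul-0-key.pem"
```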
|
||||||
|
|
||||||
|
### Mesh Gateways
|
||||||
|
|
||||||
|
There needs to be at least one mesh gateway configured to opt in to exposing
|
||||||
|
the servers in its configuration. When using the `consul connect envoy` CLI
|
||||||
|
this is done by using the `-expose-servers` flag. The flag simply registers
|
||||||
|
the mesh gateway into the catalog with the additional piece of service metadata
|
||||||
|
of `{"consul-wan-federation":"1"}`. If you are registering the mesh gateways
|
||||||
|
into the catalog out of band you may simply add this to your existing
|
||||||
|
registration payload.
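
For an out-of-band registration, the metadata could be added to a service definition roughly like the following. This is a minimal sketch: the service name `mesh-gateway` and port `8443` are assumptions chosen for illustration, and only the `meta` entry comes from the description above.

```hcl
# Hypothetical mesh gateway service definition registered out of band.
service {
  name = "mesh-gateway" # assumed name
  kind = "mesh-gateway"
  port = 8443           # assumed port

  # Marks this gateway as willing to expose the local Consul servers.
  meta = {
    "consul-wan-federation" = "1"
  }
}
```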
|
||||||
|
|
||||||
|
!> Before activating the feature on an existing cluster you should ensure that
|
||||||
|
there is at least one mesh gateway prepared to expose the servers registered in
|
||||||
|
each datacenter; otherwise the WAN will become only partially connected.
|
||||||
|
|
||||||
|
### Consul Server Options
|
||||||
|
|
||||||
|
There are a few necessary additional pieces of configuration beyond those
|
||||||
|
required for standing up a
|
||||||
|
[multi-datacenter](https://learn.hashicorp.com/consul/security-networking/datacenters)
|
||||||
|
Consul cluster.
|
||||||
|
|
||||||
|
Consul servers in the _primary_ datacenter should add this snippet to the
|
||||||
|
configuration file:
|
||||||
|
|
||||||
|
```hcl
|
||||||
|
connect {
|
||||||
|
  enabled = true
|
||||||
|
  enable_mesh_gateway_wan_federation = true
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Consul servers in all _secondary_ datacenters should add this snippet to the
|
||||||
|
configuration file:
|
||||||
|
|
||||||
|
```hcl
|
||||||
|
primary_gateways = [ "<primary-mesh-gateway-ip>:<primary-mesh-gateway-port>", ... ]
|
||||||
|
connect {
|
||||||
|
  enabled = true
|
||||||
|
  enable_mesh_gateway_wan_federation = true
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Any references to [`start_join_wan`](/docs/agent/options#start_join_wan) or [`retry_join_wan`](/docs/agent/options#retry_join_wan) should be omitted.
|
||||||
|
|
||||||
|
-> The `primary_gateways` configuration can also use `go-discover` syntax just
|
||||||
|
like `retry_join_wan`.
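
Putting the options above together, a filled-in configuration for a server in a secondary datacenter might look something like this. The datacenter names `dc1` and `dc2`, the gateway address `203.0.113.10`, and port `8443` are placeholder assumptions, not values prescribed by Consul.

```hcl
# Hypothetical secondary-datacenter server configuration.
server             = true
datacenter         = "dc2" # assumed name of this secondary datacenter
primary_datacenter = "dc1" # assumed name of the primary datacenter

# Address of a mesh gateway in the primary datacenter (placeholder value).
# Note that start_join_wan and retry_join_wan are intentionally absent.
primary_gateways = ["203.0.113.10:8443"]

connect {
  enabled                            = true
  enable_mesh_gateway_wan_federation = true
}
```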
|
||||||
|
|
||||||
|
### Bootstrapping
|
||||||
|
|
||||||
|
For ease of debugging (such as avoiding a flurry of misleading error messages)
|
||||||
|
when intending to activate WAN federation via mesh gateways, it is best to
|
||||||
|
follow this general procedure:
|
||||||
|
|
||||||
|
### New secondary
|
||||||
|
|
||||||
|
1. Upgrade to the desired version of the consul binary for all servers,
|
||||||
|
clients, and CLI.
|
||||||
|
2. Start all consul servers and clients on the new version in the primary
|
||||||
|
datacenter.
|
||||||
|
3. Ensure the primary datacenter has at least one running, registered mesh gateway with
|
||||||
|
the service metadata key of `{"consul-wan-federation":"1"}` set.
|
||||||
|
4. Ensure you are _prepared_ to launch corresponding mesh gateways in all
|
||||||
|
secondaries. When ACLs are enabled, actually registering these requires
|
||||||
|
upstream connectivity to the primary datacenter to authorize catalog
|
||||||
|
registration.
|
||||||
|
5. Ensure all servers in the primary datacenter have updated configuration and
|
||||||
|
restart.
|
||||||
|
6. Ensure all servers in the secondary datacenter have updated configuration.
|
||||||
|
7. Start all consul servers and clients on the new version in the secondary
|
||||||
|
datacenter.
|
||||||
|
8. When ACLs are enabled, shortly afterwards it should become possible to
|
||||||
|
resolve ACL tokens from the secondary, at which time it should be possible
|
||||||
|
to launch the mesh gateways in the secondary datacenter.
|
||||||
|
|
||||||
|
|
||||||
|
### Existing secondary
|
||||||
|
|
||||||
|
1. Upgrade to the desired version of the consul binary for all servers,
|
||||||
|
clients, and CLI.
|
||||||
|
2. Restart all consul servers and clients on the new version.
|
||||||
|
3. Ensure each datacenter has at least one running, registered mesh gateway with the
|
||||||
|
service metadata key of `{"consul-wan-federation":"1"}` set.
|
||||||
|
4. Ensure all servers in the primary datacenter have updated configuration and
|
||||||
|
restart.
|
||||||
|
5. Ensure all servers in the secondary datacenter have updated configuration and
|
||||||
|
restart.
|
||||||
|
|
||||||
|
### Verification
|
||||||
|
|
||||||
|
From any two datacenters joined together, double-check that the following give you an
|
||||||
|
expected result:
|
||||||
|
|
||||||
|
* Check that `consul members -wan` lists all servers in all datacenters with
|
||||||
|
their _local_ IP addresses and that each is listed as `alive`.
|
||||||
|
|
||||||
|
* Ensure any API request that activates datacenter request forwarding, such as
|
||||||
|
[`/v1/catalog/services?dc=<OTHER_DATACENTER_NAME>`](/api/catalog.html#dc-1)
|
||||||
|
succeeds.
|