Expand service mesh dev docs (#15867)

Freddy 2022-12-22 12:18:38 -07:00 committed by GitHub
parent ddba394070
commit 0cc8f45f28
8 changed files with 196 additions and 30 deletions


@@ -1,16 +1,84 @@
# Service Mesh (Connect)
## Terminology
### Data plane
The data plane refers to gateways, sidecar proxies, or native application libraries that sit in the request path of applications and embed service mesh logic around routing, authorization, and observability. Put simply: Envoy (or another proxy) is the data plane, while Consul is the control plane.
For production deployments we primarily support the [Envoy](https://www.envoyproxy.io/) proxy. Active development of service mesh functionality is focused on Envoy.
Relevant components and packages:
- [Configuration Entries](./config-entries)
- [xDS Server]: a gRPC service that implements [xDS] and handles requests from an [envoy proxy].
- [agent/proxycfg]
- [Certificate Authority](./ca): issues TLS certificates for services and client agents.
- `command/connect/envoy`: bootstraps and runs Envoy.
- `command/connect/proxy`: the built-in proxy, which is dev-only and not supported for production.
- `connect/`: the "native" service mesh integration.
[xDS Server]: ./xds.md
[xDS]: https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol
[envoy proxy]: https://www.consul.io/docs/connect/proxies/envoy
[agent/proxycfg]: https://github.com/hashicorp/consul/blob/main/agent/proxycfg
### Control plane
At a high level, the primary goal of the control plane in a service mesh is to provide configuration for the data plane. Consul's control plane is composed of server agents, client agents, and [consul-dataplane](https://github.com/hashicorp/consul-dataplane) instances.
The control plane allows users to configure policies for the service mesh, and then translates these into configuration that the data plane components use to execute the intended functionality.
A key distinction from the data plane is that the control plane is largely not in the request path of service-to-service traffic. The notable exception to this rule is the [/agent/connect/authorize](https://developer.hashicorp.com/consul/api-docs/agent/connect#authorize) endpoint, discussed in the Connect Native section below.
### Connect Native
Consul's service mesh supports a "native" app integration. In this setup, users must explicitly request leaf and root certificates from Consul for use in service-to-service mTLS. Additionally, to enforce intentions for authorization, applications can issue an authorization check against a Consul agent.
The Go client library for this integration exists in the [connect](https://github.com/hashicorp/consul/tree/main/connect) package.
**APIs:**
* [/agent/connect/authorize](https://developer.hashicorp.com/consul/api-docs/agent/connect#authorize) can be used to evaluate whether intentions allow connections by a client to some target service.
* [/agent/connect/ca/leaf/:service](https://developer.hashicorp.com/consul/api-docs/agent/connect#service-leaf-certificate) can be used to request a leaf certificate for a service instance. This is the certificate presented during the mTLS handshake.
* [/agent/connect/ca/roots](https://developer.hashicorp.com/consul/api-docs/agent/connect#certificate-authority-ca-roots) can be used to request the trusted certificate authority's root certificates. These are the certificates used to verify leaf certificates presented in the mTLS handshake.
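As an illustrative sketch (not prescriptive; `web` is a hypothetical service and a local agent is assumed), a native integration built on the [connect](https://github.com/hashicorp/consul/tree/main/connect) package looks roughly like this. The package wraps the certificate APIs above and checks intentions on inbound connections via the local agent:

```go
package main

import (
	"log"
	"net/http"

	"github.com/hashicorp/consul/api"
	"github.com/hashicorp/consul/connect"
)

func main() {
	// Client for the local Consul agent (default address 127.0.0.1:8500).
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Join the mesh as the "web" service. The connect package requests and
	// rotates leaf/root certificates via the agent APIs listed above.
	svc, err := connect.NewService("web", client)
	if err != nil {
		log.Fatal(err)
	}
	defer svc.Close()

	// Serve mTLS traffic. Incoming connections are verified against the mesh
	// CA roots and checked against intentions through the local agent.
	server := &http.Server{
		Addr:      ":8443",
		TLSConfig: svc.ServerTLSConfig(),
	}
	log.Fatal(server.ListenAndServeTLS("", ""))
}
```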
### Built-in Proxy
Consul's service mesh was released with a built-in proxy. This proxy provides basic functionality as outlined in its [documentation](https://developer.hashicorp.com/consul/docs/connect/proxies/built-in). This proxy is not supported for production deployments and has not been under active development for several years.
The core of the built-in proxy is implemented in the [connect/proxy](https://github.com/hashicorp/consul/tree/main/connect/proxy) package, and is launched by the [command/connect/proxy](https://github.com/hashicorp/consul/tree/main/command/connect/proxy) package.
## Configuration Lifecycle
![Configuring Envoy](./configuring-envoy.png)
The high-level flow of configuring Envoy is:
1. The initial "bootstrap" configuration is generated for an Envoy proxy by a consul-dataplane instance or a Consul client agent.
2. Envoy dials the xDS server, requesting configuration to act as a particular proxy or gateway instance. The xDS server will either be a Consul server or a Consul client agent.
3. Consul then initializes internal watches for the snapshot of data necessary to configure Envoy. This snapshot contains data collected from Consul's state.
4. As these snapshots are generated and updated, Consul generates and pushes updated Envoy configuration for the various xDS resource types whenever there are changes.
### Bootstrapping Envoy proxies
Consul generates the initial "bootstrap" configuration file for Envoy proxy instances, and can optionally launch the Envoy process itself.
The basic information in Envoy's bootstrap configuration includes:
* The listener address and port for [Envoy's administration interface](https://www.envoyproxy.io/docs/envoy/latest/operations/admin).
* The ID, namespace, and admin partition of the corresponding sidecar proxy registration in Consul's catalog.
* Configuration on how to reach Consul, and the Consul ACL token to present.
This process is handled by two different components depending on whether Consul client agents are in use:
* [consul-dataplane](https://github.com/hashicorp/consul-dataplane) is used in "agentless" Consul.
* [command/connect/envoy](https://github.com/hashicorp/consul/tree/main/command/connect/envoy) is used with Consul agents.
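As a hedged example of the agent-based path (the service name `web` is hypothetical), `command/connect/envoy` can emit the bootstrap file without launching Envoy when given the `-bootstrap` flag:

```
consul connect envoy -sidecar-for web -bootstrap > envoy-bootstrap.json
envoy --config-path envoy-bootstrap.json
```

Omitting `-bootstrap` makes the command exec the Envoy binary directly after generating the configuration.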
### Internal resource watches
The `proxycfg-*` family of packages drives the process of generating snapshots containing all of the data necessary to configure an Envoy proxy. This snapshot is populated via internal watches to resources such as configuration entries and service registrations.
When initialized on a client agent these watches flow through the agent cache, which manages the associated blocking queries. On the other hand, when initialized on a Consul server these watches are done directly against the server's in-memory state store.
For additional details see: [proxycfg](./proxycfg.md).
### Generating xDS Configuration
The `agent/xds` package implements the gRPC service used by Envoy to fetch configuration. At the core of the package is [delta.go](https://github.com/hashicorp/consul/blob/main/agent/xds/delta.go), which contains the implementation of the **Incremental ADS** protocol variant. With this variant there is a single stream between Consul and an Envoy proxy, and on that stream we send configuration diffs based on Envoy's current state.
This package also contains files that generate xDS resources such as Clusters, Endpoints, Listeners, and Routes from snapshots generated by `proxycfg`. These files handle the conversion from Consul's data model to Envoy's.
For additional details see: [xDS Server](./xds.md) and [Envoy's documentation](https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol) on the xDS protocol.
## Additional Components
### Certificate Authority
Consul's certificate authority is the component responsible for certificate management in the service mesh. Certificates issued by the certificate authority serve three primary purposes:
* mTLS between mesh-enabled applications.
* Intention enforcement.
* TLS between components of the control plane, such as client agent to server agent when using [auto-config](https://developer.hashicorp.com/consul/tutorials/security-operations/docker-compose-auto-config), or a leader server agent to the leader server of a peer cluster.
For additional details see: [Certificate Authority](./ca) and the [public documentation](https://developer.hashicorp.com/consul/docs/connect/ca).
### Configuration Entries
Configuration entries are the primary way to apply configuration or policies uniformly across the mesh. They are stored centrally on Consul's servers, and can be scoped to a service, namespace, admin partition, or a federation of datacenters.
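For illustration, here is a minimal sketch (assuming a locally running dev agent; `web` is a hypothetical service) of writing a `service-defaults` entry with Consul's Go API client:

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Connect to a local Consul agent (hypothetical dev setup).
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Write a service-defaults entry: every "web" sidecar proxy in the mesh
	// inherits Protocol=http unless its own registration overrides it.
	entry := &api.ServiceConfigEntry{
		Kind:     api.ServiceDefaults,
		Name:     "web",
		Protocol: "http",
	}
	ok, _, err := client.ConfigEntries().Set(entry, nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("applied:", ok)
}
```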
For additional details see: [Configuration Entries](./config-entries) and the [public documentation](https://developer.hashicorp.com/consul/docs/connect/config-entries).


@@ -27,4 +27,10 @@ For sidecar proxies the fundamental config entries are [service-defaults](https:
As defaults, their data is of lower precedence compared to data stored in proxy registrations. In Consul, data present in individual proxy registrations **always** has a higher precedence than the equivalent stored in a configuration entry.
## Additional Information
- [Config Resolution](config-resolution.md): Summary of the mechanics of how configuration entries are resolved for sidecar proxies.
## Lifecycle
The diagram below shows the lifecycle of a configuration entry along with the locations where the concrete types are stored and common tests. The steps highlighted along these paths are ones that will likely require code or test updates when modifying a config entry or adding a new config entry kind.
![Life of a Config Entry](./life-of-a-config-entry.png)

BIN docs/service-mesh/config-entries/life-of-a-config-entry.png (Stored with Git LFS) Normal file

BIN docs/service-mesh/configuring-envoy.png (Stored with Git LFS) Normal file

BIN docs/service-mesh/proxycfg-snapshot-building.png (Stored with Git LFS) Normal file

BIN docs/service-mesh/proxycfg-snapshot-sharing.png (Stored with Git LFS) Normal file


@@ -0,0 +1,45 @@
# Proxycfg
The `proxycfg-*` family of packages drives the process of generating snapshots containing all of the data necessary to configure an Envoy proxy. This snapshot is populated via internal watches to resources such as configuration entries and service registrations.
When initialized on a client agent these watches flow through the agent cache, which manages the associated blocking queries. On the other hand, when initialized on a Consul server these watches are done directly against the server's in-memory state store.
## agent/proxycfg
### Manager
The `proxycfg.Manager` is responsible for initializing or tearing down the machinery required to watch the internal data required to configure an Envoy proxy. This includes initializing the snapshots of internal data for proxy configuration, kicking off the long-running update routines, and managing the delivery of snapshots to the xDS server.
![Snapshot sharing](./proxycfg-snapshot-sharing.png)
### State management
Building a snapshot of data to configure a proxy is done with a long-running event-processing state machine. When a proxy is first registered with the manager we initialize the known watches that are needed based on the kind of proxy or gateway being watched. Each of these watches will contain the necessary request type, as well as a `CorrelationID`, which acts as a key for the watch. If a watch might not exist for the full lifetime of a proxy instance, we also store a context cancellation function so that the watch can be torn down later.
The results of these watches are then consumed as a stream of update events to a channel. Any time a new event is received, the `handleUpdate` function is called, which contains kind-specific logic. For each new event the `CorrelationID` is inspected to determine what watch the event corresponds to. From an event we may store the data directly or initialize/destroy additional watches.
Since the event updates are processed concurrently, the way to ensure ordering is via chained watches. For example, the discovery chain dictates what upstream instances need to be watched for a logical upstream. Once a discovery chain update is received we then kick off a service discovery watch for the appropriate targets.
![Snapshot building](./proxycfg-snapshot-building.png)
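The sketch below is a toy version of this event-processing loop, with invented `CorrelationID`s and data, to show the dispatch-and-chain pattern; the real implementation lives in `agent/proxycfg`:

```go
package main

import (
	"context"
	"fmt"
)

// UpdateEvent mirrors the shape used by agent/proxycfg: CorrelationID keys the
// watch that produced the event, and Result carries the watched data.
type UpdateEvent struct {
	CorrelationID string
	Result        interface{}
}

// snapshot is an invented stand-in for proxycfg.ConfigSnapshot.
type snapshot struct {
	chainTargets []string          // targets learned from the discovery chain
	endpoints    map[string]string // per-target endpoints from chained watches
}

// handleUpdate dispatches on CorrelationID: it either stores data in the
// snapshot or kicks off chained watches (here, one per discovery-chain target).
func handleUpdate(ctx context.Context, ev UpdateEvent, snap *snapshot, events chan<- UpdateEvent) {
	switch ev.CorrelationID {
	case "discovery-chain":
		snap.chainTargets = ev.Result.([]string)
		for _, target := range snap.chainTargets {
			// A real chained watch would start a blocking query or state-store
			// watch; this fake one emits a single immediate result.
			go func(t string) {
				select {
				case events <- UpdateEvent{CorrelationID: "upstream-target:" + t, Result: t + ".svc:8080"}:
				case <-ctx.Done():
				}
			}(target)
		}
	default:
		snap.endpoints[ev.CorrelationID] = ev.Result.(string)
	}
}

func main() {
	events := make(chan UpdateEvent, 8)
	snap := &snapshot{endpoints: map[string]string{}}

	// Seed the stream as the state machine would after its initial watches.
	events <- UpdateEvent{CorrelationID: "discovery-chain", Result: []string{"api", "db"}}

	for i := 0; i < 3; i++ {
		handleUpdate(context.Background(), <-events, snap, events)
	}
	fmt.Println(snap.endpoints) // both upstream targets resolved via chained watches
}
```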
## agent/proxycfg-glue
The dependencies to watch data on Consul's servers are encoded in `proxycfg.DataSources`. For any given resource to watch there is a corresponding data source, which is contained in the `DataSources` type as an interface. These interfaces are uniform:
```go
type <RESOURCE> interface {
Notify(ctx context.Context, req *structs.ServiceDumpRequest, correlationID string, ch chan<- UpdateEvent) error
}
```
Implementations for these interfaces exist within the `proxycfg-glue` package. When using agentless consul-dataplane the implementations are named `Server<Resource>`, and when using client agents they are named `Cache<Resource>`.
For each resource there are parallel implementations that use the agent's cache as the data source or the server's state store. Requests to the state store may use subscriptions to Consul's internal event publisher, or a memdb WatchSet. For more information about the event publisher see the [streaming documentation](/docs/rpc/streaming).
If the event publisher contains the necessary data it is preferable to use that as the server datasource over a memdb WatchSet. Memdb's watch sets are susceptible to spurious wake-ups and may lead to doing more work than strictly necessary when a change occurs. The event publisher watches memdb tables for changes and broadcasts incremental events based on the data that changed. It explicitly avoids re-generating all the data for the key being watched.
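To make the shape concrete, here is a toy data source implementing that `Notify` contract (all types here are invented; real implementations live in `proxycfg-glue`):

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Stand-ins for Consul's real request and event types.
type IntentionsRequest struct{ Service string }

type UpdateEvent struct {
	CorrelationID string
	Result        interface{}
}

// fakeIntentions plays the role of a data source. A real Server* implementation
// would subscribe to the state store or the event publisher; a Cache*
// implementation would run blocking queries through the agent cache. Here we
// just tick out a static result until ctx is cancelled.
type fakeIntentions struct{}

func (fakeIntentions) Notify(ctx context.Context, req *IntentionsRequest, correlationID string, ch chan<- UpdateEvent) error {
	go func() {
		ticker := time.NewTicker(50 * time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				ch <- UpdateEvent{CorrelationID: correlationID, Result: "allow " + req.Service + " -> api"}
			}
		}
	}()
	return nil
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	ch := make(chan UpdateEvent, 1)
	_ = fakeIntentions{}.Notify(ctx, &IntentionsRequest{Service: "web"}, "intentions", ch)
	fmt.Println(<-ch) // the state machine would dispatch this by CorrelationID
}
```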
## agent/proxycfg-sources
Contains implementations of the `agent/xds/ProxyConfigSource` interface, which ensures that proxy instances are registered or deregistered with the `proxycfg.Manager`.
There are two distinct implementations split across two packages, both of which register, re-register, or deregister proxies with the `proxycfg.Manager`:
* `/agent/proxycfg-sources/local`: Path exercised by Consul client agents.
* `/agent/proxycfg-sources/catalog`: Path exercised by Consul server agents.
These two implementations are separate primarily because of how proxy service registrations are handled in agentless and agentful deployments:
* Server agents watch the catalog for proxy registration changes, while client agents watch their local state.
* Server agents merge data from service-defaults and proxy-defaults configuration entries at the `catalog.ConfigSource` sync function, while client agents merge them by hooking into the service registration code path.


@@ -1,25 +1,60 @@
# xDS Server
The `agent/xds` package implements the streaming `DeltaAggregatedResources` gRPC service used by Envoy to fetch configuration. At the core of the package is [delta.go](https://github.com/hashicorp/consul/blob/main/agent/xds/delta.go), which contains the implementation of the **Incremental ADS** protocol variant. With this variant there is a single stream between Consul and an Envoy proxy, and on that stream we send configuration diffs based on Envoy's current state.
The remainder of this package contains the logic necessary for generating xDS configuration such as Clusters, Endpoints, Listeners, and Routes from snapshots generated by `proxycfg`.
## Authorization
The xDS server authorizes requests by looking at the proxy ID in the request and ensuring the ACL token has `service:write` access to either the destination service (for kind=ConnectProxy), or the gateway service (for other kinds).
This authorization strategy is based on an assumption about how the corresponding `proxycfg.ConfigSnapshot` is constructed. Most interfaces (HTTP, DNS, RPC) authorize requests by authorizing the data in the response, or by filtering out data that the requester is not authorized to view.
This authorization strategy requires that [agent/proxycfg](https://github.com/hashicorp/consul/blob/main/agent/proxycfg) only fetches data using a token with the same permissions, and that it only stores data by proxy ID. We assume that any data in the snapshot was already filtered, which allows this authorization at the xDS server to only perform a shallow check against the proxy ID.
## Config Generation
The xDS types that Consul supports as of v1.14 are: Clusters, Endpoints, Listeners, and Routes. For each of these resource types there is a corresponding file, such as [listeners.go](https://github.com/hashicorp/consul/blob/main/agent/xds/listeners.go). There, the entry point takes a proxycfg snapshot and generates xDS configuration depending on the kind of proxy being configured, with diverging paths for sidecars versus gateways.
## Testing
Testing changes to this package is generally done at two layers:
- Against golden files, where each test case tests against a fixed file containing the JSON representation of an xDS resource.
- In integration tests, which spin up live instances of Consul and Envoy and make assertions against Envoy's metrics or configuration.
### Golden files
Tests against golden files exist in functions with names such as `TestAllResourcesFromSnapshot`, `TestListenersFromSnapshot`, etc. These tests generate xDS configuration from a `proxycfg.ConfigSnapshot`, mimicking how we generate configuration for Envoy.
The primary source for the test snapshots is `proxycfg.TestConfigSnapshot`. This function will construct a snapshot from a list of events by calling `initialize` and `handleUpdate` as we do in production code. You can attach new update events to the snapshot, or override existing events by emitting a replacement event for an existing `CorrelationID`.
When a new test case is added, the corresponding Golden files can be generated using:
```
go test ./agent/xds -update -run TestAllResourcesFromSnapshot
```
The new golden files then must be **manually** inspected to ensure that the Envoy configuration was generated as expected. Tests against golden files do not assert that the configuration works as intended, but rather that it _looks_ as intended.
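The pattern these tests follow looks roughly like the sketch below (the helper and flag names are invented for illustration; see the real helpers in `agent/xds` for specifics):

```go
package xds_test

import (
	"bytes"
	"flag"
	"os"
	"testing"
)

// -update regenerates fixtures instead of asserting against them.
var update = flag.Bool("update", false, "update golden files")

// assertGolden compares generated xDS JSON against a checked-in fixture,
// rewriting the fixture when the -update flag is set.
func assertGolden(t *testing.T, goldenPath string, got []byte) {
	t.Helper()
	if *update {
		if err := os.WriteFile(goldenPath, got, 0o644); err != nil {
			t.Fatal(err)
		}
	}
	want, err := os.ReadFile(goldenPath)
	if err != nil {
		t.Fatal(err)
	}
	if !bytes.Equal(want, got) {
		t.Fatalf("golden mismatch for %s (run with -update and inspect the diff)", goldenPath)
	}
}
```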
### Integration tests
#### New consul-container integration tests
TODO
#### Legacy bash-driven integration tests
Updating one of these integration tests may be appropriate when fixing a bug in functionality that is tested there, or when improving functionality tested there. If the new test involves significant modifications to the bash helpers, consider adding a `consul-container` integration test instead.
For more information refer to their [documentation](test/integration/connect/envoy/).
## Delta (Incremental) xDS Protocol
Consul's implementation of the incremental xDS protocol exists in the file [delta.go](https://github.com/hashicorp/consul/blob/main/agent/xds/delta.go). The interactions with Envoy follow the general guidance from [Envoy's documentation](https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol).
The xDS stream is bidirectional, and messages from Envoy can be: a subscription request for resources, an ACK for resources it successfully stored, or a NACK for resources that it did not store. ACKs and NACKs are associated with specific messages via a "nonce", which Consul generates and includes in every response sent to Envoy. Consul only ever sends xDS resources into the stream, and only when it believes that Envoy does not already have them.
For mock examples of how interactions over the stream can play out, refer to the tests in [delta_test.go](https://github.com/hashicorp/consul/blob/main/agent/xds/delta_test.go).
To achieve the aim of only sending incremental diffs to Envoy there are maps tracking the state of both Envoy and Consul. The [xDSDeltaType](https://github.com/hashicorp/consul/blob/c7ef04c5979dbc311ff3c67b7bf3028a93e8b0f1/agent/xds/delta.go#L459) struct contains this data for each xDS resource type:
* `subscriptions` tracks the names that Envoy is interested in for a given xDS type. For example, if Envoy is subscribed to Routes, then this map would contain the names of the routes that Envoy is interested in receiving updates about.
* `resourceVersions` tracks the hash of resources that Envoy has ACK'd. This way we can avoid sending data that Envoy already has.
* `pendingUpdates` holds a map used for tracking data that has been sent to Envoy but has not been ACK'd yet. Certain xDS types have strict ordering requirements to avoid dropping traffic. Tracking pending updates allows us to block sending another update of the same type or sending resources that depend on a previous one being acknowledged.
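A toy sketch of this bookkeeping (invented types, not Consul's actual implementation): updates are parked in `pendingUpdates` under their nonce, and an ACK promotes them into `resourceVersions` so they are not re-sent:

```go
package main

import "fmt"

// deltaType is a minimal stand-in for the per-xDS-type state described above:
// per-type subscriptions, ACK'd resource versions, and in-flight updates
// awaiting an ACK, keyed by nonce.
type deltaType struct {
	subscriptions    map[string]struct{}          // resource names Envoy wants
	resourceVersions map[string]string            // name -> hash Envoy has ACK'd
	pendingUpdates   map[string]map[string]string // nonce -> name -> hash sent
}

// send records an in-flight update under its nonce.
func (t *deltaType) send(nonce string, resources map[string]string) {
	t.pendingUpdates[nonce] = resources
}

// ack promotes pending resources to resourceVersions; nack just drops them so
// they can be re-sent later.
func (t *deltaType) ack(nonce string) {
	for name, hash := range t.pendingUpdates[nonce] {
		t.resourceVersions[name] = hash
	}
	delete(t.pendingUpdates, nonce)
}

func (t *deltaType) nack(nonce string) {
	delete(t.pendingUpdates, nonce)
}

func main() {
	routes := &deltaType{
		subscriptions:    map[string]struct{}{"web": {}},
		resourceVersions: map[string]string{},
		pendingUpdates:   map[string]map[string]string{},
	}
	routes.send("nonce-1", map[string]string{"web": "hash-v1"})
	routes.ack("nonce-1") // Envoy stored the route; avoid re-sending hash-v1
	fmt.Println(routes.resourceVersions)
}
```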