Prior to #13244, connect proxies and gateways could only be configured by an
xDS session served by the local client agent.
In an upcoming release, it will be possible to deploy a Consul service mesh
without client agents. In this model, xDS sessions will be handled by the
servers themselves, which necessitates load-balancing to prevent a single
server from receiving a disproportionate amount of load and becoming
overwhelmed.
This introduces a simple form of load-balancing where Consul will attempt to
achieve an even spread of load (xDS sessions) between all healthy servers.
It does so by implementing a concurrent session limiter (limiter.SessionLimiter)
and adjusting the limit according to autopilot state and proxy service
registrations in the catalog.
If a server is already over capacity (i.e. the session limit is lowered),
Consul will begin draining sessions to rebalance the load. This will result
in the client receiving a `RESOURCE_EXHAUSTED` status code. It is the client's
responsibility to observe this response and reconnect to a different server.
Users of the gRPC client connection brokered by the
consul-server-connection-manager library will get this for free.
The rate at which Consul will drain sessions to rebalance load is scaled
dynamically based on the number of proxies in the catalog.
To ease the transition for users, the original gRPC
port can still operate in a deprecated mode as either
plain-text or TLS mode. This behavior should be removed
in a future release whenever we no longer support this.
The resulting behavior from this commit is:
`ports.grpc > 0 && ports.grpc_tls > 0` spawns both plain-text and tls ports.
`ports.grpc > 0 && grpc.tls == undefined` spawns a single plain-text port.
`ports.grpc > 0 && grpc.tls != undefined` spawns a single tls port (backwards compat mode).
Previously, public referred to gRPC services that are both exposed on
the dedicated gRPC port and have their definitions in the proto-public
directory (so were considered usable by 3rd parties). Whereas private
referred to services on the multiplexed server port that are only usable
by agents and other servers.
Now, we're splitting these definitions, such that external/internal
refers to the port and public/private refers to whether they can be used
by 3rd parties.
This is necessary because the peering replication API needs to be
exposed on the dedicated port, but is not (yet) suitable for use by 3rd
parties.
Peer replication is intended to be between separate Consul installs and
effectively should be considered "external". This PR moves the peer
stream replication bidirectional RPC endpoint to the external gRPC
server and ensures that things continue to function.
This is the OSS portion of enterprise PR 2056.
This commit provides server-local implementations of the proxycfg.ConfigEntry
and proxycfg.ConfigEntryList interfaces, that source data from streaming events.
It makes use of the LocalMaterializer type introduced for peering replication,
adding the necessary support for authorization.
It also adds support for "wildcard" subscriptions (within a topic) to the event
publisher, as this is needed to fetch service-resolvers for all services when
configuring mesh gateways.
Currently, events will be emitted for just the ingress-gateway, service-resolver,
and mesh config entry types, as these are the only entries required by proxycfg
— the events will be emitted on topics named IngressGateway, ServiceResolver,
and MeshConfig topics respectively.
Though these events will only be consumed "locally" for now, they can also be
consumed via the gRPC endpoint (confirmed using grpcurl) so using them from
client agents should be a case of swapping the LocalMaterializer for an
RPCMaterializer.
Having this type live in the agent/consul package makes it difficult to
put anything that relies on token resolution (e.g. the new gRPC services)
in separate packages without introducing import cycles.
For example, if package foo imports agent/consul for the ACLResolveResult
type it means that agent/consul cannot import foo to register its service.
We've previously worked around this by wrapping the ACLResolver to
"downgrade" its return type to an acl.Authorizer - aside from the
added complexity, this also loses the resolved identity information.
In the future, we may want to move the whole ACLResolver into the
acl/resolver package. For now, putting the result type there at least,
fixes the immediate import cycle issues.
Once a peering is marked for deletion a new leader routine will now
clean up all imported resources and then the peering itself.
A lot of the logic was grabbed from the namespace/partitions deferred
deletions but with a handful of simplifications:
- The rate limiting is not configurable.
- Deleting imported nodes/services/checks is done by deleting nodes with
the Txn API. The services and checks are deleted as a side-effect.
- There is no "round rate limiter" like with namespaces and partitions.
This is because peerings are purely local, and deleting a peering in
the datacenter does not depend on deleting data from other DCs like
with WAN-federated namespaces. All rate limiting is handled by the
Raft rate limiter.
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
The importing peer will need to know what SNI and SPIFFE name
corresponds to each exported service. Additionally it will need to know
at a high level the protocol in use (L4/L7) to generate the appropriate
connection pool and local metrics.
For replicated connect synthetic entities we edit the `Connect{}` part
of a `NodeService` to have a new section:
{
"PeerMeta": {
"SNI": [
"web.default.default.owt.external.183150d5-1033-3672-c426-c29205a576b8.consul"
],
"SpiffeID": [
"spiffe://183150d5-1033-3672-c426-c29205a576b8.consul/ns/default/dc/dc1/svc/web"
],
"Protocol": "tcp"
}
}
This data is then replicated and saved as-is at the importing side. Both
SNI and SpiffeID are slices for now until I can be sure we don't need
them for how mesh gateways will ultimately work.
Introduces two new public gRPC endpoints (`Login` and `Logout`) and
includes refactoring of the equivalent net/rpc endpoints to enable the
majority of logic to be reused (i.e. by extracting the `Binder` and
`TokenWriter` types).
This contains the OSS portions of the following enterprise commits:
- 75fcdbfcfa6af21d7128cb2544829ead0b1df603
- bce14b714151af74a7f0110843d640204082630a
- cc508b70fbf58eda144d9af3d71bd0f483985893
* update raft to v1.3.7
* add changelog
* fix compilation error
* fix HeartbeatTimeout
* fix ElectionTimeout to reload only if value is valid
* fix default values for `ElectionTimeout` and `HeartbeatTimeout`
* fix test defaults
* bump raft to v1.3.8
* Implement the ServerDiscovery.WatchServers gRPC endpoint
* Fix the ConnectCA.Sign gRPC endpoints metadata forwarding.
* Unify public gRPC endpoints around the public.TraceID function for request_id logging
Adds a new gRPC endpoint to get envoy bootstrap params. The new consul-dataplane service will use this
endpoint to generate an envoy bootstrap configuration.
Introduces a gRPC endpoint for signing Connect leaf certificates. It's also
the first of the public gRPC endpoints to perform leader-forwarding, so
establishes the pattern of forwarding over the multiplexed internal RPC port.
Previously we had 1 EventPublisher per state.Store. When a state store was closed/abandoned such as during a consul snapshot restore, this had the behavior of force closing subscriptions for that topic and evicting event snapshots from the cache.
The intention of this commit is to keep all that behavior. To that end, the shared EventPublisher now supports the ability to refresh a topic. That will perform the force close + eviction. The FSM upon abandoning the previous state.Store will call RefreshTopic for all the topics with events generated by the state.Store.
* Fixes a lint warning about t.Errorf not supporting %w
* Enable running autopilot on all servers
On the non-leader servers all they do is update the state and do not attempt any modifications.
* Fix the RPC conn limiting tests
Technically they were relying on racey behavior before. Now they should be reliable.
Adds a new gRPC service and endpoint to return the list of supported
consul dataplane features. The Consul Dataplane will use this API to
customize its interaction with that particular server.
Adds a new gRPC streaming endpoint (WatchRoots) that dataplane clients will
use to fetch the current list of active Connect CA roots and receive new
lists whenever the roots are rotated.
Introduces the capability to configure TLS differently for Consul's
listeners/ports (i.e. HTTPS, gRPC, and the internal multiplexed RPC
port) which is useful in scenarios where you may want the HTTPS or
gRPC interfaces to present a certificate signed by a well-known/public
CA, rather than the certificate used for internal communication which
must have a SAN in the form `server.<dc>.consul`.
This commit syncs ENT changes to the OSS repo.
Original commit details in ENT:
```
commit 569d25f7f4578981c3801e6e067295668210f748
Author: FFMMM <FFMMM@users.noreply.github.com>
Date: Thu Feb 10 10:23:33 2022 -0800
Vendor fork net rpc (#1538)
* replace net/rpc w consul-net-rpc/net/rpc
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
* replace msgpackrpc and go-msgpack with fork from mono repo
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
* gofmt all files touched
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
```
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>