In current implementation of Consul, check alias cannot determine
if a service exists or not. Because a service without any check
is semantically considered as passing, so when no healthchecks
are found for an agent, the check was considered as passing.
But this make little sense as the current implementation does not
make any difference between:
* a non-existing service (passing)
* a service without any check (passing as well)
In order to make it work, we have to ensure that when a check did
not find any healthcheck, the service does indeed exists. If it
does not, lets consider the check as failing.
The DNS resolution will be handled by Envoy and defaults to LOGICAL_DNS. This discovery type can be overridden on a per-gateway basis with the envoy_dns_discovery_type Gateway Option.
If a service contains an instance with a hostname as an address we set the Envoy cluster to use DNS as the discovery type rather than EDS. Since both mesh gateways and terminating gateways route to clusters using SNI, whenever there is a mix of hostnames and IP addresses associated with a service we use the hostname + CDS rather than the IPs + EDS.
Note that we detect hostnames by attempting to parse the service instance's address as an IP. If it is not a valid IP we assume it is a hostname.
The ACL.GetPolicy RPC endpoint was supposed to return the “parent” policy and not always the default policy. In the case of legacy management tokens the parent policy was supposed to be “manage”. The result of us not sending this properly was that operations that required specifically a management token such as saving a snapshot would not work in secondary DCs until they were upgraded.
* testing: replace most goe/verify.Values with require.Equal
One difference between these two comparisons is that go/verify considers
nil slices/maps to be equal to empty slices/maps, where as testify/require
does not, and does not appear to provide any way to enable that behaviour.
Because of this difference some expected values were changed from empty
slices to nil slices, and some calls to verify.Values were left.
* Remove github.com/pascaldekloe/goe/verify
Reduce the number of assertion packages we use from 2 to 1
In the past TLS usage was enforced with these variables, but these days
this decision is made by TLSConfigurator and there is no reason to keep
using the variables.
The version field has been used to decide which multiplexing to use. It
was introduced in 2457293dceec95ecd12ef4f01442e13710ea131a. But this is
6y ago and there is no need for this differentiation anymore.
Three of the checks are temporarily disabled to limit the size of the
diff, and allow us to enable all the other checks in CI.
In a follow up we can fix the issues reported by the other checks one
at a time, and enable them.
The main fix here is to always union the `primary-gateways` list with
the list of mesh gateways in the primary returned from the replicated
federation states list. This will allow any replicated (incorrect) state
to be supplemented with user-configured (correct) state in the config
file. Eventually the game of random selection whack-a-mole will pick a
winning entry and re-replicate the latest federation states from the
primary. If the user-configured state is actually the incorrect one,
then the same eventual correct selection process will work in that case,
too.
The secondary fix is actually to finish making wanfed-via-mgws actually
work as originally designed. Once a secondary datacenter has replicated
federation states for the primary AND managed to stand up its own local
mesh gateways then all of the RPCs from a secondary to the primary
SHOULD go through two sets of mesh gateways to arrive in the consul
servers in the primary (one hop for the secondary datacenter's mesh
gateway, and one hop through the primary datacenter's mesh gateway).
This was neglected in the initial implementation. While everything
works, ideally we should treat communications that go around the mesh
gateways as just provided for bootstrapping purposes.
Now we heuristically use the success/failure history of the federation
state replicator goroutine loop to determine if our current mesh gateway
route is working as intended. If it is, we try using the local gateways,
and if those don't work we fall back on trying the primary via the union
of the replicated state and the go-discover configuration flags.
This can be improved slightly in the future by possibly initializing the
gateway choice to local on startup if we already have replicated state.
This PR does not address that improvement.
Fixes#7339
* Standardize support for Tagged and BindAddresses in Ingress Gateways
This updates the TaggedAddresses and BindAddresses behavior for Ingress
to match Mesh/Terminating gateways. The `consul connect envoy` command
now also allows passing an address without a port for tagged/bind
addresses.
* Update command/connect/envoy/envoy.go
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
* PR comments
* Check to see if address is an actual IP address
* Update agent/xds/listeners.go
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
* fix whitespace
Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
Currently checks of type gRPC will emit log messages such as,
2020/02/12 13:48:22 [INFO] parsed scheme: ""
2020/02/12 13:48:22 [INFO] scheme "" not registered, fallback to default scheme
Without adding full support for using custom gRPC schemes (maybe that's
right long-term path) we can just supply the default scheme as provided
by the grpc library.
Fixes https://github.com/hashicorp/consul/issues/7274
and https://github.com/hashicorp/nomad/issues/7415
Errors are values. We can use the error value to identify the 'comparison failed' case which makes the function easier to use and should make it harder to miss handle the error case
Based on work done in https://github.com/hashicorp/memberlist/pull/196
this allows to restrict the IP ranges that can join a given Serf cluster
and be a member of the cluster.
Restrictions on IPs can be done separatly using 2 new differents flags
and config options to restrict IPs for LAN and WAN Serf.
Handling errors at the end of a log switch/case block is somewhat
brittle. This block included a couple cases where errors were ignored,
but it was not obvious the way it was written.
This change moves all error handling into each case block. There is
still potentially one case where err is ignored, which will be handled
in a follow up.
The usage was removed in 8e22d80e3550592a32144f2d1c84ebf6becb957b,
however it seems there may be a bug here because the cluster name
is not updated when the target changes.
Some of these problems are minor (unused vars), but others are real bugs (ignored errors).
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
This commit converts the previous error into just a Warn-level log
message. By returning an error when the requested service was not a
gateway, we did not appropriately update envoy because the cache Fetch
returned an error and thus did not propagate the update through proxycfg
and xds packages.
Previously this happened to be using the method on the Server/Client that was meant to allow the ACLResolver to locally resolve tokens. On Servers that had tokens (primary or secondary dc + token replication) this function would lookup the token from raft and return the ACLIdentity. On clients this was always a noop. We inadvertently used this function instead of creating a new one when we added logging accessor ids for permission denied RPC requests.
With this commit, a new method is used for resolving the identity properly via the ACLResolver which may still resolve locally in the case of being on a server with tokens but also supports remote token resolution.
* Return early from updateGatewayServices if nothing to update
Previously, we returned an empty slice of gatewayServices, which caused
us to accidentally delete everything in the memdb table
* PR comment and better formatting
go test will only run tests in parallel within a single package. In this case the package test run time is exactly the same with or without t.Parallel() (~0.7s).
In generally we should avoid t.Parallel() as it causes a number of problems with `go test` not reporting failure messages correctly. I encountered one of these problems, which is what prompted this change. Since `t.Parallel` is not providing any benefit in this package, this commit removes it.
The change was automated with:
git grep -l 't.Parallel' | xargs sed -i -e '/t.Parallel/d'
This is useful when updating an config entry with no services, and the
expected behavior is that envoy closes all listeners and clusters.
We also allow empty routes because ingress gateways name route
configurations based on the port of the listener, so it is important we
remove any stale routes. Then, if a new listener with an old port is
added, we will not have to deal with stale routes hanging around routing
to the wrong place.
Endpoints are associated with clusters, and thus by deleting the
clusters we don't have to care about sending empty endpoint responses.
- Use correct enterprise metadata for finding config entry
- nil out cancel functions on config snapshot copy
- Look at HostsSet when checking validity
We require any non-wildcard services to match the protocol defined in
the listener on write, so that we can maintain a consistent experience
through ingress gateways. This also helps guard against accidental
misconfiguration by a user.
- Update tests that require an updated protocol for ingress gateways
We can only allow host names that are valid domain names because we put
these hosts into a DNSSAN. In addition, we validate that the wildcard
specifier '*' is only present as the leftmost label to allow for a
wildcard DNSSAN and associated wildcard Host routing in the ingress
gateway proxy.
This now requires some type of protocol setting in ingress gateway tests
to ensure the services are not filtered out.
- small refactor to add a max(x, y) function
- Use internal configEntryTxn function and add MaxUint64 to lib
- Validate that this cannot be set on a 'tcp' listener nor on a wildcard
service.
- Add Hosts field to api and test in consul config write CLI
- xds: Configure envoy with user-provided hosts from ingress gateways
This commit adds the necessary changes to allow an ingress gateway to
route traffic from a single defined port to multiple different upstream
services in the Consul mesh.
To do this, we now require all HTTP requests coming into the ingress
gateway to specify a Host header that matches "<service-name>.*" in
order to correctly route traffic to the correct service.
- Differentiate multiple listener's route names by port
- Adds a case in xds for allowing default discovery chains to create a
route configuration when on an ingress gateway. This allows default
services to easily use host header routing
- ingress-gateways have a single route config for each listener
that utilizes domain matching to route to different services.
This will emit warnings about the configs not doing anything but still allow them to be parsed.
This also added the warnings for enterprise fields that we already had in OSS but didn’t change their enforcement behavior. For example, attempting to use a network segment will cause a hard error in OSS.
This is a collection of refactors that make upcoming PRs easier to digest.
The main change is the introduction of the authmethod.Identity struct.
In the one and only current auth method (type=kubernetes) all of the
trusted identity attributes are both selectable and projectable, so they
were just passed around as a map[string]string.
When namespaces were added, this was slightly changed so that the
enterprise metadata can also come back from the login operation, so
login now returned two fields.
Now with some upcoming auth methods it won't be true that all identity
attributes will be both selectable and projectable, so rather than
update the login function to return 3 pieces of data it seemed worth it
to wrap those fields up and give them a proper name.
Also ensure that WatchSets in tests are reset between calls to watchFired.
Any time a watch fires, subsequent calls to watchFired on the same WatchSet
will also return true even if there were no changes.
Previously, if a blocking query called CheckConnectServiceNodes
before the gateway-services memdb table had any entries,
a nil watchCh would be returned when calling serviceTerminatingGatewayNodes.
This means that the blocking query would not fire if a gateway config entry
was added after the watch started.
In cases where the blocking query started on proxy registration,
the proxy could potentially never become aware of an upstream endpoint
if that upstream was going to be represented by a gateway.
This commit copies many of the connect-proxy xds testcases and reuses
for ingress gateways. This allows us to more easily see changes to the
envoy configuration when make updates to ingress gateways.
While investigating and fixing an issue on our 1.5.1 branch,
I saw you also/already fixed the bug I found (tags not updated
for existing servers), but comment is misplaced.
This UnmarshalJSON was never called. The decode function is passed a map[string]interface
so it has no way of knowing that this function exists.
Tested by adding a panic to this function and watching the tests pass.
I attempted to use this Unmarshal function by passing in the type, however the tests
showed that it does not work. The test was failing to parse the request.
If the performance of this endpoint is indeed critical we can solve the problem by adding
all the fields to the request struct and handling the normalziation without a custom Unmarshal.
This makes it so that both OSS and enterprise tests pass correctly
In the api tests, explicitly set namespace to empty string so that tests
can be shared.
On every service registration, we check to see if a service should be
assassociated to a wildcard gateway-service. This fixes an issue where
we did not correctly check to see if the service being registered was a
"typical" service or not.