The integration test TestEnvoy/case-ingress-gateway-multiple-services is flaky
and this possibly reduces the flakiness by explicitly waiting for services to show
up in the catalog as healthy before waiting for them to show up in envoy as
healthy which gives it just a bit more time to sync.
more readable in CI.
```
Running primary verification step for case-ingress-gateway-multiple-services...
�[34;1mverify.bats
�[0m�[1G ingress proxy admin is up on :20000�[K�[75G 1/12�[2G�[1G ✓ ingress proxy admin is up on :20000�[K
�[0m�[1G s1 proxy admin is up on :19000�[K�[75G 2/12�[2G�[1G ✓ s1 proxy admin is up on :19000�[K
�[0m�[1G s2 proxy admin is up on :19001�[K�[75G 3/12�[2G�[1G ✓ s2 proxy admin is up on :19001�[K
�[0m�[1G s1 proxy listener should be up and have right cert�[K�[75G 4/12�[2G�[1G ✓ s1 proxy listener should be up and have right cert�[K
�[0m�[1G s2 proxy listener should be up and have right cert�[K�[75G 5/12�[2G�[1G ✓ s2 proxy listener should be up and have right cert�[K
�[0m�[1G ingress-gateway should have healthy endpoints for s1�[K�[75G 6/12�[2G�[31;1m�[1G ✗ ingress-gateway should have healthy endpoints for s1�[K
�[0m�[31;22m (from function `assert_upstream_has_endpoints_in_status' in file /workdir/primary/bats/helpers.bash, line 385,
```
versus
```
Running primary verification step for case-ingress-gateway-multiple-services...
1..12
ok 1 ingress proxy admin is up on :20000
ok 2 s1 proxy admin is up on :19000
ok 3 s2 proxy admin is up on :19001
ok 4 s1 proxy listener should be up and have right cert
ok 5 s2 proxy listener should be up and have right cert
not ok 6 ingress-gateway should have healthy endpoints for s1
not ok 7 s1 proxy should have been configured with max_connections in services
ok 8 ingress-gateway should have healthy endpoints for s2
```
* feat(ingress gateway: support configuring limits in ingress-gateway config entry
- a new Defaults field with max_connections, max_pending_connections, max_requests
is added to ingress gateway config entry
- new field max_connections, max_pending_connections, max_requests in
individual services to overwrite the value in Default
- added unit test and integration test
- updated doc
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
Without this change, you'd see this error:
```
./run-tests.sh: line 49: LAMBDA_TESTS_ENABLED: unbound variable
./run-tests.sh: line 49: LAMBDA_TESTS_ENABLED: unbound variable
```
Locally, always run integration tests using amd64, even if running
on an arm mac. This ensures the architecture locally always matches
the CI/CD environment.
In addition:
* Use consul:local for envoy integration and upgrade tests. Previously,
consul:local was used for upgrade tests and consul-dev for integration
tests. I didn't see a reason to use separate images as it's more
confusing.
* By default, disable the requirement that aws credentials are set.
These are only needed for the lambda tests and make it so you
can't run any tests locally, even if you're not running the
lambda tests. Now they'll only run if the LAMBDA_TESTS_ENABLED
env var is set.
* Split out the building of the Docker image for integration
tests into its own target from `dev-docker`. This allows us to always
use an amd64 image without messing up the `dev-docker` target.
* Add support for passing GO_TEST_FLAGs to `test-envoy-integ` target.
* Add a wait_for_leader function because tests were failing locally
without it.
* defaulting to false because peering will be released as beta
* Ignore peering disabled error in bundles cachetype
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: freddygv <freddy@hashicorp.com>
Co-authored-by: Matt Keeler <mjkeeler7@gmail.com>
Now that peered upstreams can generate envoy resources (#13758), we need a way to disambiguate local from peered resources in our metrics. The key difference is that datacenter and partition will be replaced with peer, since in the context of peered resources partition is ambiguous (could refer to the partition in a remote cluster or one that exists locally). The partition and datacenter of the proxy will always be that of the source service.
Regexes were updated to make emitting datacenter and partition labels mutually exclusive with peer labels.
Listener filter names were updated to better match the existing regex.
Cluster names assigned to peered upstreams were updated to be synthesized from local peer name (it previously used the externally provided primary SNI, which contained the peer name from the other side of the peering). Integration tests were updated to assert for the new peer labels.
Ensure that the peer stream replication rpc can successfully be used with TLS activated.
Also:
- If key material is configured for the gRPC port but HTTPS is not
enabled now TLS will still be activated for the gRPC port.
- peerstream replication stream opened by the establishing-side will now
ignore grpc.WithBlock so that TLS errors will bubble up instead of
being awkwardly delayed or suppressed
Peer replication is intended to be between separate Consul installs and
effectively should be considered "external". This PR moves the peer
stream replication bidirectional RPC endpoint to the external gRPC
server and ensures that things continue to function.
Require use of mesh gateways in order for service mesh data plane
traffic to flow between peers.
This also adds plumbing for envoy integration tests involving peers, and
one starter peering test.
* Add partition fields to targets like service route destinations
* Update validation to prevent cross-DC + cross-partition references
* Handle partitions when reading config entries for disco chain
* Encode partition in compiled targets
The secondary DC now takes longer to populate the MGW snapshot because
it needs to wait for the secondary CA to be initialized before it can
receive roots and generate xDS config.
Previously MGWs could receive empty roots before the CA was
initialized. This wasn't necessarily a problem since the cluster ID in
the trust domain isn't verified.
We launch one container as part of the test with --pid=host but
apparently within that container it launches a copy of "tini" as a
process supervisor that prefers to be PID 1.
Because it's not PID 1 it logs a warning message about this to the envoy
integration test logs that can lead to thinking somehow that a test
failure is related when in fact it's completely unrelated.
Adding this environment variable avoids the warning.