Commit Graph

10 Commits

Author SHA1 Message Date
cskh b698e04abd
flaky test: use retry long to wait for config entry upgrade (#16068)
* flaky test: use retry long to wait for config entry upgrade

* increase wait for rbac policy
2023-01-26 11:01:17 -05:00
R.B. Boyer adff2316c9
test: use direct service registration in envoy integration tests (#9138)
This has the biggest impact on enterprise test cases that use namespaced
registrations, which prior to this change sometimes failed the initial
registration because the namespace was not yet created.
2020-11-09 13:59:46 -06:00
R.B. Boyer ed3a6bb59d
Fix even more test flakes in intentions related envoy integration tests (#9013)
The key thing here is to use `curl --no-keepalive` so that envoy
pre-1.15 tests will reliably use the latest listener every time.

Extra:

- Switched away from editing line-item intentions the legacy way.

- Removed some teardown scripts, as we don't share anything between cases anyway

- Removed unnecessary use of `run` in some places.
2020-10-26 17:04:35 -05:00
R.B. Boyer 846b80e8a5
fix flaky envoy integration tests involving intentions (#8996)
There is a delay between an intentions change being made, and it being
reflected in the Envoy runtime configuration. Now that the enforcement
happens inside of Envoy instead of over in the agent, our tests need to
explicitly wait until the xDS reconfiguration is complete before
attempting to assert intentions worked.

Also remove a few double retry loops.
2020-10-22 14:30:28 -05:00
Matt Keeler 155cdf022f
Envoy Mesh Gateway integration tests (#6187)
* Allow setting the mesh gateway mode for an upstream in config files

* Add envoy integration test for mesh gateways

This necessitated many supporting changes in most of the other test cases.

Add remote mode mesh gateways integration test
2019-07-24 17:01:42 -04:00
R.B. Boyer e060748d3f
tests: adding new envoy integration tests for L7 service-resolvers (#6129)
Additionally:

- wait for bootstrap config entries to be applied

- run the verify container in the host's PID namespace so we can kill
envoys without mounting the docker socket

* assert that we actually send HEALTHY and UNHEALTHY endpoints down in EDS during failover
2019-07-23 20:08:36 -05:00
R.B. Boyer c7df80ebf9
tests: further reduce envoy integration test flakiness (#6112)
In addition to waiting until s2 shows up healthy in the Catalog, wait
until s2 endpoints show up healthy via EDS in the s1 upstream clusters.
2019-07-12 11:12:56 -05:00
R.B. Boyer 2165e97efa
test: for envoy integration tests, wait until 's2' is healthy in consul before interrogating envoy (#6108)
When the envoy healthy panic threshold was explicitly disabled as part
of L7 traffic management it changed how envoy decided to load balance to
endpoints in a cluster. This only matters when envoy is in "panic mode"
aka "when you have a bunch of unhealthy endpoints". Panic mode sends
traffic to unhealthy instances in certain circumstances.

Note: Prior to explicitly disabling the healthy panic threshold, the
default value is 50%.

What was happening is that the test harness was bringing up consul the
sidecars, and the service instances all at once and sometimes the
proxies wouldn't have time to be checked by consul to be labeled as
'passing' in the catalog before a round of EDS happened.

The xDS server in consul effectively queries /v1/health/connect/s2 and
gets 1 result, but that one result has a 'critical' check so the xDS
server sends back that endpoint labeled as UNHEALTHY.

Envoy sees that 100% of the endpoints in the cluster are unhealthy and
would enter panic mode and still send traffic to s2. This is why the
test suites PRIOR to disabling the healthy panic threshold worked. They
were _incorrectly_ passing.

When the healthy panic threshol is disabled, envoy never enters panic
mode in this situation and thus the cluster has zero healthy endpoints
so load balancing goes nowhere and the tests fail.

Why does this only affect the test suites for envoy 1.8.0? My guess is
that https://github.com/envoyproxy/envoy/pull/4442 was merged into the
1.9.x series and somehow that plays a role.

This PR modifies the bats scripts to explicitly wait until the upstream
sidecar is healthy as measured by /v1/health/connect/s2?passing BEFORE
trying to interrogate envoy which should make the tests less racy.
2019-07-10 15:58:25 -05:00
Paul Banks 078f4cf5bb Add integration test for central config; fix central config WIP (#5752)
* Add integration test for central config; fix central config WIP

* Add integration test for central config; fix central config WIP

* Set proxy protocol correctly and begin adding upstream support

* Add upstreams to service config cache key and start new notify watcher if they change.

This doesn't update the tests to pass though.

* Fix some merging logic get things working manually with a hack (TODO fix properly)

* Simplification to not allow enabling sidecars centrally - it makes no sense without upstreams anyway

* Test compile again and obvious ones pass. Lots of failures locally not debugged yet but may be flakes. Pushing up to see what CI does

* Fix up service manageer and API test failures

* Remove the enable command since it no longer makes much sense without being able to turn on sidecar proxies centrally

* Remove version.go hack - will make integration test fail until release

* Remove unused code from commands and upstream merge

* Re-bump version to 1.5.0
2019-05-01 16:39:31 -07:00
Paul Banks d6c0557e86
Connect: allow configuring Envoy for L7 Observability (#5558)
* Add support for HTTP proxy listeners

* Add customizable bootstrap configuration options

* Debug logging for xDS AuthZ

* Add Envoy Integration test suite with basic test coverage

* Add envoy command tests to cover new cases

* Add tracing integration test

* Add gRPC support WIP

* Merged changes from master Docker. get CI integration to work with same Dockerfile now

* Make docker build optional for integration

* Enable integration tests again!

* http2 and grpc integration tests and fixes

* Fix up command config tests

* Store all container logs as artifacts in circle on fail

* Add retries to outer part of stats measurements as we keep missing them in CI

* Only dump logs on failing cases

* Fix typos from code review

* Review tidying and make tests pass again

* Add debug logs to exec test.

* Fix legit test failure caused by upstream rename in envoy config

* Attempt to reduce cases of bad TLS handshake in CI integration tests

* bring up the right service

* Add prometheus integration test

* Add test for denied AuthZ both HTTP and TCP

* Try ANSI term for Circle
2019-04-29 17:27:57 +01:00