open-consul/test/integration/connect/envoy/case-prometheus/verify.bats

#!/usr/bin/env bats

load helpers

@test "s1 proxy admin is up on :19000" {
  retry_default curl -f -s localhost:19000/stats -o /dev/null
}

@test "s2 proxy admin is up on :19001" {
  retry_default curl -f -s localhost:19001/stats -o /dev/null
}

@test "s1 proxy listener should be up and have right cert" {
  assert_proxy_presents_cert_uri localhost:21000 s1
}

@test "s2 proxy listener should be up and have right cert" {
  assert_proxy_presents_cert_uri localhost:21001 s2
}

@test "s2 proxy should be healthy" {
  assert_service_has_healthy_instances s2 1
}

@test "s1 upstream should have healthy endpoints for s2" {
  # protocol is configured in an upstream override so the cluster name is customized here
  assert_upstream_has_endpoints_in_status 127.0.0.1:19000 1a47f6e1~s2.default.primary HEALTHY 1
}

@test "s1 upstream should be able to connect to s2 with http/1.1" {
  run retry_default curl --http1.1 -s -f -d hello localhost:5000
  [ "$status" -eq 0 ]
  [ "$output" = "hello" ]
}

@test "s1 proxy should be exposing metrics to prometheus" {
  # Should have http metrics. This is just a sample one. Require the metric to
  # be present not just found in a comment (anchor the regexp).
  retry_default \
    must_match_in_prometheus_response localhost:1234 \
    '^envoy_http_downstream_rq_active'

  # Should be labelling with consul_source_service.
  retry_default \
    must_match_in_prometheus_response localhost:1234 \
    '[\{,]consul_source_service="s1"[,}] '

  # Should be labelling with http listener prefix.
  retry_default \
    must_match_in_prometheus_response localhost:1234 \
    '[\{,]envoy_http_conn_manager_prefix="public_listener"[,}]'
}
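
The comment in the last test asks for an anchored regexp so that a metric counts as present only when it appears as a sample line, not in a `# TYPE` or `# HELP` comment. Below is a minimal sketch of that idea. `count_metric_matches` and `sample_metrics` are hypothetical names used only for illustration, and the sample payload is made up; the real `must_match_in_prometheus_response` helper lives in the loaded helpers file and is not shown here.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the test suite): count the lines on stdin
# that match an extended regular expression.
count_metric_matches() {
  local pattern="$1"
  # grep -c prints the number of matching lines. It exits 1 when the count is
  # zero, so mask the exit status to keep the printed "0" usable by callers.
  grep -Ec -- "$pattern" || true
}

# Made-up payload in the Prometheus text exposition format: one comment line
# and one sample line for the same metric.
sample_metrics() {
  cat <<'EOF'
# TYPE envoy_http_downstream_rq_active gauge
envoy_http_downstream_rq_active{consul_source_service="s1",envoy_http_conn_manager_prefix="public_listener"} 0
EOF
}

# Unanchored: the comment line and the sample line both match.
sample_metrics | count_metric_matches 'envoy_http_downstream_rq_active'

# Anchored: only the sample line starts with the metric name.
sample_metrics | count_metric_matches '^envoy_http_downstream_rq_active'

# Label check: the label must be delimited by '{' or ',' on the left and by
# ',' or '}' on the right, so a longer label name cannot match by accident.
sample_metrics | count_metric_matches '[\{,]envoy_http_conn_manager_prefix="public_listener"[,}]'
```

With this sample the three calls print 2, 1 and 1, which is why the test uses the `^` anchor rather than a bare metric name.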
Connect: allow configuring Envoy for L7 Observability (#5558)
2019-04-29 16:27:57 +00:00
* Add support for HTTP proxy listeners
* Add customizable bootstrap configuration options
* Debug logging for xDS AuthZ
* Add Envoy Integration test suite with basic test coverage
* Add envoy command tests to cover new cases
* Add tracing integration test
* Add gRPC support WIP
* Merged changes from master Docker. get CI integration to work with same Dockerfile now
* Make docker build optional for integration
* Enable integration tests again!
* http2 and grpc integration tests and fixes
* Fix up command config tests
* Store all container logs as artifacts in circle on fail
* Add retries to outer part of stats measurements as we keep missing them in CI
* Only dump logs on failing cases
* Fix typos from code review
* Review tidying and make tests pass again
* Add debug logs to exec test.
* Fix legit test failure caused by upstream rename in envoy config
* Attempt to reduce cases of bad TLS handshake in CI integration tests
* bring up the right service
* Add prometheus integration test
* Add test for denied AuthZ both HTTP and TCP
* Try ANSI term for Circle
Lines: `#!/usr/bin/env bats` through the `s2 proxy listener should be up and have right cert` test; the `s1 upstream should be able to connect to s2 with http/1.1` test; in the prometheus test, the opening line, the first comment, each `must_match_in_prometheus_response localhost:1234 \` line, `'^envoy_http_downstream_rq_active'`, and the `# Should be labelling with http listener prefix.` comment.

test: for envoy integration tests, wait until 's2' is healthy in consul before interrogating envoy (#6108)
2019-07-10 20:58:25 +00:00
When the envoy healthy panic threshold was explicitly disabled as part of L7 traffic management it changed how envoy decided to load balance to endpoints in a cluster. This only matters when envoy is in "panic mode" aka "when you have a bunch of unhealthy endpoints". Panic mode sends traffic to unhealthy instances in certain circumstances.
Note: Prior to explicitly disabling the healthy panic threshold, the default value is 50%.
What was happening is that the test harness was bringing up consul, the sidecars, and the service instances all at once, and sometimes the proxies wouldn't have time to be checked by consul to be labeled as 'passing' in the catalog before a round of EDS happened. The xDS server in consul effectively queries /v1/health/connect/s2 and gets 1 result, but that one result has a 'critical' check so the xDS server sends back that endpoint labeled as UNHEALTHY.
Envoy sees that 100% of the endpoints in the cluster are unhealthy and would enter panic mode and still send traffic to s2. This is why the test suites PRIOR to disabling the healthy panic threshold worked. They were _incorrectly_ passing.
When the healthy panic threshold is disabled, envoy never enters panic mode in this situation and thus the cluster has zero healthy endpoints so load balancing goes nowhere and the tests fail.
Why does this only affect the test suites for envoy 1.8.0? My guess is that https://github.com/envoyproxy/envoy/pull/4442 was merged into the 1.9.x series and somehow that plays a role.
This PR modifies the bats scripts to explicitly wait until the upstream sidecar is healthy as measured by /v1/health/connect/s2?passing BEFORE trying to interrogate envoy which should make the tests less racy.
Lines: the `s2 proxy should be healthy` test; the closing `}` of the prometheus test.

tests: further reduce envoy integration test flakiness (#6112)
2019-07-12 16:12:56 +00:00
In addition to waiting until s2 shows up healthy in the Catalog, wait until s2 endpoints show up healthy via EDS in the s1 upstream clusters.
Lines: the opening line and closing `}` of the `s1 upstream should have healthy endpoints for s2` test.

connect: reconcile how upstream configuration works with discovery chains (#6225)
2019-08-02 03:03:34 +00:00
The following upstream config fields for connect sidecars sanely integrate into discovery chain resolution:
- Destination Namespace/Datacenter: Compilation occurs locally but using different default values for namespaces and datacenters. The xDS clusters that are created are named as they normally would be.
- Mesh Gateway Mode (single upstream): If set this value overrides any value computed for any resolver for the entire discovery chain. The xDS clusters that are created may be named differently (see below).
- Mesh Gateway Mode (whole sidecar): If set this value overrides any value computed for any resolver for the entire discovery chain. If this is specifically overridden for a single upstream this value is ignored in that case. The xDS clusters that are created may be named differently (see below).
- Protocol (in opaque config): If set this value overrides the value computed when evaluating the entire discovery chain. If the normal chain would be TCP or if this override is set to TCP then the result is that we explicitly disable L7 Routing and Splitting. The xDS clusters that are created may be named differently (see below).
- Connect Timeout (in opaque config): If set this value overrides the value for any resolver in the entire discovery chain. The xDS clusters that are created may be named differently (see below).
If any of the above overrides affect the actual result of compiling the discovery chain (i.e. "tcp" becomes "grpc" instead of being a no-op override to "tcp") then the relevant parameters are hashed and provided to the xDS layer as a prefix for use in naming the Clusters.
This is to ensure that if one Upstream discovery chain has no overrides and tangentially needs a cluster named "api.default.XXX", and another Upstream does have overrides for "api.default.XXX" that they won't cross-pollinate against the operator's wishes.
Fixes #6159
Lines: `# protocol is configured in an upstream override so the cluster name is customized here`.

xds: improve how envoy metrics are emitted (#6312)
2019-08-16 14:30:17 +00:00
Since generated envoy clusters all are named using (mostly) SNI syntax we can have envoy read the various fields out of that structure and emit it as stats labels to the various telemetry backends.
I changed the delimiter for the 'customization hash' from ':' to '~' because ':' is always reencoded by envoy as '_' when generating metrics keys.
Lines: `assert_upstream_has_endpoints_in_status 127.0.0.1:19000 1a47f6e1~s2.default.primary HEALTHY 1`.

Add integration test for central config; fix central config WIP (#5752)
2019-05-01 23:39:31 +00:00
* Add integration test for central config; fix central config WIP
* Set proxy protocol correctly and begin adding upstream support
* Add upstreams to service config cache key and start new notify watcher if they change. This doesn't update the tests to pass though.
* Fix some merging logic get things working manually with a hack (TODO fix properly)
* Simplification to not allow enabling sidecars centrally - it makes no sense without upstreams anyway
* Test compile again and obvious ones pass. Lots of failures locally not debugged yet but may be flakes. Pushing up to see what CI does
* Fix up service manager and API test failures
* Remove the enable command since it no longer makes much sense without being able to turn on sidecar proxies centrally
* Remove version.go hack - will make integration test fail until release
* Remove unused code from commands and upstream merge
* Re-bump version to 1.5.0
Lines: each `retry_default \` line in the prometheus test.

Add DC and NS support for Envoy metrics (#9207)
2020-11-16 23:37:19 +00:00
This PR updates the tags that we generate for Envoy stats. Several of these come with breaking changes, since we can't keep two stats prefixes for a filter.
Lines: `# Should be labelling with consul_source_service.`, `'[\{,]consul_source_service="s1"[,}] '`, and `'[\{,]envoy_http_conn_manager_prefix="public_listener"[,}]'`.
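
The #6108 message describes waiting until the upstream sidecar is passing in `/v1/health/connect/s2?passing` before interrogating Envoy. A rough sketch of that wait-then-check pattern follows. `retry_until` and `s2_is_passing` are hypothetical names and are not the suite's `retry_default` or `assert_service_has_healthy_instances` helpers; the sketch also assumes a Consul agent on its default HTTP port `localhost:8500` and that `jq` is installed.

```shell
#!/usr/bin/env bash

# Hypothetical retry loop: run a command up to <attempts> times, sleeping
# <delay> seconds after each failed attempt. Returns 0 on the first success
# and 1 if every attempt fails.
retry_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Assumed check: succeed once at least one instance of s2 is returned by the
# health endpoint when filtered with ?passing. The agent address and the use
# of jq are assumptions made for this sketch.
s2_is_passing() {
  local body count
  body=$(curl -sf 'localhost:8500/v1/health/connect/s2?passing') || return 1
  # The endpoint returns a JSON array; its length is the number of instances.
  count=$(printf '%s' "$body" | jq 'length') || return 1
  [ "$count" -ge 1 ]
}
```

A test could then gate its Envoy assertions on something like `retry_until 5 1 s2_is_passing`, so that a sidecar whose check is still 'critical' delays the test rather than failing it.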