There are a few changes being made to RedHat's registry on October 20, 2022 that affect the way images need to be tagged prior to being pushed to the registry. This PR changes the tag to conform to the new standard.
We have other work queued up in crt-workflows-common and actions-docker-build to support the other required changes.
This PR should be merged to `main` and all release branches on or after October 20, 2022, and MUST be merged before your next production release. Otherwise, the automation to push to the RedHat registry will not work.
----
A detailed list of changes shared from RedHat (as an FYI):
The following changes will occur for container certification projects that leverage the Red Hat hosted registry [[registry.connect.redhat.com](http://registry.connect.redhat.com/)] for image distribution:
- All currently published images are migrating to a NEW, Red Hat hosted quay registry. Partners do not have to do anything for this migration, and this will not impact customers. The registry will still utilize [registry.connect.redhat.com](http://registry.connect.redhat.com/) as the registry URL.
- The registry URL currently used to push, tag, and certify images, as well as the registry login key, will change. You can see these changes under the “Images” tab of the container certification project. You will now see a [quay.io](http://quay.io/) address and will no longer see [scan.connect.redhat.com](http://scan.connect.redhat.com/).
- Partners will have the opportunity to auto-publish images by selecting “Auto-publish” in the Settings tab of your certification project. This will automatically publish images that pass all certification tests.
- For new container image projects, partners will have the option to host within their own chosen image registry while using [registry.connect.redhat.com](http://registry.connect.redhat.com/) as a proxy address. This means the end user can authenticate to the Red Hat registry to pull a partner image without having to provide additional authentication to the partner’s registry.
* peering: skip register duplicate node and check from the peer
* Prebuilt the nodes map and checks map to avoid repeated for loop
* use key type to struct: node id, service id, and check id
The integration test TestEnvoy/case-ingress-gateway-multiple-services is flaky
and this possibly reduces the flakiness by explicitly waiting for services to show
up in the catalog as healthy before waiting for them to show up in envoy as
healthy which gives it just a bit more time to sync.
Fix an issue where rpc_hold_timeout was being used as the timeout for non-blocking queries. Users should be able to tune read timeouts without fiddling with rpc_hold_timeout. A new configuration `rpc_read_timeout` is created.
Refactor some implementation from the original PR 11500 to remove the misleading linkage between RPCInfo's timeout (used to retry in case of certain modes of failures) and the client RPC timeouts.
Care must be taken when replacing mesh gateways in the primary
datacenter, because if the old addresses become unreachable before the
secondary datacenters receive the new addresses then the primary
datacenter overall will become unreachable.
This commit adds docs related to this class of upgrades.
In 1.13.2 we added a new flag called use_auto_cert to address issues
previously documented in the upgrade guide. Originally there was no way
to disable TLS for gRPC when auto-encrypt was in use, because TLS was
enabled for gRPC due to the presence of auto-encrypt certs.
As of 1.13.2, using auto-encrypt certs as the signal to enable TLS for
gRPC is opt-in only. Meaning that if anyone who had upgraded to 1.13
relied on that side-effect, they now need to explicitly configure it.
Use local-storage service, prototyped here https://github.com/LevelbossMike/local-storage-service, to manage local storage usage in an octane way. Does not write to local storage in tests by default and is easy to stub out.
There is a bug in the error handling code for the Agent cache subsystem discovered:
1. NotifyCallback calls notifyBlockingQuery which calls getWithIndex in
a loop (which backs off on-error up to 1 minute)
2. getWithIndex calls fetch if there’s no valid entry in the cache
3. fetch starts a goroutine which calls Fetch on the cache-type, waits
for a while (again with backoff up to 1 minute for errors) and then
calls fetch to trigger a refresh
The end result being that every 1 minute notifyBlockingQuery spawns an
ancestry of goroutines that essentially lives forever.
This PR ensures that the goroutine started by `fetch` cancels any prior
goroutine spawned by the same line for the same key.
In isolated testing where a cache type was tweaked to indefinitely
error, this patch prevented goroutine counts from skyrocketing.
In practice this was masked by #14956 and was only uncovered fixing the
other bug.
go test ./agent -run TestAgentConnectCALeafCert_goodNotLocal
would fail when only #14956 was fixed.