Currently, if an acceptor peer deletes a peering the dialer's peering
will eventually get to a "terminated" state. If the two clusters need to
be re-peered the acceptor will re-generate the token but the dialer will
encounter this error on the call to establish:
"failed to get addresses to dial peer: failed to refresh peer server
addresses, will continue to use initial addresses: there is no active
peering for "<<<ID>>>""
This is because in `exchangeSecret().GetDialAddresses()` we will get an
error if fetching addresses for an inactive peering. The peering shows
up as inactive at this point because of the existing terminated state.
Rather than checking whether a peering is active we can instead check
whether it was deleted. This way users do not need to delete terminated
peerings in the dialing cluster before re-establishing them.
Protobuf Refactoring for Multi-Module Cleanliness
This commit includes the following:
Moves all packages that were within proto/ to proto/private
Rewrites imports to account for the packages being moved
Adds in buf.work.yaml to enable buf workspaces
Names the proto-public buf module so that we can override the Go package imports within proto/buf.yaml
Bumps the buf version dependency to 1.14.0 (I was trying out the version to see if it would get around an issue - it didn't but it also doesn't break things and it seemed best to keep up with the toolchain changes)
Why:
In the future we will need to consume other protobuf dependencies such as the Google HTTP annotations for openapi generation or grpc-gateway usage.
There were some recent changes to have our own ratelimiting annotations.
The two combined were not working when I was trying to use them together (attempting to rebase another branch)
Buf workspaces should be the solution to the problem
Buf workspaces means that each module will have generated Go code that embeds proto file names relative to the proto dir and not the top level repo root.
This resulted in proto file name conflicts in the Go global protobuf type registry.
The solution to that was to add in a private/ directory into the path within the proto/ directory.
That then required rewriting all the imports.
Is this safe?
AFAICT yes
The gRPC wire protocol doesn't seem to care about the proto file names (although the Go grpc code does tack on the proto file name as Metadata in the ServiceDesc)
Other than imports, there were no changes to any generated code as a result of this.
* Protobuf Modernization
Remove direct usage of golang/protobuf in favor of google.golang.org/protobuf
Marshallers (protobuf and json) needed some changes to account for different APIs.
Moved to using the google.golang.org/protobuf/types/known/* for the well known types including replacing some custom Struct manipulation with whats available in the structpb well known type package.
This also updates our devtools script to install protoc-gen-go from the right location so that files it generates conform to the correct interfaces.
* Fix go-mod-tidy make target to work on all modules
Re-add ServerExternalAddresses parameter in GenerateToken endpoint
This reverts commit 5e156772f6a7fba5324eb6804ae4e93c091229a6
and adds extra functionality to support newer peering behaviors.
When peering through mesh gateways we expect outbound dials to peer
servers to flow through the local mesh gateway addresses.
Now when establishing a peering we get a list of dial addresses as a
ring buffer that includes local mesh gateway addresses if the local DC
is configured to peer through mesh gateways. The ring buffer includes
the mesh gateway addresses first, but also includes the remote server
addresses as a fallback.
This fallback is present because it's possible that direct egress from
the servers may be allowed. If not allowed then the leader will cycle
back to a mesh gateway address through the ring.
When attempting to dial the remote servers we retry up to a fixed
timeout. If using mesh gateways we also have an initial wait in
order to allow for the mesh gateways to configure themselves.
Note that if we encounter a permission denied error we do not retry
since that error indicates that the secret in the peering token is
invalid.
This commit adds handling so that the replication stream considers
whether the user intends to peer through mesh gateways.
The subscription will return server or mesh gateway addresses depending
on the mesh configuration setting. These watches can be updated at
runtime by modifying the mesh config entry.