Secondary CA initialization steps are:
• Wait until the primary is capable of signing intermediate certs. We use serf metadata to check the versions of servers in the primary, which avoids needing a token as the previous RPC-based implementation did. We require at least one alive server in the primary and that all alive servers meet the version requirement.
• Initialize the secondary CA by getting the primary to sign an intermediate.
When a primary DC is configured, if no existing CA is initialized and for whatever reason we cannot initialize a secondary CA, the secondary DC will remain without a CA. As soon as it can, it will initialize the secondary CA by pulling the primary's roots and getting the primary to sign an intermediate.
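The version gate in the first step could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `serverInfo`, `atLeast`, and `primaryCanSignIntermediates` are hypothetical names, and the struct is a simplified stand-in for whatever the serf member metadata really exposes.

```go
package main

// serverInfo is a hypothetical, simplified view of one server in the
// primary datacenter, as it might be derived from serf metadata.
type serverInfo struct {
	Alive bool
	Major int
	Minor int
	Patch int
}

// atLeast reports whether the server's version is >= major.minor.patch.
func (s serverInfo) atLeast(major, minor, patch int) bool {
	if s.Major != major {
		return s.Major > major
	}
	if s.Minor != minor {
		return s.Minor > minor
	}
	return s.Patch >= patch
}

// primaryCanSignIntermediates returns true only when at least one server
// is alive and every alive server meets the minimum version. Servers that
// are not alive are ignored.
func primaryCanSignIntermediates(servers []serverInfo, major, minor, patch int) bool {
	alive := 0
	for _, s := range servers {
		if !s.Alive {
			continue
		}
		alive++
		if !s.atLeast(major, minor, patch) {
			return false
		}
	}
	return alive > 0
}
```

Because the check only reads member metadata, it needs no ACL token, which matches the motivation given above.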
This also fixes a segfault that can happen during leadership revocation. There was a spot in the secondaryCARootsWatch that was getting the CA Provider and executing methods on it without a nil check. Under normal circumstances it won't be nil, but during leadership revocation it gets nil'ed out. Therefore there is a period of time between closing the stop chan and when the goroutine actually stops where it could read a nil provider and cause a segfault.
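The shape of the nil check can be sketched as below. All names here (`caProvider`, `server`, `checkRoots`, `staticProvider`, `errNoProvider`) are hypothetical and simplified for illustration; the real watch and provider interface are more involved.

```go
package main

import (
	"errors"
	"sync"
)

// caProvider is a hypothetical stand-in for the real CA provider interface.
type caProvider interface {
	ActiveRoot() (string, error)
}

// server holds the provider, which is set to nil on leadership revocation.
type server struct {
	mu       sync.RWMutex
	provider caProvider
}

var errNoProvider = errors.New("CA provider is not initialized")

func (s *server) getProvider() caProvider {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.provider
}

// revokeLeadership nils out the provider, as happens during revocation.
func (s *server) revokeLeadership() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.provider = nil
}

// checkRoots models one iteration of the watch. The provider may already
// be nil if revocation raced with this goroutine, so check before use.
func (s *server) checkRoots() (string, error) {
	p := s.getProvider()
	if p == nil {
		return "", errNoProvider
	}
	return p.ActiveRoot()
}

// staticProvider is a trivial provider used only for illustration.
type staticProvider struct{ root string }

func (p staticProvider) ActiveRoot() (string, error) { return p.root, nil }
```

Returning an error instead of dereferencing lets the goroutine finish its current iteration safely and exit once it observes the closed stop channel.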
Auto-encrypt was meant to fall back to the default port when one wasn't provided, but it never did because of an issue with the error handling. We were checking against an incomplete error value:
"missing port in address" vs "address $HOST: missing port in address"
Additionally, all RPCs to AutoEncrypt.Sign were using a.config.ServerPort, so those were updated to use ports resolved by resolveAddrs, if they are available.
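To illustrate the error-matching problem, here is a minimal sketch of a default-port fallback. `withDefaultPort` is a hypothetical helper, not the function changed in this commit, and matching on a substring is just one way to handle the longer message that `net.SplitHostPort` returns.

```go
package main

import (
	"net"
	"strconv"
	"strings"
)

// withDefaultPort returns addr unchanged if it already carries a port and
// appends defaultPort if the port is missing. Other parse errors are
// returned to the caller.
func withDefaultPort(addr string, defaultPort int) (string, error) {
	_, _, err := net.SplitHostPort(addr)
	if err == nil {
		return addr, nil
	}
	// The full error text is "address <addr>: missing port in address",
	// so comparing for equality with the bare "missing port in address"
	// never matches. A substring check does.
	if strings.Contains(err.Error(), "missing port in address") {
		return net.JoinHostPort(addr, strconv.Itoa(defaultPort)), nil
	}
	return "", err
}
```

For example, `withDefaultPort("consul.example.com", 8300)` yields `consul.example.com:8300`, while an address that already includes a port is passed through untouched. The host name and port value are example inputs only.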
None of these changes should have side effects or change behavior:
- Use bytes.Buffer's String() instead of a conversion
- Use time.Since and time.Until where fitting
- Drop unnecessary returns and assignment
I can only assume we want to check that the retrieved `updatedToken` is not
nil before accessing it below.
`token` can't possibly be nil at this point, as we accessed `token.AccessorID`
just before.
* Ensure the mesh gateway configuration comes back in the api within each upstream
* Add a test for the MeshGatewayConfig in the ToAPI functions
* Ensure we don’t use gateways for dc local connections
* Update the svc kind index for deletions
* Replace the proxycfg.state cache with an interface for testing
Also start implementing proxycfg state testing.
* Update the state tests to verify some gateway watches for upstream-targets of a discovery chain.
Also:
- add back an internal http endpoint to dump a compiled discovery chain for debugging purposes
Previously, the CompiledDiscoveryChain.IsDefault() method would test:
- is this chain just one resolver step?
- is that resolver step just the default?
But what I forgot to test:
- is that resolver step for the same service that the chain represents?
This last point is important because if you configured just one config
entry:
    kind = "service-resolver"
    name = "web"
    redirect {
      service = "other"
    }
and requested the chain for "web" you'd get back a **default** resolver
for "other". In the xDS code the IsDefault() method is used to
determine if this chain is "empty". If it is then we use the
pre-discovery-chain logic that just uses data embedded in the Upstream
object (and still lets the escape hatches function).
In the example above that means certain parts of the xDS code were going
to try referencing a cluster named "web..." despite the other parts of
the xDS code maintaining clusters named "other...".
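The three conditions can be sketched with simplified types. `resolver`, `compiledChain`, and `isDefault` are hypothetical stand-ins for illustration and do not mirror the real CompiledDiscoveryChain structure.

```go
package main

// resolver is a hypothetical, simplified resolver step.
type resolver struct {
	Service string // service this resolver step targets
	Default bool   // true when the step is the synthesized default
}

// compiledChain is a hypothetical, simplified compiled chain.
type compiledChain struct {
	ServiceName string // service the chain was requested for
	Nodes       []resolver
}

// isDefault reports whether the chain is effectively empty: exactly one
// resolver step, that step is the default, and it targets the same
// service the chain represents.
func (c *compiledChain) isDefault() bool {
	if len(c.Nodes) != 1 {
		return false
	}
	n := c.Nodes[0]
	return n.Default && n.Service == c.ServiceName
}
```

With the redirect above, the chain for "web" holds a single default resolver for "other", so the third condition is what keeps it from being treated as empty.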
* Retry the creation of the test server three times.
* Reduce the retry timeout for the API wait to 2 seconds, opting to fail faster and start over.
* Remove wait for leader from server creation. This wait can be added on a test by test basis now that the function is being exported.
* Remove wait for anti-entropy sync. This is built into the existing WaitForSerfCheck func, so that can be used if the anti-entropy wait is needed
Previously a sequence of events like:
Start
Stop
Start
Stop
would segfault on the second stop because the original ctx and cancel func were only initialized during the constructor and not during Start.
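A minimal sketch of the pattern, assuming a hypothetical `runner` type: the context and cancel func are created in Start rather than in the constructor, so each Start/Stop cycle gets a fresh pair.

```go
package main

import (
	"context"
	"sync"
)

// runner is a hypothetical component that can be started and stopped
// repeatedly.
type runner struct {
	mu      sync.Mutex
	ctx     context.Context
	cancel  context.CancelFunc
	running bool
}

// Start creates a new context for this run. Initializing it here, not in
// a constructor, means a later Stop always has a valid cancel func.
func (r *runner) Start() {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.running {
		return
	}
	r.ctx, r.cancel = context.WithCancel(context.Background())
	r.running = true
	go r.run(r.ctx)
}

// Stop cancels the current run and clears the cancel func. Calling Stop
// when not running is a no-op.
func (r *runner) Stop() {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.running {
		return
	}
	r.cancel()
	r.cancel = nil
	r.running = false
}

// run blocks until its context is cancelled.
func (r *runner) run(ctx context.Context) {
	<-ctx.Done()
}
```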
* Make cluster names SNI always
* Update some tests
* Ensure we check for prepared query types
* Use sni for route cluster names
* Proper mesh gateway mode defaulting when the discovery chain is used
* Ignore service splits from PatchSliceOfMaps
* Update some xds golden files for proper test output
* Allow for grpc/http listeners/cluster configs with the disco chain
* Update stats expectation
maxIndexWatchTxn was only watching the IndexEntry of the max index of all the entries. It needed to watch all of them regardless of which was the max.
Also plumbed the query source through in the proxy config to help better track requests.
The general problem was that retrieval of the CA config, which contains the trust domain, was happening outside of the blocking mechanism, so if the client started the blocking query before the primary DC's roots had been set, a stale trust domain was being pushed down.
This was fixed here, but in the future we should probably fix up the CA initialization code to not initialize the CA config twice when it doesn’t need to.
* Prune Servers from WAN and LAN
* cleaned up and fixed LAN to WAN
* moving things around
* force-leave remove from serfWAN, create pruneSerfWAN
* removed serfWAN remove, reduced complexity, fixed comments
* add another place to remove from serfWAN
* add nil check
* Update agent/consul/server.go
Co-Authored-By: Paul Banks <banks@banksco.de>
With this you should be able to fetch all of the relevant discovery
chain config entries from the state store in one query and then feed
them into the compiler outside of a transaction.
There are a lot of TODOs scattered through here, but they're mostly
around handling fun edge cases and can be deferred until more of the
plumbing works completely.
With ACLs enabled, if an agent is wiped and restarted without a leave,
it can no longer deregister the services it had previously registered
because it no longer has the tokens the services were registered with.
To remedy that we allow service deregistration from tokens with node
write permission.
* Support for maximum size for Output of checks
This PR allows users to limit the size of output produced by checks at the agent
and check level.
When set at the agent level, it will limit the output for all checks monitored
by the agent.
When set at the check level, it can override the agent max for a specific check but
only if it is lower than the agent max.
Default value is 4k, and input must be at least 1.
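The precedence rule can be sketched as follows. This is an illustration under stated assumptions: `effectiveMaxOutput` and `truncateOutput` are hypothetical helpers, "4k" is taken to mean 4096 bytes, and a value below 1 is treated as "not set" rather than rejected.

```go
package main

// defaultMaxOutputSize assumes "4k" means 4096 bytes.
const defaultMaxOutputSize = 4 * 1024

// effectiveMaxOutput returns the limit that applies to one check.
// Values below 1 are treated as unset (an assumption of this sketch).
// A check-level value only takes effect when it is lower than the
// agent-level limit.
func effectiveMaxOutput(agentMax, checkMax int) int {
	limit := agentMax
	if limit < 1 {
		limit = defaultMaxOutputSize
	}
	if checkMax >= 1 && checkMax < limit {
		limit = checkMax
	}
	return limit
}

// truncateOutput cuts out to at most limit bytes. It is byte-based and
// may split a multi-byte character.
func truncateOutput(out string, limit int) string {
	if len(out) <= limit {
		return out
	}
	return out[:limit]
}
```

For example, with an agent limit of 1000, a check-level value of 200 lowers the limit to 200, while a check-level value of 5000 is ignored and the limit stays at 1000.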