Per discussion, we want to be aggressive about fanning out vs possibly
fixating on only local DCs. With RPC forwarding in place, a random walk
may be less optimal from a network latency perspective, but it is guaranteed
to eventually result in a converged state because all DCs are candidates
during the bootstrapping process.
Client: Search limit increased from 4 random DCs to 8 random DCs, plus nearest.
Server: Search factor increased from 3 to 5 times the bootstrap_expect.
This should allow for faster convergence in large environments (e.g.
sub-5min for 10K Consul DCs).
It's possible that a Nomad Client is heartbeating with a Nomad server that
has become issolated from the quorum of Nomad Servers. When 3x the
heartbeatTTL has been exceeded, append the Consul server list to the primary
primary server list. When the next RPCProxy rebalance occurs, there is a
chance one of the servers discovered from Consul will be in the majority.
When client reattaches to a Nomad Server in the majority, it will include
a heartbeat and will reset the TTLs *AND* will clear the primary server list
to include only values from the heartbeat.
This has been done to allow the Server and Client to reuse the same
Syncer because the Agent may be running Client, Server, or both
simultaneously and we only want one Syncer object alive in the agent.
I *KNEW* I should have done this when I wrote it, but didn't want to
go back and audit the handlers to include the appropriate return
handling, but now that the code is taking shape, make this change.
In addition to the API changing, consul.Syncer can now be signaled
to shutdown via the Shutdown() method, which will call the Run()'ing
sync task to exit gracefully.
While breaking the API within this PR, break out the individual
arguments to RefreshServerLists. The servers parameter is reusing
`structs.NodeServerInfo` for the time being, but this can be revisited
if the needs of the strucutre diverge in the future.
With an over abundance of caution, preevnt future copy/pasta by
using the right locks when bootstrapping a Client. Strictly speaking
this is not necessary, but it makes explicit the locking semantics
and guards against future concurrent or parallel initialization.
When an agent is running a server, the list of servers includes the
Raft peers. When the agent is running a client (which is always the
case?), include a list of the servers found in the Client's RpcProxy.
Dedupe and provide a unique list back to the caller.
Reduce future confusion by introducing a minor version that is gossiped out
via the `mvn` Serf tag (Minor Version Number, `vsn` is already being used for
to communicate `Major Version Number`).
Background: hashicorp/consul/issues/1346#issuecomment-151663152