- Improve resilience of testrpc.WaitForLeader()
- Add additionall retry to CI
- Increase "go test" timeout to 8m
- Add wait for cluster leader to several tests in the agent package
- Add retry to some tests in the api and command packages
* Relaxes Autopilot promotion logic.
When we defaulted the Raft protocol version to 3 in #3477 we made
the numPeers() routine more strict to only count voters (this is
more conservative and more correct). This had the side effect of
breaking rolling updates because it's at odds with the Autopilot
non-voter promotion logic.
That logic used to wait to only promote to maintain an odd quorum
of servers. During a rolling update (add one new server, wait, and
then kill an old server) the dead server cleanup would still count
the old server as a peer, which is conservative and the right thing
to do, and no longer count the non-voter. This would wait to promote,
so you could get into a stalemate. It is safer to promote early than
remove early, so by promoting as soon as possible we have chosen
that as the solution here.
Fixes#3611
* Gets rid of unnecessary extra not-a-voter check.
* Changes default Raft protocol to 3.
* Changes numPeers() to report only voters.
This should have been there before, but it's more obvious that this
is incorrect now that we default the Raft protocol to 3, which puts
new servers in a read-only state while Autopilot waits for them to
become healthy.
* Fixes TestLeader_RollRaftServer.
* Fixes TestOperator_RaftRemovePeerByAddress.
* Fixes TestServer_*.
Relaxed the check for a given number of voter peers and instead do
a thorough check that all servers see each other in their Raft
configurations.
* Fixes TestACL_*.
These now just check for Raft replication to be set up, and don't
care about the number of voter peers.
* Fixes TestOperator_Raft_ListPeers.
* Fixes TestAutopilot_CleanupDeadServerPeriodic.
* Fixes TestCatalog_ListNodes_ConsistentRead_Fail.
* Fixes TestLeader_ChangeServerID and adjusts the conn pool to throw away
sockets when it sees io.EOF.
* Changes version to 1.0.0 in the options doc.
* Makes metrics test more deterministic with autopilot metrics possible.