There's likely a race (related to https://github.com/hashicorp/consul/issues/2644) where the catalog update might be in but the leader tracking doesn't report a leader, so this blocks forever and then times out. As a workaround we can lower the query wait time to always allow for a few retries.
The last change here made the time overall theoretically the same, but the overhead of running so quickly before probably meant that we were spending longer. Tests seemed marginal in Travis so doubling this to see how things go.
This bit me on CI. The current behavior of the testutil server is to skip if consul isn't present. When lots of output is scrolling by, you're likely to miss the message that the test was skipped. Instead, I propose that we hard fatal if consul doesn't exist, and upstream consumers can skip the tests if they want.
The testutil server uses an atomic incrementer to generate unique port
numbers. This works great until tests are run in parallel, _across
packages_. Because each package starts at the same "offset" idx, they
collide.
One way to overcome this is to run each packages' test in isolation, but
that makes the test suite much longer as it does not maximize
parallelization. Alternatively, instead of having "predictable" ports,
we can let the OS choose a random open port automatically.
This still has a (albeit smaller) race condition in that the OS could
return an open port twice, before the server has a chance to actually
start and occupy said port. In practice, I have not been able to hit
this race condition, so it either doesn't happen or it happens far less
frequently that the existing implementation.
I'm not sure how I feel about the panic, but this is just test code, so
I'm including to say it's okay?
I've removed unneeded conversions by performing the following commands:
$ go get -u github.com/mdempsky/unconvert
$ go list ./... | grep -v vendor | xargs unconvert -apply
Signed-off-by: Miquel Sabaté Solà <msabate@suse.com>
Two of the changes are in tests; the one of consequence is in the API.
As explained in #1308 this can cause conflicts with downstream programs.
Fixes#1308.
On my laptop, I'm currently seeing a huge number of intermittent test
failures all related to WaitForLeader returning after the test node has
become a leader, but before it has actually finished starting up, in
particular, performing the serf/Raft reconciliation.
Waiting for the index to become nonzero makes the tests pass reliably,
by also blocking until the new leader has started committing state.