tests: deflake test-api job (#9742)

Deflake test-api job, currently failing at around 7.6% (44 out of 578
workflows), by ensuring that test nomad agent use a small dedicated port
range that doesn't conflict with the kernel ephemeral range.

The failures are disproportionatly related to port allocation, where a
nomad agent fails to start when the http port is already bound to
another process. The failures are intermitent and aren't specific to any
test in particular. The following is a representative failure:
https://app.circleci.com/pipelines/github/hashicorp/nomad/13995/workflows/6cf6eb38-f93c-46f8-8aa0-f61e62fe7694/jobs/128169
.

Upon investigation, the issue seems to be that the api freeport library
picks a port block within 10,000-14,500, but that overlaps with the
kernel ephemeral range 32,769-60,999! So, freeport may allocate a free
port to the nomad agent, just to be used by another process before the
nomad agent starts!

This happened for example in
https://app.circleci.com/pipelines/github/hashicorp/nomad/14111/workflows/e1fcd7ff-f0e0-4796-8719-f57f510b1ffa/jobs/129684
.  `freeport` allocated port 41662 to serf, but `google_accounts`
raced to use it to connect to the CirleCI vm metadata service.

We avoid such races by using a dedicated port range that's disjoint from
the kernel ephemeral port range.
This commit is contained in:
Mahmood Ali 2021-01-06 16:18:28 -05:00 committed by GitHub
parent 00be4fc63c
commit 050ad6b6f4
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 3 additions and 3 deletions

View File

@ -16,14 +16,14 @@ const (
// blockSize is the size of the allocated port block. ports are given out
// consecutively from that block with roll-over for the lifetime of the
// application/test run.
blockSize = 1500
blockSize = 100
// maxBlocks is the number of available port blocks.
// lowPort + maxBlocks * blockSize must be less than 65535.
maxBlocks = 30
maxBlocks = 10
// lowPort is the lowest port number that should be used.
lowPort = 10000
lowPort = 8000
// attempts is how often we try to allocate a port block
// before giving up.