From 050ad6b6f4b534a94e0e98f0a91e19e15c39474d Mon Sep 17 00:00:00 2001
From: Mahmood Ali
Date: Wed, 6 Jan 2021 16:18:28 -0500
Subject: [PATCH] tests: deflake test-api job (#9742)

Deflake the test-api job, currently failing at around 7.6% (44 out of
578 workflows), by ensuring that test nomad agents use a small dedicated
port range that doesn't conflict with the kernel ephemeral port range.

The failures are disproportionately related to port allocation: a nomad
agent fails to start when its HTTP port is already bound to another
process. The failures are intermittent and aren't specific to any
particular test. The following is a representative failure:
https://app.circleci.com/pipelines/github/hashicorp/nomad/13995/workflows/6cf6eb38-f93c-46f8-8aa0-f61e62fe7694/jobs/128169 .

Upon investigation, the issue seems to be that the api freeport library
picks a port block within 10,000-55,000, which overlaps with the kernel
ephemeral range 32,768-60,999! So freeport may allocate a free port to
the nomad agent, only for another process to grab it before the nomad
agent starts. This happened, for example, in
https://app.circleci.com/pipelines/github/hashicorp/nomad/14111/workflows/e1fcd7ff-f0e0-4796-8719-f57f510b1ffa/jobs/129684 .
`freeport` allocated port 41662 to serf, but `google_accounts` raced to
use it to connect to the CircleCI VM metadata service.

We avoid such races by using a dedicated port range (8,000-9,000 with
the constants below) that's disjoint from the kernel ephemeral port
range.
---
 api/internal/testutil/freeport/freeport.go | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/api/internal/testutil/freeport/freeport.go b/api/internal/testutil/freeport/freeport.go
index 806449ba4..f21698de7 100644
--- a/api/internal/testutil/freeport/freeport.go
+++ b/api/internal/testutil/freeport/freeport.go
@@ -16,14 +16,14 @@ const (
 	// blockSize is the size of the allocated port block. ports are given out
 	// consecutively from that block with roll-over for the lifetime of the
 	// application/test run.
-	blockSize = 1500
+	blockSize = 100
 
 	// maxBlocks is the number of available port blocks.
 	// lowPort + maxBlocks * blockSize must be less than 65535.
-	maxBlocks = 30
+	maxBlocks = 10
 
 	// lowPort is the lowest port number that should be used.
-	lowPort = 10000
+	lowPort = 8000
 
 	// attempts is how often we try to allocate a port block
 	// before giving up.
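To make the overlap argument in the commit message concrete, here is a minimal Go sketch (not part of this patch or of the freeport package) that computes the port span implied by freeport's `lowPort`, `maxBlocks` and `blockSize` constants and checks it against the kernel ephemeral range. It assumes blocks are laid out back to back starting at `lowPort`, as the `lowPort + maxBlocks * blockSize` comment suggests. It also assumes the Linux default ephemeral range of 32768-60999 whenever `/proc/sys/net/ipv4/ip_local_port_range` can't be read. The `portRange`, `freeportSpan` and `parseLocalPortRange` names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// portRange is an inclusive range of port numbers.
type portRange struct {
	lo, hi int
}

// overlaps reports whether two inclusive ranges share at least one port.
func (r portRange) overlaps(o portRange) bool {
	return r.lo <= o.hi && o.lo <= r.hi
}

// freeportSpan is a hypothetical helper returning the ports a freeport-style
// allocator can hand out. Assumption: blocks are laid out back to back from
// lowPort, so the last usable port is lowPort + maxBlocks*blockSize - 1.
func freeportSpan(lowPort, maxBlocks, blockSize int) portRange {
	return portRange{lo: lowPort, hi: lowPort + maxBlocks*blockSize - 1}
}

// parseLocalPortRange parses the contents of
// /proc/sys/net/ipv4/ip_local_port_range, e.g. "32768\t60999\n".
func parseLocalPortRange(s string) (portRange, error) {
	fields := strings.Fields(s)
	if len(fields) != 2 {
		return portRange{}, fmt.Errorf("unexpected format: %q", s)
	}
	lo, err := strconv.Atoi(fields[0])
	if err != nil {
		return portRange{}, err
	}
	hi, err := strconv.Atoi(fields[1])
	if err != nil {
		return portRange{}, err
	}
	if lo > hi {
		return portRange{}, fmt.Errorf("invalid range %d-%d", lo, hi)
	}
	return portRange{lo: lo, hi: hi}, nil
}

func main() {
	// Assumed Linux default; replaced by the live value when /proc is readable.
	ephemeral := portRange{lo: 32768, hi: 60999}
	if b, err := os.ReadFile("/proc/sys/net/ipv4/ip_local_port_range"); err == nil {
		if r, err := parseLocalPortRange(string(b)); err == nil {
			ephemeral = r
		}
	}

	before := freeportSpan(10000, 30, 1500) // constants before this patch
	after := freeportSpan(8000, 10, 100)    // constants after this patch

	for _, c := range []struct {
		name string
		span portRange
	}{{"before", before}, {"after", after}} {
		fmt.Printf("%s: %d-%d overlaps ephemeral %d-%d: %v\n",
			c.name, c.span.lo, c.span.hi, ephemeral.lo, ephemeral.hi,
			c.span.overlaps(ephemeral))
	}
}
```

With the old constants the span is 10,000-54,999, which overlaps the default ephemeral range. With the new constants it is 8,000-8,999, which does not. Reading the live sysctl value matters because CI images may tune the ephemeral range away from the default.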