050ad6b6f4
Deflake the test-api job, currently failing at around 7.6% (44 out of 578 workflows), by ensuring that test nomad agents use a small dedicated port range that doesn't conflict with the kernel ephemeral range. The failures are disproportionately related to port allocation, where a nomad agent fails to start because its http port is already bound by another process. The failures are intermittent and aren't specific to any test in particular. The following is a representative failure: https://app.circleci.com/pipelines/github/hashicorp/nomad/13995/workflows/6cf6eb38-f93c-46f8-8aa0-f61e62fe7694/jobs/128169 . Upon investigation, the issue seems to be that the api freeport library picks a port block within 10,000-14,500, but that overlaps with the kernel ephemeral range 32,769-60,999. So freeport may allocate a free port to the nomad agent, only for another process to take it before the nomad agent starts. This happened, for example, in https://app.circleci.com/pipelines/github/hashicorp/nomad/14111/workflows/e1fcd7ff-f0e0-4796-8719-f57f510b1ffa/jobs/129684 . `freeport` allocated port 41662 to serf, but `google_accounts` raced to use it to connect to the CircleCI VM metadata service. We avoid such races by using a dedicated port range that's disjoint from the kernel ephemeral port range. |
||
---|---|---|
.. | ||
freeport.go |
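
The idea in the commit message can be illustrated with a minimal sketch. This is not the code in freeport.go: the names `overlaps` and `takePort` and the range 20000-20100 are hypothetical, chosen only for illustration, and the ephemeral bounds are copied from the commit message rather than read from the kernel. The sketch checks that a candidate range is disjoint from the ephemeral range, then hands out the first port in that range that can be bound on loopback.

```go
package main

import (
	"fmt"
	"net"
)

const (
	// Ephemeral range as quoted in the commit message; a real
	// implementation might read it from the OS instead.
	ephemeralLow  = 32769
	ephemeralHigh = 60999

	// Hypothetical dedicated range for test agents (illustrative only).
	rangeLow  = 20000
	rangeHigh = 20100
)

// overlaps reports whether the inclusive ranges [aLow, aHigh] and
// [bLow, bHigh] share at least one port.
func overlaps(aLow, aHigh, bLow, bHigh int) bool {
	return aLow <= bHigh && bLow <= aHigh
}

// takePort returns the first port in [low, high] that can currently be
// bound on 127.0.0.1. The listener is closed before returning, so the
// caller must still bind the port itself.
func takePort(low, high int) (int, error) {
	for p := low; p <= high; p++ {
		ln, err := net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", p))
		if err != nil {
			continue // port is busy, try the next one
		}
		ln.Close()
		return p, nil
	}
	return 0, fmt.Errorf("no free port in %d-%d", low, high)
}

func main() {
	if overlaps(rangeLow, rangeHigh, ephemeralLow, ephemeralHigh) {
		fmt.Println("dedicated range overlaps the ephemeral range")
		return
	}
	port, err := takePort(rangeLow, rangeHigh)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("allocated port", port)
}
```

Closing the listener still leaves a short window before the agent binds the port. Keeping the range outside the ephemeral range means the kernel will not pick one of these ports for an unrelated outbound connection during that window, which is the race described above.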