This PR tries to make API tests run fast, as an experiment to later apply
to all packages. Key changes include
- Swapping freeport for test/portal for port allocations
- Swappng some uses of WaitForResult with test/wait
- Turning on parallelism in api/testutil/slow.go
- Switching to custom public runner (32 vcpu)
There's also chunk of cleanup brought in for the ride
This is a followup to running tests in serial in CI.
Since the API package cannot import anything outside of api/,
copy the ci.Parallel function into api/internal/testutil, and
have api tests use that.
* Fixup uses of `sanity`
* Remove unnecessary comments.
These checks are better explained by earlier comments about
the context of the test. Per @tgross, moved the tests together
to better reinforce the overall shared context.
* Update nomad/fsm_test.go
If the user has disabled Prometheus metrics and a request is
sent to the metrics endpoint requesting Prometheus formatted
metrics, then the request should fail.
Deflake test-api job, currently failing at around 7.6% (44 out of 578
workflows), by ensuring that test nomad agent use a small dedicated port
range that doesn't conflict with the kernel ephemeral range.
The failures are disproportionatly related to port allocation, where a
nomad agent fails to start when the http port is already bound to
another process. The failures are intermitent and aren't specific to any
test in particular. The following is a representative failure:
https://app.circleci.com/pipelines/github/hashicorp/nomad/13995/workflows/6cf6eb38-f93c-46f8-8aa0-f61e62fe7694/jobs/128169
.
Upon investigation, the issue seems to be that the api freeport library
picks a port block within 10,000-14,500, but that overlaps with the
kernel ephemeral range 32,769-60,999! So, freeport may allocate a free
port to the nomad agent, just to be used by another process before the
nomad agent starts!
This happened for example in
https://app.circleci.com/pipelines/github/hashicorp/nomad/14111/workflows/e1fcd7ff-f0e0-4796-8719-f57f510b1ffa/jobs/129684
. `freeport` allocated port 41662 to serf, but `google_accounts`
raced to use it to connect to the CirleCI vm metadata service.
We avoid such races by using a dedicated port range that's disjoint from
the kernel ephemeral port range.
Ensure that api test agent is terminated gracefully. This is desired for
two purposes:
First, to ensure that the logs are fully flished out. If the agent is
killed mid log line and go test doesn't emit a new line before `---
PASS:` indicator, the test may be marked as failed, even if it passed.
Sample failure is https://circleci.com/gh/hashicorp/nomad/72360 .
Second, ensure that the agent terminates any auxiliary processes (e.g.
logmon, tasks).
* Divest api/ package of deps elsewhere in the nomad repo.
This will allow making api/ a module without then pulling in the
external repo, leading to a package name conflict.
This required some migration of tests to an apitests/ folder (can be
moved anywhere as it has no deps on it). It also required some
duplication of code, notably some test helpers from api/ -> apitests/
and part (but not all) of testutil/ -> api/testutil/.
Once there's more separation and an e.g. sdk/ folder those can be
removed in favor of a dep on the sdk/ folder, provided the sdk/ folder
doesn't depend on api/ or /.
* Also remove consul dep from api/ package
* Fix stupid linters
* Some restructuring