Commit graph

6 commits

Author SHA1 Message Date
Michael Schurter 9cac60dbed
test: use port collision instead of cpu exhaustion (#14994)
Originally this test relied on Job 1 blocking Job 2 until Job 1 had a
terminal *ClientStatus.* Job 2 ensured it would get blocked using 2
mechanisms:

1. A constraint requiring that it be placed on the same node as Job 1.
2. Job 2 would require all unreserved CPU on the node to ensure it would
   be blocked until Job 1's resources were free.

That 2nd assertion breaks if *any previous job is still running on the
target node!* That seems very likely to happen in the flaky world of our
e2e tests. In fact there may be some jobs we intentionally want running
throughout; in hindsight it was never safe to assume my test would be
the only thing scheduled when it ran.

*Ports to the rescue!* Reserving a static port means that Job 2 will now
block on Job 1 being terminal. It will only conflict with other
tests if those tests use that port *on every node.* I ensured no
existing tests were using the port I chose.
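
To illustrate the difference between the two blocking mechanisms, here is a
toy model of a single node. It is only a sketch and not Nomad's scheduler: the
`node` and `job` types, the CPU figures, and port 9999 are all hypothetical
and chosen for illustration. A stray job holding some CPU keeps the
CPU-exhaustion variant of Job 2 blocked even after Job 1 stops, while the
static-port variant unblocks as soon as Job 1 releases the port.

```go
package main

import "fmt"

// node is a toy model of a client node (hypothetical, not Nomad's types).
type node struct {
	freeCPU   int          // unreserved CPU in MHz
	usedPorts map[int]bool // static ports currently reserved
}

// job is a simplified resource request; port 0 means no static port.
type job struct {
	name string
	cpu  int
	port int
}

// fits reports whether j could be placed on n right now.
func (n *node) fits(j job) bool {
	if j.cpu > n.freeCPU {
		return false
	}
	return j.port == 0 || !n.usedPorts[j.port]
}

// place reserves j's resources on n.
func (n *node) place(j job) {
	n.freeCPU -= j.cpu
	if j.port != 0 {
		n.usedPorts[j.port] = true
	}
}

// stop releases j's resources on n.
func (n *node) stop(j job) {
	n.freeCPU += j.cpu
	if j.port != 0 {
		delete(n.usedPorts, j.port)
	}
}

// newBusyNode returns a 4000 MHz node with a stray 500 MHz job already running.
func newBusyNode() *node {
	n := &node{freeCPU: 4000, usedPorts: map[int]bool{}}
	n.place(job{name: "stray", cpu: 500})
	return n
}

// blockedThenPlaced reports whether job2 is blocked while job1 runs, and
// whether job2 becomes placeable once job1 has stopped.
func blockedThenPlaced(n *node, job1, job2 job) (blocked, placedAfter bool) {
	n.place(job1)
	blocked = !n.fits(job2)
	n.stop(job1)
	placedAfter = n.fits(job2)
	return blocked, placedAfter
}

func main() {
	// CPU exhaustion: Job 2 asks for all 4000 MHz of the node.
	b, p := blockedThenPlaced(newBusyNode(),
		job{name: "job1", cpu: 100}, job{name: "job2", cpu: 4000})
	fmt.Println("cpu exhaustion: blocked =", b, "placed after =", p)

	// Port collision: both jobs reserve the same (hypothetical) static port.
	b, p = blockedThenPlaced(newBusyNode(),
		job{name: "job1", cpu: 100, port: 9999}, job{name: "job2", cpu: 100, port: 9999})
	fmt.Println("port collision: blocked =", b, "placed after =", p)
}
```

In this model the port variant depends only on Job 1's state, which is the
property the test needs; the CPU variant also depends on everything else that
happens to be running on the node.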

Other changes:
- Gave job a bit more breathing room resource-wise.
- Tightened timings a bit since previous failure ran into the `go test`
  time limit.
- Cleaned up the DumpEvals output. It's quite nice and handy now!
2022-10-21 07:53:26 -07:00
Michael Schurter 01d90d18f6
test: expand timing and debugging for overlap test (#14920)
attempt #9000
2022-10-18 13:02:18 -07:00
Michael Schurter 21eced0a4e
test: extend timing and output of overlap e2e test (#14894)
Keeps failing in the nightly e2e test with unhelpful output like:
```
Failed
=== RUN   TestOverlap
    overlap_test.go:92: Followup job overlap93ee1d2b blocked. Sleeping for the rest of overlap48c26c39's shutdown_delay (9.2/10s)
    overlap_test.go:105: 1500/2000 retries reached for github.com/hashicorp/nomad/e2e/overlap.TestOverlap (err=timed out before an allocation was found for overlap93ee1d2b)
    overlap_test.go:105: timeout: timed out before an allocation was found for overlap93ee1d2b
--- FAIL: TestOverlap (38.96s)
```

I have not been able to replicate it in my own e2e cluster, so I added
the EvalDump helper to add detailed eval information like:

```
=== RUN   TestOverlap
1/1 Job overlap7b0e90ec Eval c38c9919-a4f0-5baf-45f7-0702383c682a
  Type:         service
  TriggeredBy:  job-register
  Deployment:
  Status:       pending ()
  NextEval:
  PrevEval:
  BlockedEval:
   -- No placement failures --
  QueuedAllocs:
  SnapshotIdx:  0
  CreateIndex:  96
  ModifyIndex:  96

...
```

Hopefully helpful when debugging other tests as well!
2022-10-14 14:15:07 -07:00
Michael Schurter bdb639b3e2
test: simplify overlap job placement logic (#14811)
* test: simplify overlap job placement logic

Trying to fix #14806

Both the previous approach and this one worked on e2e clusters I
spun up.

* simplify code flow
2022-10-12 11:21:28 -07:00
Michael Schurter ed3218c3dd
Fixing flaky TestOverlap test (#14780)
* test: ensure feasible node selected in overlap test

* test: warn when getting close to retry limit
2022-10-03 14:35:02 -07:00
Michael Schurter 6161b417f3
test: add e2e for non-overlapping placements (#14646)
* test: add e2e for non-overlapping placements

Followup to #10446

Fails (as expected) against 1.3.x at the wait for blocked eval (because
the allocs are allowed to overlap).

Passes against 1.4.0-beta.1 (as expected).

* Update e2e/overlap/overlap_test.go

Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2022-09-22 13:06:17 -07:00