Seth Hoenig
db2347a86c
cleanup: prevent leaks from time.After
...
This PR replaces use of time.After with a safe helper function
that creates a time.Timer to use instead. The new function returns
both a time.Timer and a Stop function that the caller must handle.
Unlike time.NewTimer, the helper function does not panic if the duration
set is <= 0.
2022-02-02 14:32:26 -06:00
Luiz Aoqui
f1b9055d21
Add metrics for blocked eval resources (#10454)
...
* add metrics for blocked eval resources
* docs: add new blocked_evals metrics
* fix to call `pruneStats` instead of `stats.prune` directly
2021-04-29 15:03:45 -04:00
Lang Martin
83d20169f6
blocked_evals reset system evals on Flush
2019-07-18 10:32:13 -04:00
Lang Martin
3bf618f217
blocked_evals system evals indexed by job and node
2019-07-18 10:32:12 -04:00
Michael Schurter
689794e08d
nomad: fix deadlock in UnblockClassAndQuota
...
The previous commit could introduce a deadlock if capacityChangeCh was
full and the receiving side exited before freeing a slot for the sending
side. Flush would then block forever waiting to acquire the
lock just to throw the pending update away.
The race is around getting/setting the chan field, not chan operations,
so only lock around getting the chan field.
2019-05-20 15:41:52 -07:00
Michael Schurter
8c99214f69
nomad: fix race in BlockedEvals
...
I assume the mutex was being released before sending on capacityChangeCh
to avoid blocking in the critical section, but:
1. This is a race.
2. capacityChangeCh has a *huge* buffer (8096). If it's full things
already seem Very Bad, and a little backpressure seems appropriate.
2019-05-20 15:26:20 -07:00
Michael Schurter
9732bc37ff
nomad: refactor waitForIndex into SnapshotAfter
...
Generalize wait for index logic in the state store for reuse elsewhere.
Also begin plumbing in a context to combine handling of timeouts and
shutdown.
2019-05-17 13:30:23 -07:00
Michael Schurter
1c137690c4
test: fix race around block eval chans
...
Similar to previous commit, stop and change chans were being set and
accessed from different goroutines. Passing the chans on the stack
resolves the race.
Output from `go test -race -run 'Server_RPC$'` in nomad/:
```
==================
WARNING: DATA RACE
Write at 0x00c0002b4e10 by goroutine 63:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).Flush()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:648 +0x32a
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:149 +0x12b
  github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:721 +0x232
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122 +0x95d
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c

Previous read at 0x00c0002b4e10 by goroutine 75:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).watchCapacity()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:483 +0xfe

Goroutine 63 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70 +0x269

Goroutine 75 (finished) created at:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:141 +0xba
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:210 +0x392
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117 +0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c
==================
==================
WARNING: DATA RACE
Write at 0x00c0002b4e50 by goroutine 63:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).Flush()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:649 +0x388
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:149 +0x12b
  github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:721 +0x232
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122 +0x95d
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c

Previous read at 0x00c0002b4e50 by goroutine 77:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).prune()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:690 +0xae

Goroutine 63 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70 +0x269

Goroutine 77 (finished) created at:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:142 +0xdc
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:210 +0x392
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117 +0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c
==================
```
2018-12-19 15:48:02 -08:00
Alex Dadgar
a90dc978e1
Handle new eval being the duplicate properly
2018-11-12 16:02:23 -08:00
Alex Dadgar
991791a513
typo fix
2018-11-08 13:28:27 -08:00
Alex Dadgar
b1c5d52817
Track jobs by namespace
2018-11-07 10:22:08 -08:00
Alex Dadgar
6d8bb3a7bd
Duplicate blocked evals cancelling improved
...
The old logic for cancelling duplicate blocked evaluations by job ID
missed the case where the newer evaluation had additional node classes
that it was (in)eligible for. Those classes were not captured, so cluster
state could change in a way that let the job make progress without any
evaluation being unblocked.
2018-11-07 10:08:23 -08:00
Josh Soref
e4b6eed49b
spelling: only
2018-03-11 18:33:52 +00:00
Josh Soref
3c72f0208e
spelling: needed
2018-03-11 18:30:06 +00:00
Josh Soref
f7e09cbecf
spelling: correlate
2018-03-11 17:51:05 +00:00
Alex Dadgar
c1cc51dbee
sync
2017-10-13 14:36:02 -07:00
Alex Dadgar
86980e08f0
Cancel blocked evals upon successful one for job
...
This PR causes blocked evaluations to be cancelled if there is a
subsequent successful evaluation for the job. This fixes UX problems
showing failed placements when there are not any in reality and makes GC
possible for these jobs in certain cases.
Fixes https://github.com/hashicorp/nomad/issues/2124
2017-01-04 16:16:04 -08:00
Alex Dadgar
fd3e469d5e
Remove requeue because it is a subset of EnqueueAll now
2016-06-24 10:14:34 -07:00
Alex Dadgar
2f8bb4b235
When enqueuing into eval broker always pass blocked eval's token
2016-06-23 22:40:22 -07:00
Alex Dadgar
b1c2a9ddb9
UnblockFailed needs to untrack the job
2016-06-23 15:26:26 -07:00
Sean Chittenden
a658299235
Misc typos
2016-06-16 16:17:17 -07:00
Alex Dadgar
b064b392fc
Only unblock if missed class was added after eval snapshot index
2016-06-10 15:24:06 -07:00
Alex Dadgar
5f3e27ecd8
Fix case in periodic dispatch and blocked evals where lock was not released
2016-06-03 13:46:57 -07:00
Alex Dadgar
060318845f
Comments addressed
2016-05-31 11:39:03 -07:00
Alex Dadgar
1f9f015c1b
Fix race condition in which a reblocked evaluation could be dropped
2016-05-27 16:53:10 -07:00
Alex Dadgar
6a236872b4
address comment
2016-05-25 10:30:47 -07:00
Alex Dadgar
a3336b7761
test fixes and delete
2016-05-24 20:20:06 -07:00
Alex Dadgar
3fd51ecece
Periodically unblock failed evaluations
2016-05-24 20:10:56 -07:00
Alex Dadgar
bfdd5846e1
Track unblock indexes and check evals on block to see if they missed an update while in the scheduler
2016-05-24 20:10:56 -07:00
Sean Chittenden
dc28ab0cb5
Speling police
2016-05-15 09:41:34 -07:00
Alex Dadgar
f6e0349d3b
go vet
2016-02-12 16:08:58 -08:00
Alex Dadgar
36df3aaac7
Remove running, system scheduler, and fix tg overriding eligibility
2016-01-31 20:56:52 -08:00
Alex Dadgar
c55eb0816c
Address comments
2016-01-31 18:46:45 -08:00
Alex Dadgar
dc978066e2
dedup blocked evals by job id
2016-01-31 18:04:45 -08:00
Alex Dadgar
dd19b7e848
Buffered unblock
2016-01-31 18:04:45 -08:00
Alex Dadgar
151fe5ed88
Make computed node class a string and add versioning
2016-01-31 18:04:45 -08:00
Alex Dadgar
3601acfe08
Rename counters
2016-01-31 18:04:45 -08:00
Alex Dadgar
74135f02a4
Blocked Eval tracker
2016-01-31 18:04:45 -08:00
Alex Dadgar
9045d7e989
Schedulers create blocked eval if there are failed allocations
2016-01-31 18:04:45 -08:00