Seth Hoenig
bff6c88683
cleanup: remove more copies of min/max from helper
2022-08-24 09:56:15 -05:00
Piotr Kazmierczak
b63944b5c1
cleanup: replace TypeToPtr helper methods with pointer.Of (#14151)
...
Bumping the compile-time requirement to Go 1.18 allows us to simplify our pointer helper methods.
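A minimal sketch of what a generic pointer helper looks like with Go 1.18 type parameters. The function below is written for illustration only; its name mirrors `pointer.Of` from the commit subject, but the body is an assumption and not copied from the repository.

```go
package main

import "fmt"

// Of returns a pointer to a copy of v. With type parameters (Go 1.18+),
// one function can stand in for a family of per-type helpers.
// Illustrative sketch; not the repository's actual implementation.
func Of[T any](v T) *T {
	return &v
}

func main() {
	count := Of(42)   // *int
	name := Of("web") // *string
	fmt.Println(*count, *name) // prints "42 web"
}
```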
2022-08-17 18:26:34 +02:00
Michael Schurter
cdf5a74998
core: fix data races in blocked eval chan handling (#14142)
...
Similar to the deployment watcher fix in #14121 - the server code loves these mutable structs so we need to guard access to the struct fields with locks.
Capturing ch := b.capacityChangeCh is sufficient to satisfy the data race detector, but I noticed it was also possible to leak goroutines:
Since the watchCapacity loop is in charge of receiving from capacityChangeCh and exits when stopCh is closed, senders to capacityChangeCh must also exit when stopCh is closed. Otherwise they may block forever if capacityChangeCh is full, because it will never be received on again. I did not find evidence of this occurring in the meager smattering of prod goroutine dumps I have lying around, but this isn't surprising as the chan has a buffer of 8096! I would imagine that is sufficient to handle "late" sends, which then just get GC'd away when the last reference to the old chan is dropped. This is just additional safety/correctness.
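The sender-side guard described above can be sketched as a `select` over the send and the stop channel. `trySend` and the channel names here are hypothetical stand-ins for illustration, not the code from this commit.

```go
package main

import "fmt"

// trySend delivers v on ch unless stop is closed first. It reports whether
// the value was sent. Without the stop case, a sender could block forever
// once the receiver has exited and the buffer is full.
// Hypothetical helper; names are assumptions.
func trySend(ch chan<- int, stop <-chan struct{}, v int) bool {
	select {
	case ch <- v:
		return true
	case <-stop:
		return false
	}
}

func main() {
	ch := make(chan int, 1) // tiny buffer so it fills quickly
	stop := make(chan struct{})

	fmt.Println(trySend(ch, stop, 1)) // true: buffer has room

	close(stop) // the receiver has shut down
	fmt.Println(trySend(ch, stop, 2)) // false: buffer full, stop closed
}
```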
2022-08-16 12:33:53 -07:00
Seth Hoenig
c23da281a1
metrics: even classless blocked evals get metrics
...
This PR fixes a bug where blocked evaluations with no class set would
not have metrics exported at the dc:class scope.
Fixes #13759
2022-07-15 14:12:44 -05:00
Seth Hoenig
a5943da0c7
core: add tests for blocked evals math
2022-05-24 09:05:18 -05:00
Seth Hoenig
0c145ac1e4
core: remove correct set of resources on blocked eval
2022-05-23 15:18:55 -05:00
Yoan Blanc
5e8254beda
feat: remove dependency to consul/lib
...
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2022-04-09 13:22:44 +02:00
Seth Hoenig
db2347a86c
cleanup: prevent leaks from time.After
...
This PR replaces use of time.After with a safe helper function
that creates a time.Timer to use instead. The new function returns
both a time.Timer and a Stop function that the caller must handle.
Unlike time.NewTimer, the helper function does not panic if the duration
set is <= 0.
2022-02-02 14:32:26 -06:00
Luiz Aoqui
f1b9055d21
Add metrics for blocked eval resources (#10454)
...
* add metrics for blocked eval resources
* docs: add new blocked_evals metrics
* fix to call `pruneStats` instead of `stats.prune` directly
2021-04-29 15:03:45 -04:00
Lang Martin
83d20169f6
blocked_evals reset system evals on Flush
2019-07-18 10:32:13 -04:00
Lang Martin
3bf618f217
blocked_evals system evals indexed by job and node
2019-07-18 10:32:12 -04:00
Michael Schurter
689794e08d
nomad: fix deadlock in UnblockClassAndQuota
...
Previous commit could introduce a deadlock if the capacityChangeCh was
full and the receiving side exited before freeing a slot so the sending
side could send. Flush would then block forever waiting to acquire the
lock just to throw the pending update away.
The race is around getting/setting the chan field, not chan operations,
so only lock around getting the chan field.
2019-05-20 15:41:52 -07:00
Michael Schurter
8c99214f69
nomad: fix race in BlockedEvals
...
I assume the mutex was being released before sending on capacityChangeCh
to avoid blocking in the critical section, but:
1. This is a race.
2. capacityChangeCh has a *huge* buffer (8096). If it's full things
already seem Very Bad, and a little backpressure seems appropriate.
2019-05-20 15:26:20 -07:00
Michael Schurter
9732bc37ff
nomad: refactor waitForIndex into SnapshotAfter
...
Generalize wait for index logic in the state store for reuse elsewhere.
Also begin plumbing in a context to combine handling of timeouts and
shutdown.
2019-05-17 13:30:23 -07:00
Michael Schurter
1c137690c4
test: fix race around block eval chans
...
Similar to previous commit, stop and change chans were being set and
accessed from different goroutines. Passing the chans on the stack
resolves the race.
Output from `go test -race -run 'Server_RPC$'` in nomad/:
```
==================
WARNING: DATA RACE
Write at 0x00c0002b4e10 by goroutine 63:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).Flush()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:648 +0x32a
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:149 +0x12b
  github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:721 +0x232
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122 +0x95d
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c

Previous read at 0x00c0002b4e10 by goroutine 75:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).watchCapacity()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:483 +0xfe

Goroutine 63 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70 +0x269

Goroutine 75 (finished) created at:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:141 +0xba
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:210 +0x392
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117 +0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c
==================
==================
WARNING: DATA RACE
Write at 0x00c0002b4e50 by goroutine 63:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).Flush()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:649 +0x388
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:149 +0x12b
  github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:721 +0x232
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122 +0x95d
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c

Previous read at 0x00c0002b4e50 by goroutine 77:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).prune()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:690 +0xae

Goroutine 63 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70 +0x269

Goroutine 77 (finished) created at:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:142 +0xdc
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:210 +0x392
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117 +0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72 +0x6c
==================
```
2018-12-19 15:48:02 -08:00
Alex Dadgar
a90dc978e1
Handle new eval being the duplicate properly
2018-11-12 16:02:23 -08:00
Alex Dadgar
991791a513
typo fix
2018-11-08 13:28:27 -08:00
Alex Dadgar
b1c5d52817
Track jobs by namespace
2018-11-07 10:22:08 -08:00
Alex Dadgar
6d8bb3a7bd
Duplicate blocked evals cancelling improved
...
The old logic for cancelling duplicate blocked evaluations by job id had
an issue: the newer evaluation could have additional node classes that it
is (in)eligible for, and we would not capture them. As a result, cluster
state could change in a way that would let the job make progress, yet no
evaluation would be unblocked.
2018-11-07 10:08:23 -08:00
Josh Soref
e4b6eed49b
spelling: only
2018-03-11 18:33:52 +00:00
Josh Soref
3c72f0208e
spelling: needed
2018-03-11 18:30:06 +00:00
Josh Soref
f7e09cbecf
spelling: correlate
2018-03-11 17:51:05 +00:00
Alex Dadgar
c1cc51dbee
sync
2017-10-13 14:36:02 -07:00
Alex Dadgar
86980e08f0
Cancel blocked evals upon successful one for job
...
This PR causes blocked evaluations to be cancelled if there is a
subsequent successful evaluation for the job. This fixes UX problems
showing failed placements when there are not any in reality and makes GC
possible for these jobs in certain cases.
Fixes https://github.com/hashicorp/nomad/issues/2124
2017-01-04 16:16:04 -08:00
Alex Dadgar
fd3e469d5e
Remove requeue because it is a subset of EnqueueAll now
2016-06-24 10:14:34 -07:00
Alex Dadgar
2f8bb4b235
When enqueuing into eval broker always pass blocked eval's token
2016-06-23 22:40:22 -07:00
Alex Dadgar
b1c2a9ddb9
UnblockFailed needs to untrack the job
2016-06-23 15:26:26 -07:00
Sean Chittenden
a658299235
Misc typos
2016-06-16 16:17:17 -07:00
Alex Dadgar
b064b392fc
Only unblock if missed class was added after eval snapshot index
2016-06-10 15:24:06 -07:00
Alex Dadgar
5f3e27ecd8
Fix case in periodic dispatch and blocked evals where lock was not released
2016-06-03 13:46:57 -07:00
Alex Dadgar
060318845f
Comments addressed
2016-05-31 11:39:03 -07:00
Alex Dadgar
1f9f015c1b
Fix race condition in which a reblocked evaluation could be dropped
2016-05-27 16:53:10 -07:00
Alex Dadgar
6a236872b4
address comment
2016-05-25 10:30:47 -07:00
Alex Dadgar
a3336b7761
test fixes and delete
2016-05-24 20:20:06 -07:00
Alex Dadgar
3fd51ecece
Periodically unblock failed evaluations
2016-05-24 20:10:56 -07:00
Alex Dadgar
bfdd5846e1
Track unblock indexes and check evals on block to see if they missed an update while in the scheduler
2016-05-24 20:10:56 -07:00
Sean Chittenden
dc28ab0cb5
Speling police
2016-05-15 09:41:34 -07:00
Alex Dadgar
f6e0349d3b
go vet
2016-02-12 16:08:58 -08:00
Alex Dadgar
36df3aaac7
Remove running, system scheduler, and fix tg overriding eligibility
2016-01-31 20:56:52 -08:00
Alex Dadgar
c55eb0816c
Address comments
2016-01-31 18:46:45 -08:00
Alex Dadgar
dc978066e2
dedup blocked evals by job id
2016-01-31 18:04:45 -08:00
Alex Dadgar
dd19b7e848
Buffered unblock
2016-01-31 18:04:45 -08:00
Alex Dadgar
151fe5ed88
Make computed node class a string and add versioning
2016-01-31 18:04:45 -08:00
Alex Dadgar
3601acfe08
Rename counters
2016-01-31 18:04:45 -08:00
Alex Dadgar
74135f02a4
Blocked Eval tracker
2016-01-31 18:04:45 -08:00
Alex Dadgar
9045d7e989
Schedulers create blocked eval if there are failed allocations
2016-01-31 18:04:45 -08:00