PeriodicDispatch.SetEnabled sets updateCh in one goroutine, and
PeriodicDispatch.run accesses updateCh in another.
The race can be prevented by having SetEnabled pass updateCh to run as a parameter; a sketch follows the race output below.
Race detector output from `go test -race -run TestServer_RPC` in nomad/
```
==================
WARNING: DATA RACE
Write at 0x00c0001d3f48 by goroutine 75:
github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).SetEnabled()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:468
+0x256
github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:724
+0x267
github.com/hashicorp/nomad/nomad.(*Server).leaderLoop.func1()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:131
+0x3c
github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:163
+0x4dd
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
Previous read at 0x00c0001d3f48 by goroutine 515:
github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).run()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:338
+0x177
Goroutine 75 (running) created at:
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269
Goroutine 515 (running) created at:
github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).SetEnabled()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:176
+0x1bc
github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:231
+0x582
github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
==================
```
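A minimal sketch of that fix, assuming updateCh is created while the dispatcher's lock is held and handed to run as an argument (the struct layout and helper names here are hypothetical):
```go
package periodic

import (
	"context"
	"sync"
)

type PeriodicDispatch struct {
	enabled  bool
	updateCh chan struct{}
	stopFn   context.CancelFunc
	l        sync.Mutex
}

// SetEnabled replaces updateCh while holding the lock and passes the new
// channel to run directly, so run never reads the struct field concurrently.
func (p *PeriodicDispatch) SetEnabled(enabled bool) {
	p.l.Lock()
	defer p.l.Unlock()
	wasRunning := p.enabled
	p.enabled = enabled
	switch {
	case wasRunning && !enabled:
		p.stopFn() // stop the previous run goroutine
	case !wasRunning && enabled:
		ctx, cancel := context.WithCancel(context.Background())
		p.stopFn = cancel
		p.updateCh = make(chan struct{}, 1)
		go p.run(ctx, p.updateCh) // run owns its own reference from here on
	}
}

// run only ever uses the channel it was given as an argument.
func (p *PeriodicDispatch) run(ctx context.Context, updateCh <-chan struct{}) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-updateCh:
			// recompute the next launch time
		}
	}
}
```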
IOPS has been modelled as a resource since Nomad 0.1, but it has never
actually been detected and there is no short-term plan to add detection.
IOPS is too simplistic a unit for describing the performance requirements
of the underlying storage system, so in its current state it adds
unnecessary confusion and can be removed without impacting any users.
This PR leaves IOPS defined at the jobspec parsing level and in the api/
resources, since these are the two public uses of the field. Those should
be considered deprecated and exist only to let users stop using them
during the Nomad 0.9.x release. In the future, there should be no
expectation that the field will exist.
This PR fixes an edge case where we could GC an allocation that was in a
desired stop state but had not yet terminated. This can be hit if the
client hasn't shut down the allocation yet or if the allocation is still
shutting down (long kill_timeout).
Fixes https://github.com/hashicorp/nomad/issues/4940
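A minimal sketch of the tightened check, with hypothetical field and helper names; the point is that a desired-state stop alone is not enough, the client must also report a terminal status:
```go
package gc

import "time"

type Allocation struct {
	DesiredStatus string // e.g. "stop"
	ClientStatus  string // e.g. "running", "complete", "failed"
	ModifyTime    int64  // unix nanoseconds
}

// clientTerminal reports whether the client has finished the allocation.
func (a *Allocation) clientTerminal() bool {
	switch a.ClientStatus {
	case "complete", "failed", "lost":
		return true
	}
	return false
}

// gcEligible requires client-side terminal state; an alloc whose desired
// status is "stop" may still be shutting down under a long kill_timeout.
func gcEligible(a *Allocation, threshold time.Time) bool {
	if !a.clientTerminal() {
		return false
	}
	return time.Unix(0, a.ModifyTime).Before(threshold)
}
```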
The `currentExpiration` field is accessed from multiple goroutines (Stats
and renewal), so it needs locking.
I don't anticipate high contention, so a simple mutex suffices.
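A minimal sketch of the locking, with hypothetical names around the `currentExpiration` field:
```go
package vaultclient

import (
	"sync"
	"time"
)

type vaultClient struct {
	currentExpiration time.Time
	expirationLock    sync.Mutex // guards currentExpiration only
}

// extendExpiration is called from the renewal goroutine.
func (v *vaultClient) extendExpiration(ttlSeconds int) {
	v.expirationLock.Lock()
	defer v.expirationLock.Unlock()
	v.currentExpiration = time.Now().Add(time.Duration(ttlSeconds) * time.Second)
}

// expiration is read by Stats from another goroutine.
func (v *vaultClient) expiration() time.Time {
	v.expirationLock.Lock()
	defer v.expirationLock.Unlock()
	return v.currentExpiration
}
```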
This PR introduces a device hook that retrieves the device mount
information for an allocation. It also updates the computed node class to
take devices into account.
TODO: Fix the task runner unit test. The environment variable is being
lost even though it is properly set in the prestart hook.
Keep attempting to renew the Vault token past the locally recorded expiry,
in case the token was renewed out of band, e.g. on another Nomad server,
until Vault returns an unrecoverable error.
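A minimal sketch of that loop, under the assumption that an unrecoverable failure can be distinguished from a transient one (the `isUnrecoverable` predicate and the interval are hypothetical placeholders):
```go
package vaultclient

import (
	"time"

	vaultapi "github.com/hashicorp/vault/api"
)

// renewLoop keeps renewing even after the locally recorded expiry passes;
// the token may have been renewed out of band by another server.
func renewLoop(client *vaultapi.Client, increment int, stopCh <-chan struct{},
	isUnrecoverable func(error) bool) error {

	ticker := time.NewTicker(30 * time.Second) // placeholder renewal interval
	defer ticker.Stop()
	for {
		select {
		case <-stopCh:
			return nil
		case <-ticker.C:
			_, err := client.Auth().Token().RenewSelf(increment)
			if err == nil {
				continue
			}
			if isUnrecoverable(err) {
				return err // Vault says the token is truly gone
			}
			// transient failure: keep trying past the recorded expiry
		}
	}
}
```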
Seems like the stats field is a micro-optimization that doesn't justify
the complexity it introduces. Remove it and compute the stats from the
`revoking` field directly.
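A minimal sketch of deriving the figure on demand, with hypothetical names around the `revoking` field:
```go
package vaultclient

import "sync"

type VaultStats struct {
	TrackedForRevoke int
}

type vaultServer struct {
	revLock  sync.Mutex
	revoking map[string]struct{} // tokens pending revocation, keyed by accessor
}

// Stats computes the count from revoking directly instead of maintaining a
// separate counter that has to be kept in sync with it.
func (v *vaultServer) Stats() *VaultStats {
	v.revLock.Lock()
	defer v.revLock.Unlock()
	return &VaultStats{TrackedForRevoke: len(v.revoking)}
}
```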
Vault's RenewSelf(...) API may return (nil, nil). We failed to check
whether the secret was nil before attempting to use it.
RenewSelf:
e3eee5b4fb/api/auth_token.go (L138-L155)
Calls ParseSecret:
e3eee5b4fb/api/secret.go (L309-L311)
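A minimal sketch of the guard, assuming the standard `vault/api` client:
```go
package vaultclient

import (
	"fmt"

	vaultapi "github.com/hashicorp/vault/api"
)

func renewSelf(client *vaultapi.Client, increment int) (*vaultapi.Secret, error) {
	secret, err := client.Auth().Token().RenewSelf(increment)
	if err != nil {
		return nil, err
	}
	// RenewSelf parses the HTTP response via ParseSecret, which can yield
	// (nil, nil); guard before dereferencing.
	if secret == nil || secret.Auth == nil {
		return nil, fmt.Errorf("renewal returned an empty secret")
	}
	return secret, nil
}
```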
If anyone has an idea of how to test this, let me know; I didn't see any
options. We use a real Vault service, so there's no opportunity to mock
the response.
This adds constraints for asserting that a given attribute or value
exists or does not exist. This acts as a companion to the = and !=
operators, e.g.:
```hcl
constraint {
  attribute = "${attrs.type}"
  operator  = "!="
  value     = "database"
}

constraint {
  attribute = "${attrs.type}"
  operator  = "is_set"
}
```
This test expects 11 repeats of the same message emitted at 200ms
intervals, which takes on the order of 2.2 seconds (11 × 200ms), so we
need more than 2 seconds to absorb sleep-timing variations and the like.
Raising the timeout to 3s should be enough.
Fixes https://github.com/hashicorp/nomad/issues/4299
Upon investigating this case further, we determined the issue to be a race between applying the `JobBatchDeregisterRequest` FSM operation and processing the job-deregister evals.
Processing job-deregister evals should wait until the FSM log entry finishes applying, by checking the snapshot index. However, with `JobBatchDeregister`, each individual job deregistration accidentally incremented the snapshot index, which allowed the job-deregister evals to be processed early. When a Nomad server received an eval for a job in the batch that was yet to be deleted, we would accidentally re-run it depending on the state of its allocations.
This change ensures that we deregister all of the jobs and insert all of the evals in a single transaction, thus blocking processing of the related evals until deregistering completes.
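A toy sketch of the shape of the fix (the real Nomad state store is a go-memdb database; everything here is hypothetical): the deletes and the eval inserts commit together, and the index that watchers block on advances exactly once.
```go
package state

import "sync"

type Evaluation struct{ ID, JobID string }

// A toy stand-in for the real state store.
type StateStore struct {
	mu    sync.Mutex
	index uint64
	jobs  map[string]struct{}
	evals map[string]*Evaluation
}

// BatchDeregisterJobs deletes every job and inserts every eval under one
// lock, bumping the index exactly once, so a watcher blocking on the index
// can never observe a half-applied batch.
func (s *StateStore) BatchDeregisterJobs(index uint64, jobIDs []string, evals []*Evaluation) {
	s.mu.Lock()
	defer s.mu.Unlock()

	for _, id := range jobIDs {
		delete(s.jobs, id)
	}
	for _, e := range evals {
		s.evals[e.ID] = e
	}
	s.index = index // single, atomic index advance for the whole batch
}
```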
The old logic for cancelling duplicate blocked evaluations by job ID had
an issue: the newer evaluation could record additional node classes for
which it is (in)eligible, and cancelling it would lose that information.
Cluster state could then change in a way that would let the job make
progress, yet no evaluation would be unblocked. A sketch of the missing
comparison follows.
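A minimal sketch of that comparison, with hypothetical field names; a newer blocked eval that records any class the older one did not must not be dropped as a duplicate:
```go
package blockedevals

// capturesNewClasses reports whether the newer eval's recorded node classes
// include any class the older eval did not track. If so, the newer eval
// carries information needed to unblock on future cluster changes and must
// replace the older one rather than be cancelled as a duplicate.
func capturesNewClasses(older, newer map[string]bool) bool {
	for class := range newer {
		if _, ok := older[class]; !ok {
			return true
		}
	}
	return false
}
```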
Fix an issue in which the deployment watcher would fail the deployment
based on the earliest progress deadline of the deployment, regardless of
whether the task group had finished.
Further fix an issue where the blocked-eval optimization would prevent
any evals from being created to progress the deployment. To reproduce
this issue prior to this commit, create a job with two task groups. The
first group has count 1 and resources such that it can not be placed. The
second group has count 3, max_parallel=1, and can be placed. Run this
first and then update the second group to trigger a deployment. It will
place the first of three, but never progress, since a blocked eval
already exists; however, that blocked eval doesn't capture the fact that
there are two groups being deployed.
This commit implements an allocation selection algorithm for finding
allocations to preempt. It currently special-cases network resource asks
from the others (cpu/memory/disk/iops).
* Migrated all of the old leader task tests and got them passing
* Refactored and consolidated the task killing code in the alloc runner to
always kill leader tasks first (see the sketch after this list)
* Fixed lots of issues with state restoring
* Fixed a deadlock in AR.Destroy if AR.Run had never been called
* Added a new in-memory statedb for testing
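A minimal sketch of the kill ordering, with hypothetical interfaces; leaders are killed in a first pass so follower tasks can observe the leader exiting before they are stopped themselves:
```go
package allocrunner

import "context"

type taskRunner interface {
	IsLeader() bool
	Kill(ctx context.Context, reason string) error
}

// killTasks stops any leader task first, then the remaining tasks.
func killTasks(ctx context.Context, tasks map[string]taskRunner) {
	for _, tr := range tasks { // pass 1: leaders only
		if tr.IsLeader() {
			_ = tr.Kill(ctx, "task group leader stopped")
		}
	}
	for _, tr := range tasks { // pass 2: everything else
		if !tr.IsLeader() {
			_ = tr.Kill(ctx, "alloc runner stopping")
		}
	}
}
```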
Denormalize jobs in AppendAllocs:
AppendAlloc was originally only ever called for in-place upgrades and new
allocations, and both of those code paths would remove the job from the
allocation. We now also use it to add fields such as FollowupEvalID, a
path which did not normalize the job. This is only a performance
enhancement.
Ignore terminal allocs:
Failed allocations are annotated with the follow-up eval ID when one is
created to replace the failed allocation. However, in the plan applier,
when we check whether allocations fit, these terminal allocations were
not filtered out. This could result in the plan being rejected because
the node would appear overcommitted if the terminal allocations'
resources were counted.
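A minimal sketch of that filtering, with toy types; terminal allocations no longer hold their resources, so they must be dropped before the fit check:
```go
package plan

type Allocation struct {
	ClientStatus string
	// resource fields elided
}

// terminal reports whether the client has released the alloc's resources.
func (a *Allocation) terminal() bool {
	switch a.ClientStatus {
	case "complete", "failed", "lost":
		return true
	}
	return false
}

// filterTerminal drops terminal allocations so the node is not treated as
// overcommitted by resources that have already been released.
func filterTerminal(allocs []*Allocation) []*Allocation {
	fit := make([]*Allocation, 0, len(allocs))
	for _, a := range allocs {
		if !a.terminal() {
			fit = append(fit, a)
		}
	}
	return fit
}
```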
Fixes three issues:
1. Retrieving the latest evaluation index was not properly selecting the
greatest index. This undermined the checks we had to reduce the number
of evaluations created when the latest eval index was greater than any
alloc change (a helper is sketched after this list).
2. Fix an issue where the blocking query code was using the incorrect
index, such that the index was higher than necessary.
3. Special-case the handling of blocked evaluations, since the
create/snapshot index is not particularly useful given that they can be
reblocked.
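A minimal sketch of the greatest-index selection from point 1 (names hypothetical):
```go
package state

// latestIndex returns the greatest of the given table indexes, e.g. the
// evals index and the allocs index, so a blocking query unblocks only when
// something actually newer exists.
func latestIndex(indexes ...uint64) uint64 {
	var greatest uint64
	for _, idx := range indexes {
		if idx > greatest {
			greatest = idx
		}
	}
	return greatest
}
```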