This adds a `nomad alloc restart` command and api that allows a job operator
with the alloc-lifecycle acl to perform an in-place restart of a Nomad
allocation, or a given subtask.
Currently when operators need to log onto a machine where an alloc
is running they will need to perform both an alloc/job status
call and then a call to discover the node name from the node list.
This updates both the job status and alloc status output to include
the node name within the information to make operator use easier.
Closes#2359
Cloess #1180
Consider currently dequeued Evaluation's ModifyIndex when determining
its WaitIndex. Normally the Evaluation itself would already be in the
state store snapshot used to determine the WaitIndex. However, since the FSM
applies Raft messages to the state store concurrently with Dequeueing,
it's possible the currently dequeued Evaluation won't yet exist in the
state store snapshot used by JobsForEval.
This can be solved by always considering the current eval's modify index
and using it if it is greater than all of the evals returned by the
state store.
Track the download status of each artifact independently so that if only
one of many artifacts fails to download, completed artifacts aren't
downloaded again.
This seems to fix TestClientAllocations_GarbageCollectAll_Remote being
flaky.
This test confuses me. It joins 2 servers, but then goes out of its way
to make sure the test client only interacts with one. There are not
enough comments for me to figure out the precise assertions this test is
trying to make.
A good old fashioned wait-for-the-client-to-register seems to fix the
flakiness though. The error was that the node could not be found, so
this makes some sense. However, lots of other tests seem to use the same
"wait for node" logic and don't appear to be flaky, so who knows why
waiting fixes this one.
Passes with -race.
Given that the values will rarely change, specially considering that any
changes would be backward incompatible change. As such, it's simpler to
keep syncing manually in the rare occasion and avoid the syncing code
overhead.
nomad/structs is an internal package and imports many libraries (e.g.
raft, codec) that are not relevant to api clients, and may cause
unnecessary dependency pain (e.g. `github.com/ugorji/go/codec`
version is very old now).
Here, we add a code generator that imports the relevant constants from
`nomad/structs`.
I considered using this approach for other structs, but didn't find a
quick viable way to reduce duplication. `nomad/structs` use values as
struct fields (e.g. `string`), while `api` uses value pointer (e.g.
`*string`) instead. Also, sometimes, `api` structs contain deprecated
fields or additional documentation, so simple copy-paste doesn't work.
For these reasons, I opt to keep the status quo.
Race was test only and due to unlocked map access.
Panic was test only and due to checking a field on a struct even when we
knew the struct was nil.
Race output that was fixed:
```
==================
WARNING: DATA RACE
Read at 0x00c000697dd0 by goroutine 768:
runtime.mapaccess2()
/usr/local/go/src/runtime/map.go:439 +0x0
github.com/hashicorp/nomad/nomad.TestLeader_PeriodicDispatcher_Restore_Adds.func8()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader_test.go:402
+0xe6
github.com/hashicorp/nomad/testutil.WaitForResultRetries()
/home/schmichael/go/src/github.com/hashicorp/nomad/testutil/wait.go:30
+0x5a
github.com/hashicorp/nomad/testutil.WaitForResult()
/home/schmichael/go/src/github.com/hashicorp/nomad/testutil/wait.go:22
+0x57
github.com/hashicorp/nomad/nomad.TestLeader_PeriodicDispatcher_Restore_Adds()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader_test.go:401
+0xb53
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
Previous write at 0x00c000697dd0 by goroutine 569:
runtime.mapassign()
/usr/local/go/src/runtime/map.go:549 +0x0
github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).Add()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:224
+0x2eb
github.com/hashicorp/nomad/nomad.(*Server).restorePeriodicDispatcher()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:394
+0x29a
github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:234
+0x593
github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
Goroutine 768 (running) created at:
testing.(*T).Run()
/usr/local/go/src/testing/testing.go:878 +0x650
testing.runTests.func1()
/usr/local/go/src/testing/testing.go:1119 +0xa8
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
testing.runTests()
/usr/local/go/src/testing/testing.go:1117 +0x4ee
testing.(*M).Run()
/usr/local/go/src/testing/testing.go:1034 +0x2ee
main.main()
_testmain.go:1150 +0x221
Goroutine 569 (running) created at:
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269
==================
```
Similar to previous commits the delayed eval update chan was set and
access from different goroutines causing a race. Passing the chan on the
stack resolves the race.
Race output from `go test -race -run 'Server_RPC$'` in nomad/
```
==================
WARNING: DATA RACE
Write at 0x00c000339150 by goroutine 63:
github.com/hashicorp/nomad/nomad.(*EvalBroker).flush()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:708
+0x3dc
github.com/hashicorp/nomad/nomad.(*EvalBroker).SetEnabled()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:174
+0xc4
github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:718
+0x1fd
github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122
+0x95d
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
Previous read at 0x00c000339150 by goroutine 73:
github.com/hashicorp/nomad/nomad.(*EvalBroker).runDelayedEvalsWatcher()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:771
+0x176
Goroutine 63 (running) created at:
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269
Goroutine 73 (running) created at:
github.com/hashicorp/nomad/nomad.(*EvalBroker).SetEnabled()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:170
+0x173
github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:207
+0x355
github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
==================
```
PeriodicDispatch.SetEnabled sets updateCh in one goroutine, and
PeriodicDispatch.run accesses updateCh in another.
The race can be prevented by having SetEnabled pass updateCh to run.
Race detector output from `go test -race -run TestServer_RPC` in nomad/
```
==================
WARNING: DATA RACE
Write at 0x00c0001d3f48 by goroutine 75:
github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).SetEnabled()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:468
+0x256
github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:724
+0x267
github.com/hashicorp/nomad/nomad.(*Server).leaderLoop.func1()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:131
+0x3c
github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:163
+0x4dd
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
Previous read at 0x00c0001d3f48 by goroutine 515:
github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).run()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:338
+0x177
Goroutine 75 (running) created at:
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269
Goroutine 515 (running) created at:
github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).SetEnabled()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:176
+0x1bc
github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:231
+0x582
github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
/home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
==================
```
IOPS have been modelled as a resource since Nomad 0.1 but has never
actually been detected and there is no plan in the short term to add
detection. This is because IOPS is a bit simplistic of a unit to define
the performance requirements from the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. These should be considered deprecated and only exist to allow
users to stop using them during the Nomad 0.9.x release. In the future,
there should be no expectation that the field will exist.
This PR fixes an edge case where we could GC an allocation that was in a
desired stop state but had not terminated yet. This can be hit if the
client hasn't shutdown the allocation yet or if the allocation is still
shutting down (long kill_timeout).
Fixes https://github.com/hashicorp/nomad/issues/4940
`currentExpiration` field is accessed in multiple goroutines: Stats and
renewal, so needs locking.
I don't anticipate high contention, so simple mutex suffices.