We were incorrectly returning a 0 duration to the taskrunner when
determining when a task should restart. This would cause tasks to be
restarted immediately, ignoring the restart {} stanza in a users
configuration.
This commit causes us to return the restart duration to the task runner
so it may correctly delay further execution.
This change makes few compromises:
* Looks up the devices associated with tasks at look up time. Given
that `nomad alloc status` is called rarely generally (compared to stats
telemetry and general job reporting), it seems fine. However, the
lookup overhead grows bounded by number of `tasks x total-host-devices`,
which can be significant.
* `client.Client` performs the task devices->statistics lookup. It
passes self to alloc/task runners so they can look up the device statistics
allocated to them.
* Currently alloc/task runners are responsible for constructing the
entire RPC response for stats
* The alternatives for making task runners device statistics aware
don't seem appealing (e.g. having task runners contain reference to hostStats)
* On the alloc aggregation resource usage, I did a naive merging of task device statistics.
* Personally, I question the value of such aggregation, compared to
costs of struct duplication and bloating the response - but opted to be
consistent in the API.
* With naive concatination, device instances from a single device group used by separate tasks in the alloc, would be aggregated in two separate device group statistics.
* Migrated all of the old leader task tests and got them passing
* Refactor and consolidate task killing code in AR to always kill leader
tasks first
* Fixed lots of issues with state restoring
* Fixed deadlock in AR.Destroy if AR.Run had never been called
* Added a new in memory statedb for testing
Saves a tiny bit of cpu and some IO. Sadly doesn't prevent all IO on
duplicate writes as the transactions are still created and committed.
$ go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: github.com/hashicorp/nomad/helper/boltdd
BenchmarkWriteDeduplication_On-4 500 4059591 ns/op 23736 B/op 56 allocs/op
BenchmarkWriteDeduplication_Off-4 300 4115319 ns/op 25942 B/op 55 allocs/op