Commit graph

69 commits

Author SHA1 Message Date
Mahmood Ali f72e599ee7 Populate alloc stats API with device stats
This change makes few compromises:

* Looks up the devices associated with tasks at look up time.  Given
that `nomad alloc status` is called rarely generally (compared to stats
telemetry and general job reporting), it seems fine.  However, the
lookup overhead grows bounded by number of `tasks x total-host-devices`,
which can be significant.

* `client.Client` performs the task devices->statistics lookup.  It
passes self to alloc/task runners so they can look up the device statistics
allocated to them.
  * Currently alloc/task runners are responsible for constructing the
entire RPC response for stats
  * The alternatives for making task runners device statistics aware
don't seem appealing (e.g. having task runners contain reference to hostStats)

* On the alloc aggregation resource usage, I did a naive merging of task device statistics.
  * Personally, I question the value of such aggregation, compared to
costs of struct duplication and bloating the response - but opted to be
consistent in the API.
  * With naive concatination, device instances from a single device group used by separate tasks in the alloc, would be aggregated in two separate device group statistics.
2018-11-16 10:26:32 -05:00
Michael Schurter a4e6a92d18 client: update alloc status when terminating
Defensively update alloc status whenever killing all tasks.
2018-11-05 15:11:10 -08:00
Michael Schurter b994f51990 client: fix tr lifecycle logic and shutdown delay
ShutdownDelay must be honored whenever the task is killed or restarted.
Services were not being deregistered prior to restarting.
2018-11-05 12:32:05 -08:00
Michael Schurter d29d09023e client: do not run terminal allocs 2018-11-05 12:32:05 -08:00
Michael Schurter 2bbd88888c client: first pass at implementing task restoring
Task restoring works but dead tasks may be restarted
2018-11-05 12:32:05 -08:00
Michael Schurter e060174130 ar: fix leader handling, state restoring, and destroying unrun ARs
* Migrated all of the old leader task tests and got them passing
* Refactor and consolidate task killing code in AR to always kill leader
  tasks first
* Fixed lots of issues with state restoring
* Fixed deadlock in AR.Destroy if AR.Run had never been called
* Added a new in memory statedb for testing
2018-10-19 09:45:45 -07:00
Michael Schurter cefbf00bf0 ar: refactor task killing into 1 method
Update comments and address some PR comments from #4775
2018-10-17 10:06:59 -07:00
Michael Schurter 222f6b5741 ar: fix task leader, update, and stop handling 2018-10-17 10:06:59 -07:00
Nick Ethier 3183b33d24 client: review comments and fixup/skip tests 2018-10-16 16:56:56 -07:00
Nick Ethier f192c3752a client: refactor post allocrunnerv2 finalization 2018-10-16 16:56:56 -07:00
Alex Dadgar 45e41cca03 allocrunnerv2 -> allocrunner 2018-10-16 16:56:56 -07:00
Alex Dadgar 6c9d9d5173 move files around 2018-10-16 16:56:55 -07:00
Michael Schurter 9d1ea3b228 client: hclog-ify most of the client
Leaving fingerprinters in case that interface changes with plugins.
2018-10-16 16:53:30 -07:00
Michael Schurter e42154fc46 implement stopping, destroying, and disk migration
* Stopping an alloc is implemented via Updates but update hooks are
  *not* run.
* Destroying an alloc is a best effort cleanup.
* AllocRunner destroy hooks implemented.
* Disk migration and blocking on a previous allocation exiting moved to
  its own package to avoid cycles. Now only depends on alloc broadcaster
  instead of also using a waitch.
* AllocBroadcaster now only drops stale allocations and always keeps the
  latest version.
* Made AllocDir safe for concurrent use

Lots of internal contexts that are currently unused. Unsure if they
should be used or removed.
2018-10-16 16:53:30 -07:00
Michael Schurter 820af27171 wrap boltdb in a write deduplicator
Saves a tiny bit of cpu and some IO. Sadly doesn't prevent all IO on
duplicate writes as the transactions are still created and committed.

$ go test -bench=. -benchmem
goos: linux
goarch: amd64
pkg: github.com/hashicorp/nomad/helper/boltdd
BenchmarkWriteDeduplication_On-4             500           4059591 ns/op           23736 B/op         56 allocs/op
BenchmarkWriteDeduplication_Off-4            300           4115319 ns/op           25942 B/op         55 allocs/op
2018-10-16 16:53:30 -07:00
Michael Schurter 5383d20505 removing old restoration path before api change 2018-10-16 16:53:30 -07:00
Alex Dadgar 9bab9edf27 test fixes 2018-06-12 17:45:39 -07:00
Alex Dadgar 90c2108bfb Fix gc tests + parallel destroy + small test fixes 2018-06-12 10:23:45 -07:00
Alex Dadgar f5ff509fa5 Refactor - wip 2018-06-12 10:23:45 -07:00
Renamed from client/alloc_runner.go (Browse further)