Commit graph

13009 commits

Author SHA1 Message Date
Alex Dadgar 01f8e5b95f renames 2018-10-04 14:57:25 -07:00
oleksii.shyman 118e3fe7e9 Introduce nvidia-plugin reserve
- added reserve functionality that returns OCI compliant env variables
  specifying GPU IDs to be injected inside the container
2018-10-04 14:55:34 -07:00
Alex Dadgar 52f9cd7637 fixing tests 2018-10-04 14:26:19 -07:00
Omar Khawaja b3937e3fc6
Monitoring and Alerting Guide with Prometheus [WIP] (#4706)
* add prometheus configuration guide

* fixing sub navigation issue

* Add detail to Next Steps

* add alerting component to guide

* update

* change docker image name and shorten job templates

* re-arrange to fix broken links
2018-10-04 17:15:10 -04:00
Omar Khawaja adfd89ded8
Load Balancing with Fabio Guide (#4445)
* add load-balancing guide

* restructure load balancing section

* defining consul lb strategies inline and giving fabio its own bullet point

* update docker image name and shorten job template

* changing system scheduler link to relative link and moving load balancing navigation link to right above Web UI
2018-10-04 16:18:52 -04:00
oleksii.shyman 0ea1dc1776 Introduce Nvidia-plugin stats
- created go-nvml wrapper for stats
- added stats feature to nvidia-plugin
2018-10-03 15:12:05 -07:00
oleksii.shyman b4a4b395e3 Introduce nvidia-plugin fingerprinting
- created go-nvml wrapper for fingerprinting
- added fingerprinting feature to nvidia-plugin
2018-10-03 15:11:56 -07:00
Alex Dadgar bac5cb1e8b Scheduler uses allocated resources 2018-10-02 17:08:25 -07:00
Chris Baker 33328c973d docs: amended description per @dadgar suggestions in https://github.com/hashicorp/nomad/pull/4743 2018-10-02 13:02:56 -04:00
Chris Baker 307d66590c docs: make explicit the communication pattern on RPC port (4647) 2018-10-02 12:19:37 -04:00
Sébastien Portebois 2d1f082a1d Add missing link to vault in task spec documentation 2018-10-01 20:13:03 -04:00
Alex Dadgar 147d2430a1 allocated resources structs 2018-09-29 18:47:28 -07:00
Alex Dadgar 5c8697667e Node reserved resources 2018-09-29 18:44:55 -07:00
Alex Dadgar 3183153315 Node resources on client 2018-09-29 17:23:41 -07:00
Michael Lange ca631ee217 Override the a11y title and description for the stats time series chart
Since this is a use-case-specific chart, we can use use-case-specific
language in our labels.
2018-09-27 12:55:52 -07:00
Michael Lange cdb1831ceb Add a11y features to the line-chart component
- Treat it as an image
- Add a title and a description
- Hide the axes, just in case
2018-09-27 12:55:52 -07:00
Michael Lange 866b74be19 Add a longForm option to format-duration 2018-09-27 12:55:17 -07:00
Alex Dadgar 564da575e1 changelog 2018-09-26 14:53:15 -07:00
Michael Lange 4a98bf989f Make the global logo link to the jobs page (home page) 2018-09-26 11:19:24 -07:00
Michael Lange ea87417d4f Add utilization stats to the task rows on allocation detail 2018-09-26 10:59:26 -07:00
Michael Lange 5736b71f00 Remove no longer used allocation-stats class 2018-09-26 10:59:26 -07:00
Michael Lange 9b90683e6b Use the StatsTracker method of getting alloc stats in alloc row 2018-09-26 10:59:26 -07:00
Michael Lange 95988440dc
Merge pull request #4704 from hashicorp/f-ui-applied-stat-charts
UI: Stat charts everywhere
2018-09-26 10:58:06 -07:00
Alex Dadgar c75dc3d1e2
Merge pull request #4723 from hashicorp/b-autopilot-cli
Fix autopilot set enable custom upgrades flag
2018-09-25 13:53:52 -07:00
Alex Dadgar c031b22d03 Fix autopilot set enable custom upgrades flag 2018-09-25 13:49:35 -07:00
Alex Dadgar 9b793531d6
Merge pull request #4720 from hashicorp/b-jet-fixes
Series of scheduler fixes / debugging enhancements
2018-09-25 13:25:11 -07:00
Alex Dadgar 99c386c076 skip e2e/vault if integration isn't set 2018-09-25 11:29:09 -07:00
Alex Dadgar 10dee5108d
Merge pull request #4712 from hashicorp/b-failed-trigger-reason
Add a missing eval trigger reason
2018-09-25 10:50:16 -07:00
Alex Dadgar bd420692f3 fix logging 2018-09-25 10:49:55 -07:00
Preetha Appan a10118c461 Add failed follow up to the list of allowed eval trigger reasons
needs unit test
2018-09-25 10:49:55 -07:00
Preetha Appan 86e725e84c Added logging around nacked evals in the scheduler worker 2018-09-25 10:49:02 -07:00
Alex Dadgar 6bdd241641
Merge pull request #4717 from barda999/master
changed ${nomad.class} to ${node.class}
2018-09-24 16:51:27 -07:00
barda999 2c9f212dea
changed ${nomad.class} to ${node.class}
I guess that was an unintentional mistake
2018-09-24 16:48:06 -07:00
Alex Dadgar 759a36dc53
Merge pull request #4698 from hashicorp/t-vault-matrix
Vault test matrix
2018-09-24 16:34:35 -07:00
Alex Dadgar f9c60c91d8 proper variable capture 2018-09-24 16:34:15 -07:00
Alex Dadgar a7de6d1bb1
Merge pull request #4716 from hashicorp/f-no-reuse-triggerby
Unique TriggerBy for blocked evals
2018-09-24 16:08:31 -07:00
Alex Dadgar 3497c3c345 Merge branch 'b-plan' into b-jet-fixes 2018-09-24 16:07:29 -07:00
Alex Dadgar 6fa7071194
Merge pull request #4709 from hashicorp/b-deployments
Fix deployment watcher index usage
2018-09-24 16:05:02 -07:00
Alex Dadgar 6a21f9fe96 Unique TriggerBy for blocked evals
Give blocked evals a unique triggerby reason to make debugging a chain
of evaluations easier.
2018-09-24 14:47:49 -07:00
Alex Dadgar e1a102f58c test allocs fit 2018-09-24 13:59:01 -07:00
Alex Dadgar d7f5be9148 Better comment on snapshotindex 2018-09-24 13:53:43 -07:00
Alex Dadgar 99498da6ed Denormalize jobs in plan and ignore resources of terminal allocs
Denormalize jobs in AppendAllocs:
AppendAlloc was originally only ever called for inplace upgrades and new
allocations. Both these code paths would remove the job from the
allocation. Now we use this to also add fields such as FollowupEvalID
which did not normalize the job. This is only a performance enhancement.

Ignore terminal allocs:
Failed allocations are annotated with the followup Eval ID when one is
created to replace the failed allocation. However, in the plan applier,
when we check if allocations fit, these terminal allocations were not
filtered. This could result in the plan being rejected if the node would
be overcommitted if the terminal allocations' resources were considered.
2018-09-24 13:53:43 -07:00
Alex Dadgar de442226ae Fix other instances of blocking queries 2018-09-24 13:52:39 -07:00
Preetha Appan f8d9d7a179
update changelog 2018-09-24 11:19:51 -05:00
Preetha 63b58aa92c
Merge pull request #4702 from hashicorp/b-non-voter-boostrap
Do not bootstrap with non voters
2018-09-24 11:14:36 -05:00
Alex Dadgar 7f0d241ef4 always handle failed allocation 2018-09-21 15:13:54 -07:00
Alex Dadgar b2449ae1ce Fix deployment watcher index usage
Fixes three issues:
1. Retrieving the latest evaluation index was not properly selecting the
greatest index. This would undermine checks we had to reduce the number
of evaluations created when the latest eval index was greater than any
alloc change.
2. Fix an issue where the blocking query code was using the incorrect
index such that the index was higher than necessary.
3. Special case handling of blocked evaluation since the create/snapshot
index is not particularly useful since they can be reblocked.
2018-09-21 13:59:11 -07:00
Michael Lange c694fcb0ba Update stat tracker unit tests 2018-09-19 19:30:18 -07:00
Alex Dadgar 5009566503 do not bootstrap with non voters 2018-09-19 17:17:39 -07:00
Michael Lange 09497b20b8 Acceptance test coverage for all the pages with resource utilization graphs 2018-09-19 16:33:51 -07:00