Commit graph

12662 commits

Author SHA1 Message Date
Alex Dadgar a2a56a930c Diff 2018-10-08 17:02:58 -07:00
Alex Dadgar 87cacb427f parse devices 2018-10-08 16:09:41 -07:00
Alex Dadgar 6b08b9d6b6 Define device request structs 2018-10-08 15:38:03 -07:00
Alex Dadgar 38e75bdffe
Merge pull request #4750 from hashicorp/f-allocated-resources
Split Resource struct
2018-10-08 14:50:40 -07:00
Chris Baker 0d0da84462
Merge pull request #4743 from hashicorp/doc-rpc-port-discussion
docs: make explicit the communication pattern on RPC port (4647)
2018-10-08 16:01:41 -04:00
Chris Baker 333b767b78
Merge pull request #4755 from hashicorp/f-update-go-version-to-1.11
Vagrant: Update go version to 1.11
2018-10-08 15:33:24 -04:00
Alex Dadgar 300926552b Fix example drain API request 2018-10-08 10:06:39 -07:00
Alex Dadgar 0183fb4e5c nvidia package restructure + build non-linux 2018-10-05 13:56:04 -07:00
Chris Baker bd55ff8f30 renamed vagrant script to accurately reflect non-privileged requirement 2018-10-05 10:07:05 -04:00
Chris Baker 014184749f vagrant: updated go_version to 1.11 in vagrant-linux go provisioning script 2018-10-04 19:06:35 -04:00
Chris Baker 6c480ac408 vagrant: modified UI provisioning script to run as non-privileged 2018-10-04 19:06:35 -04:00
Omar Khawaja b833e4e8f6
editing monitoring.html (#4754) 2018-10-04 18:40:13 -04:00
Omar Khawaja ceb2eed3e5
editing lb guide (#4753) 2018-10-04 18:26:51 -04:00
Alex Dadgar b6d50726e2
Merge pull request #4638 from oleksii-shyman/nvidia-plugin
WIP :: Nvidia Plugin
2018-10-04 15:24:36 -07:00
Alex Dadgar 01f8e5b95f renames 2018-10-04 14:57:25 -07:00
oleksii.shyman 118e3fe7e9 Introduce nvidia-plugin reserve
- added reserve functionality that returns OCI compliant env variables
  specifying GPU IDs to be injected inside the container
2018-10-04 14:55:34 -07:00
Alex Dadgar 52f9cd7637 fixing tests 2018-10-04 14:26:19 -07:00
Omar Khawaja b3937e3fc6
Monitoring and Alerting Guide with Prometheus [WIP] (#4706)
* add prometheus configuration guide

* fixing sub navigation issue

* Add detail to Next Steps

* add alerting component to guide

* update

* change docker image name and shorten job templates

* re-arrange to fix broken links
2018-10-04 17:15:10 -04:00
Omar Khawaja adfd89ded8
Load Balancing with Fabio Guide (#4445)
* add load-balancing guide

* restructure load balancing section

* defining consul lb strategies inline and giving fabio its own bullet point

* update docker image name and shorten job template

* changing system scheduler link to relative link and moving load balancing navigation link to right above Web UI
2018-10-04 16:18:52 -04:00
oleksii.shyman 0ea1dc1776 Introduce Nvidia-plugin stats
- created go-nvml wrapper for stats
- added stats feature to nvidia-plugin
2018-10-03 15:12:05 -07:00
oleksii.shyman b4a4b395e3 Introduce nvidia-plugin fingerprinting
- created go-nvml wrapper for fingerprinting
- added fingerprinting feature to nvidia-plugin
2018-10-03 15:11:56 -07:00
Alex Dadgar bac5cb1e8b Scheduler uses allocated resources 2018-10-02 17:08:25 -07:00
Chris Baker 33328c973d docs: amended description per @dadgar suggestions in https://github.com/hashicorp/nomad/pull/4743 2018-10-02 13:02:56 -04:00
Chris Baker 307d66590c docs: make explicit the communication pattern on RPC port (4647) 2018-10-02 12:19:37 -04:00
Alex Dadgar 147d2430a1 allocated resources structs 2018-09-29 18:47:28 -07:00
Alex Dadgar 5c8697667e Node reserved resources 2018-09-29 18:44:55 -07:00
Alex Dadgar 3183153315 Node resources on client 2018-09-29 17:23:41 -07:00
Alex Dadgar 564da575e1 changelog 2018-09-26 14:53:15 -07:00
Alex Dadgar c75dc3d1e2
Merge pull request #4723 from hashicorp/b-autopilot-cli
Fix autopilot set enable custom upgrades flag
2018-09-25 13:53:52 -07:00
Alex Dadgar c031b22d03 Fix autopilot set enable custom upgrades flag 2018-09-25 13:49:35 -07:00
Alex Dadgar 9b793531d6
Merge pull request #4720 from hashicorp/b-jet-fixes
Series of scheduler fixes / debugging enhancements
2018-09-25 13:25:11 -07:00
Alex Dadgar 99c386c076 skip e2e/vault if integration isn't set 2018-09-25 11:29:09 -07:00
Alex Dadgar 10dee5108d
Merge pull request #4712 from hashicorp/b-failed-trigger-reason
Add a missing eval trigger reason
2018-09-25 10:50:16 -07:00
Alex Dadgar bd420692f3 fix logging 2018-09-25 10:49:55 -07:00
Preetha Appan a10118c461 Add failed follow up to the list of allowed eval trigger reasons
needs unit test
2018-09-25 10:49:55 -07:00
Preetha Appan 86e725e84c Added logging around nacked evals in the scheduler worker 2018-09-25 10:49:02 -07:00
Alex Dadgar 6bdd241641
Merge pull request #4717 from barda999/master
changed ${nomad.class} to ${node.class}
2018-09-24 16:51:27 -07:00
barda999 2c9f212dea
changed ${nomad.class} to ${node.class}
I guess that was an unintentional mistake
2018-09-24 16:48:06 -07:00
Alex Dadgar 759a36dc53
Merge pull request #4698 from hashicorp/t-vault-matrix
Vault test matrix
2018-09-24 16:34:35 -07:00
Alex Dadgar f9c60c91d8 proper variable capture 2018-09-24 16:34:15 -07:00
Alex Dadgar a7de6d1bb1
Merge pull request #4716 from hashicorp/f-no-reuse-triggerby
Unique TriggerBy for blocked evals
2018-09-24 16:08:31 -07:00
Alex Dadgar 3497c3c345 Merge branch 'b-plan' into b-jet-fixes 2018-09-24 16:07:29 -07:00
Alex Dadgar 6fa7071194
Merge pull request #4709 from hashicorp/b-deployments
Fix deployment watcher index usage
2018-09-24 16:05:02 -07:00
Alex Dadgar 6a21f9fe96 Unique TriggerBy for blocked evals
Give blocked evals a unique triggerby reason to make debugging a chain
of evaluations easier.
2018-09-24 14:47:49 -07:00
Alex Dadgar e1a102f58c test allocs fit 2018-09-24 13:59:01 -07:00
Alex Dadgar d7f5be9148 Better comment on snapshotindex 2018-09-24 13:53:43 -07:00
Alex Dadgar 99498da6ed Denormalize jobs in plan and ignore resources of terminal allocs
Denormalize jobs in AppendAlloc:
AppendAlloc was originally only ever called for inplace upgrades and new
allocations. Both these code paths would remove the job from the
allocation. Now we use this to also add fields such as FollowupEvalID
which did not normalize the job. This is only a performance enhancement.

Ignore terminal allocs:
Failed allocations are annotated with the followup Eval ID when one is
created to replace the failed allocation. However, in the plan applier,
when we check if allocations fit, these terminal allocations were not
filtered. This could result in the plan being rejected if the node would
be overcommitted if the terminal allocations' resources were considered.
2018-09-24 13:53:43 -07:00
Alex Dadgar de442226ae Fix other instances of blocking queries 2018-09-24 13:52:39 -07:00
Preetha Appan f8d9d7a179
update changelog 2018-09-24 11:19:51 -05:00
Preetha 63b58aa92c
Merge pull request #4702 from hashicorp/b-non-voter-boostrap
Do not bootstrap with non voters
2018-09-24 11:14:36 -05:00