Commit Graph

2051 Commits

Author SHA1 Message Date
Nick Ethier b81e4e18f0
agent: suppose filter_default telemetry option 2018-11-19 23:21:48 -05:00
Nick Ethier 85b221a1d6
nomad: add flag to disable publishing of job_summary metrics for dispatched jobs 2018-11-19 23:21:19 -05:00
Mahmood Ali 9479015f51
Merge pull request #4884 from hashicorp/f-alloc-devices-cli
Report alloc device statistics in API and CLI
2018-11-16 18:04:54 -05:00
Mahmood Ali 6f9126f475 show Device Stats header in alloc status 2018-11-16 17:34:37 -05:00
Mahmood Ali 00ffd02ced Show stable order of device attributes 2018-11-16 17:34:37 -05:00
Preetha 5f094633fa
Merge pull request #4889 from hashicorp/f-service-meta
Pass service metadata "external-source" for consul UI integration
2018-11-16 12:24:21 -06:00
Preetha Appan 18708d3f0b
Pass service metadata "external-source" for consul UI integration 2018-11-16 11:28:56 -06:00
Mahmood Ali 33c96a803a tweak whitespace in device stats output 2018-11-16 10:37:39 -05:00
Mahmood Ali 159b8f866a Display device stats in `nomad alloc status` 2018-11-16 10:26:32 -05:00
Mahmood Ali d88a3f8413 Prepare to reuse device resources printing 2018-11-16 10:26:32 -05:00
Michael Schurter 6f3712ed48 gofmt -s -w command/helper_stats_test.go
Fixes the static checks build
2018-11-15 14:14:05 -08:00
Mahmood Ali 24b37e0aaf Display StatsObject nested objects as well 2018-11-15 08:09:54 -05:00
Mahmood Ali ee9353fbd6 Use disk display format for devices 2018-11-14 22:13:23 -05:00
Mahmood Ali 0712be643f Print verbose device in `nomad node status -stats` 2018-11-14 22:13:23 -05:00
Mahmood Ali 93e8fc53f9 device stats summary in `node status`
Sample output with a mock device:

```
Host Resource Utilization
CPU             Memory          Disk
2651/26400 MHz  9.6 GiB/16 GiB  98 GiB/234 GiB

Device Resource Utilization
nomad/file/mock[README.md]    511 bytes
nomad/file/mock[e2e.go]       239 bytes
nomad/file/mock[e2e_test.go]  128 bytes

Allocations
No allocations placed
```
2018-11-14 22:13:23 -05:00
Mahmood Ali c62ec124c0 Set clean config for mock driver
The default job here contains some exec task config (for setting
command and args) that aren't used for mock driver.  Now, the alloc
runner seems stricter about validating fields and errors on unexpected
fields.

Updating configs in tests so we can have an explicit task config
whenever driver is set explicitly.
2018-11-13 10:21:40 -05:00
Mahmood Ali c7610d8c22 mark and skip failing consul failing tests 2018-11-13 10:21:40 -05:00
Preetha Appan fd0ba320da
change path to v1/scheduler/configuration 2018-11-12 15:57:45 -06:00
Preetha Appan 3a10a589d7
Fix failing test 2018-11-10 19:53:47 -06:00
Preetha Appan 7ef126a027
Smaller methods, and added tests for RPC layer 2018-11-10 17:37:33 -06:00
Preetha Appan 75662b50d1
Use response object/querymeta/writemeta in scheduler config API 2018-11-10 10:31:10 -06:00
Alex Dadgar 98398a8a44
Merge pull request #4842 from hashicorp/b-deployment-progress-deadline
Fix multiple bugs with progress deadline handling
2018-11-08 13:31:54 -08:00
Preetha Appan f1a69e529c
Fix vet error 2018-11-08 09:48:43 -06:00
Preetha Appan da75388a9b
review feedback 2018-11-08 09:48:43 -06:00
Preetha Appan d88ec0267f
Comments 2018-11-08 09:48:43 -06:00
Preetha Appan 5f0a9d2cfd
Show preemption output in plan CLI 2018-11-08 09:48:43 -06:00
Alex Dadgar 204ca8230c Device manager
Introduce a device manager that manages the lifecycle of device plugins
on the client. It fingerprints, collects stats, and forwards Reserve
requests to the correct plugin. The manager, also handles device plugins
failing and validates their output.
2018-11-07 10:43:15 -08:00
Michael Schurter 392d548b85
Merge pull request #4828 from hashicorp/b-restore
Implement client agent restarting
2018-11-05 18:50:15 -06:00
Alex Dadgar 1c31970464 Fix multiple tgs with progress deadline handling
Fix an issue in which the deployment watcher would fail the deployment
based on the earliest progress deadline of the deployment regardless of
if the task group has finished.

Further fix an issue where the blocked eval optimization would make it
so no evals were created to progress the deployment. To reproduce this
issue, prior to this commit, you can create a job with two task groups.
The first group has count 1 and resources such that it can not be
placed. The second group has count 3, max_parallel=1, and can be placed.
Run this first and then update the second group to do a deployment. It
will place the first of three, but never progress since there exists a
blocked eval. However, that doesn't capture the fact that there are two
groups being deployed.
2018-11-05 16:06:17 -08:00
Michael Schurter 6bdbfb8129 tests: get consul integration tests building 2018-11-05 12:32:05 -08:00
Preetha b2b52b1ada
Merge pull request #4794 from hashicorp/f-preemption-systemjobs
Preemption for system jobs
2018-11-02 16:28:06 -05:00
Preetha Appan d03201adf8
Fix formatting of allocation score metrics 2018-10-30 12:03:23 -05:00
Preetha Appan 9d316cbbef
Fix return type in tests after refactor 2018-10-30 11:10:46 -05:00
Preetha Appan 8f7eb61823
Introduce a response object for scheduler configuration 2018-10-30 11:06:32 -05:00
Preetha Appan c1c1c230e4
Make preemption config a struct to allow for enabling based on scheduler type 2018-10-30 11:06:32 -05:00
Preetha Appan bd34cbb1f7
Support for new scheduler config API, first use case is to disable preemption 2018-10-30 11:06:32 -05:00
Michael Schurter d71a1b4547 tests: more fixes due to api changes 2018-10-29 15:25:22 -07:00
Michael Schurter 2b1b3d7e1e tests: get tests building if not yet passing 2018-10-16 16:56:57 -07:00
Michael Schurter 1a29337e48 register drivers by default
Do not register mock_driver on release builds.
2018-10-16 16:56:56 -07:00
Nick Ethier 3183b33d24 client: review comments and fixup/skip tests 2018-10-16 16:56:56 -07:00
Nick Ethier f192c3752a client: refactor post allocrunnerv2 finalization 2018-10-16 16:56:56 -07:00
Nick Ethier 4a4c7dbbfc client: begin driver plugin integration
client: fingerprint driver plugins
2018-10-16 16:56:56 -07:00
Alex Dadgar 7946a14aa8 Fix lints 2018-10-16 16:56:56 -07:00
Alex Dadgar 45e41cca03 allocrunnerv2 -> allocrunner 2018-10-16 16:56:56 -07:00
Alex Dadgar 6c9d9d5173 move files around 2018-10-16 16:56:55 -07:00
Michael Schurter f279b1d1b1 tests: test logs endpoint against pending task
Although the really exciting change is making WaitForRunning return the
allocations that it started. This should cut down test boilerplate
significantly.
2018-10-16 16:56:55 -07:00
Michael Schurter 6bcf772f3c tests: test via ServeMux so http codes are set 2018-10-16 16:56:55 -07:00
Michael Schurter 960f3be76c client: expose task state to client
The interesting decision in this commit was to expose AR's state and not
a fully materialized Allocation struct. AR.clientAlloc builds an Alloc
that contains the task state, so I considered simply memoizing and
exposing that method.

However, that would lead to AR having two awkwardly similar methods:
 - Alloc() - which returns the server-sent alloc
 - ClientAlloc() - which returns the fully materialized client alloc

Since ClientAlloc() could be memoized it would be just as cheap to call
as Alloc(), so why not replace Alloc() entirely?

Replacing Alloc() entirely would require Update() to immediately
materialize the task states on server-sent Allocs as there may have been
local task state changes since the server received an Alloc update.

This quickly becomes difficult to reason about: should Update hooks use
the TaskStates? Are state changes caused by TR Update hooks immediately
reflected in the Alloc? Should AR persist its copy of the Alloc? If so,
are its TaskStates canonical or the TaskStates on TR?

So! Forget that. Let's separate the static Allocation from the dynamic
AR & TR state!

 - AR.Alloc() is for static Allocation access (often for the Job)
 - AR.AllocState() is for the dynamic AR & TR runtime state (deployment
   status, task states, etc).

If code needs to know the status of a task: AllocState()
If code needs to know the names of tasks: Alloc()

It should be very easy for a developer to reason about which method they
should call and what they can do with the return values.
2018-10-16 16:56:55 -07:00
Michael Schurter 1c9ccdeab5 tests: fix races caused by sharing a buffer
httptest.ResponseRecorder exposes a bytes.Buffer which we were reading
and writing concurrently to test streaming log APIs. This is a race, so
I wrapped the struct in a lock with some helpers.
2018-10-16 16:56:55 -07:00
Alex Dadgar 84ce8c3487 extra logging 2018-10-16 16:56:55 -07:00