Preetha Appan
b46728a88b
Make spread weight a pointer with default value if unset
2019-01-11 10:31:21 -06:00
Michael Schurter
8a6b1acaa6
drain: fix node drain monitoring
...
The whole approach to monitoring drains has ordering issues and lacks
state to output useful error messages.
AFAICT to get the tests passing reliably I needed to change the behavior
of monitoring.
Parts of these tests are skipped in CI, and they should be rewritten as
e2e tests.
2019-01-08 09:35:16 -08:00
Nick Ethier
a96afb6c91
fix tests that fail as a result of async client startup
2018-12-20 00:53:44 -05:00
Mahmood Ali
d497729826
Merge pull request #4978 from hashicorp/f-device-tweaks
...
Display device attributes in `nomad node status -verbose`
2018-12-12 19:45:07 -05:00
Mahmood Ali
fa9b9028a5
Use max 3 precision in displaying floats
...
When formating floats in `nomad node status`, use a maximum precision of
3.
2018-12-10 12:18:24 -05:00
Mahmood Ali
14668f48d1
device attributes in nomad node status -verbose
...
This reports device attributes like the following:
```
$ nomad node status -self -verbose
ID = f7adb958-29e1-2a5a-2303-9d61ffaab33a
Name = mars.local
Class = <none>
DC = dc1
Drain = false
Eligibility = eligible
Status = ready
Uptime = 12h40m13s
Drivers
Driver Detected Healthy Message Time
docker true true healthy 2018-12-10T11:47:19-05:00
...
Attributes
cpu.arch = amd64
cpu.frequency = 2200
cpu.modelname = Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
cpu.numcores = 12
...
Device Group Attributes
Device Group = nomad/file/mock
block_device = sda1
filesystem = ext4
size = 63.2 GB
Meta
```
2018-12-10 12:18:24 -05:00
Alex Dadgar
1e3c3cb287
Deprecate IOPS
...
IOPS have been modelled as a resource since Nomad 0.1 but has never
actually been detected and there is no plan in the short term to add
detection. This is because IOPS is a bit simplistic of a unit to define
the performance requirements from the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. These should be considered deprecated and only exist to allow
users to stop using them during the Nomad 0.9.x release. In the future,
there should be no expectation that the field will exist.
2018-12-06 15:09:26 -08:00
Mahmood Ali
f72e599ee7
Populate alloc stats API with device stats
...
This change makes few compromises:
* Looks up the devices associated with tasks at look up time. Given
that `nomad alloc status` is called rarely generally (compared to stats
telemetry and general job reporting), it seems fine. However, the
lookup overhead grows bounded by number of `tasks x total-host-devices`,
which can be significant.
* `client.Client` performs the task devices->statistics lookup. It
passes self to alloc/task runners so they can look up the device statistics
allocated to them.
* Currently alloc/task runners are responsible for constructing the
entire RPC response for stats
* The alternatives for making task runners device statistics aware
don't seem appealing (e.g. having task runners contain reference to hostStats)
* On the alloc aggregation resource usage, I did a naive merging of task device statistics.
* Personally, I question the value of such aggregation, compared to
costs of struct duplication and bloating the response - but opted to be
consistent in the API.
* With naive concatination, device instances from a single device group used by separate tasks in the alloc, would be aggregated in two separate device group statistics.
2018-11-16 10:26:32 -05:00
Mahmood Ali
93e8fc53f9
device stats summary in node status
...
Sample output with a mock device:
```
Host Resource Utilization
CPU Memory Disk
2651/26400 MHz 9.6 GiB/16 GiB 98 GiB/234 GiB
Device Resource Utilization
nomad/file/mock[README.md] 511 bytes
nomad/file/mock[e2e.go] 239 bytes
nomad/file/mock[e2e_test.go] 128 bytes
Allocations
No allocations placed
```
2018-11-14 22:13:23 -05:00
Mahmood Ali
7d3e34fab5
Add NodeResource Device types in api
package
2018-11-14 14:42:36 -05:00
Mahmood Ali
63acda956c
Add Client Device Stats structs in api
package
2018-11-14 14:41:19 -05:00
Preetha Appan
fd0ba320da
change path to v1/scheduler/configuration
2018-11-12 15:57:45 -06:00
Preetha Appan
75662b50d1
Use response object/querymeta/writemeta in scheduler config API
2018-11-10 10:31:10 -06:00
Preetha Appan
5f0a9d2cfd
Show preemption output in plan CLI
2018-11-08 09:48:43 -06:00
Preetha Appan
57fe5050f0
more minor review feedback
2018-11-01 17:05:17 -05:00
Preetha Appan
8f7eb61823
Introduce a response object for scheduler configuration
2018-10-30 11:06:32 -05:00
Preetha Appan
1415032c13
More review comments
2018-10-30 11:06:32 -05:00
Preetha Appan
c1c1c230e4
Make preemption config a struct to allow for enabling based on scheduler type
2018-10-30 11:06:32 -05:00
Preetha Appan
bd34cbb1f7
Support for new scheduler config API, first use case is to disable preemption
2018-10-30 11:06:32 -05:00
Preetha Appan
9257387a69
Add number of evictions to DesiredUpdates struct to use in CLI/API
2018-10-30 11:06:32 -05:00
Preetha Appan
5b3bfb63eb
structs and API changes to plan and alloc structs needed for preemption
2018-10-30 11:06:32 -05:00
Alex Dadgar
c176a83388
rename api TotalShares -> CpuShares
2018-10-16 17:25:55 -07:00
Alex Dadgar
a78cefec18
use int64
2018-10-16 15:34:32 -07:00
Preetha Appan
7c0d8c646c
Change CPU/Disk/MemoryMB to int everywhere in new resource structs
2018-10-16 16:21:42 -05:00
Alex Dadgar
5a07f9f96e
parse affinities and constraints on devices
2018-10-11 14:05:19 -07:00
Alex Dadgar
87cacb427f
parse devices
2018-10-08 16:09:41 -07:00
Alex Dadgar
6b08b9d6b6
Define device request structs
2018-10-08 15:38:03 -07:00
Alex Dadgar
bac5cb1e8b
Scheduler uses allocated resources
2018-10-02 17:08:25 -07:00
Alex Dadgar
5c8697667e
Node reserved resources
2018-09-29 18:44:55 -07:00
Alex Dadgar
3183153315
Node resources on client
2018-09-29 17:23:41 -07:00
Alex Dadgar
cc92cd92cd
Merge pull request #4642 from hashicorp/b-vet
...
Fix vet errors and use newer go version in travis
2018-09-04 17:04:02 -07:00
Alex Dadgar
c6576ddac1
Fix make check errors
2018-09-04 16:03:52 -07:00
Preetha Appan
751c0eb5a5
code review feedback
2018-09-04 16:10:11 -05:00
Preetha Appan
9bc0962527
Track top k nodes by norm score rather than top k nodes per scorer
2018-09-04 16:10:11 -05:00
Preetha Appan
6ed527c636
Use heap to store top K scoring nodes.
...
Scoring metadata is now aggregated by scorer type to make it easier
to parse when reading it in the CLI.
2018-09-04 16:10:11 -05:00
Preetha Appan
659cfa3f64
Parsing and API layer for spread stanza
2018-09-04 16:10:11 -05:00
Preetha Appan
1c0b123777
Fix test
2018-09-04 16:10:11 -05:00
Preetha Appan
c407e3626f
More review comments
2018-09-04 16:10:11 -05:00
Preetha Appan
4b3b618e4a
Remove unused field
2018-09-04 16:10:11 -05:00
Preetha Appan
9f0caa9c3d
Affinity parsing, api and structs
2018-09-04 16:10:11 -05:00
Nick Ethier
41e010cdc2
nomad: add 'Dispatch' field to Job
...
New -bash: Dispatch: command not found field is used to denote if the Job is a child dispatched job of
a parameterized job.
2018-06-11 11:59:03 -04:00
Alex Dadgar
72effb8632
code review
2018-06-06 14:52:26 -07:00
Alex Dadgar
f4fccd7ed2
Monitoring non-draining node exits
2018-06-05 17:58:44 -07:00
Alex Dadgar
7f25fcc1bd
Merge pull request #4354 from hashicorp/b-job-modify
...
Deployment adds JobSpecModifyIndex
2018-05-31 17:57:38 +00:00
Alex Dadgar
f2b2e0482b
code review fixes
2018-05-31 10:57:08 -07:00
Preetha Appan
4f835790d7
Set node eligibility to true when old client calls disable
2018-05-30 16:54:07 -05:00
Alex Dadgar
195e19827b
Deployment adds JobSpecModifyIndex
...
Deployment tracks the Job.JobModifyIndex so that PUTS against /v1/jobs
can be more easily coorelated with the created deployment.
Fixes https://github.com/hashicorp/nomad/issues/4301
2018-05-30 11:33:56 -07:00
Nick Ethier
4b64db3a0f
api: emit different monitor message if node's drain strategy is never set
2018-05-24 06:39:09 -04:00
Preetha
159888a856
Merge pull request #4274 from hashicorp/f-force-rescheduling
...
Add CLI and API support for forcing rescheduling of failed allocs
2018-05-21 16:24:22 -07:00
Preetha Appan
4a400f045b
Fix docs and method documentation in API
2018-05-21 17:20:59 -05:00
Preetha Appan
3a8040e36f
Add new method EvaluateWithOptions to avoid breaking go API client
2018-05-11 14:18:53 -05:00
Preetha Appan
b12df3c64b
Added CLI for evaluating job given ID, and modified client API for evaluate to take a request payload
2018-05-09 15:04:27 -05:00
Chelsea Holland Komlo
d51611040f
Add driver health information to node list stub
2018-05-09 11:21:54 -04:00
Alex Dadgar
f4af30fbb5
Canary tags structs
2018-05-07 14:50:01 -05:00
Alex Dadgar
f95ab4ade8
Mark canaries on creation, and unmark on promotion
2018-05-07 14:50:01 -05:00
Alex Dadgar
224b3092ae
change default to 10m and docs
2018-05-07 14:50:01 -05:00
Alex Dadgar
8a81038cdb
Set Reschedule from deployment watcher
2018-05-07 14:50:01 -05:00
Alex Dadgar
e5caaf3358
Small test fix
2018-05-07 14:50:01 -05:00
Alex Dadgar
1336002255
Progress deadline in deployment state
2018-05-07 14:50:01 -05:00
Alex Dadgar
ee50789c22
Initial implementation
2018-05-07 14:50:01 -05:00
Michael Schurter
0d534d30d6
Merge pull request #4251 from hashicorp/f-grpc-checks
...
Support Consul gRPC Health Checks
2018-05-04 14:55:16 -07:00
Michael Schurter
70b02875b7
Merge pull request #4234 from hashicorp/b-4159
...
Fix race in StreamFramer and truncation in api/AllocFS.Logs
2018-05-04 14:24:07 -07:00
Michael Schurter
f6a4713141
consul: make grpc checks more like http checks
2018-05-04 11:08:11 -07:00
Michael Schurter
382caec1e1
consul: initial grpc implementation
...
Needs to be more like http.
2018-05-04 11:08:11 -07:00
Michael Schurter
526af6a246
framer: fix early exit/truncation in framer
2018-05-02 10:46:16 -07:00
Michael Schurter
949938534b
api: never return EOF from Logs error chan
...
Closing the frames chan is the only race-free way to signal to receivers
that all frames have been sent and no errors have occurred.
If EOF is sent on error chan receivers may not receive the last frame
(or frames since the chan is buffered) before receiving the error.
Closing frames is the idiomatic way of signaling there is no more data
to be read from a chan.
2018-05-02 10:46:16 -07:00
Michael Schurter
6f0e2e808b
tests: test logs from client<->api package
2018-05-02 10:46:16 -07:00
Preetha Appan
274bed1892
Add RescheduleTracker to allocs list stub struct
2018-05-01 14:53:47 -05:00
Alex Dadgar
15ad3f94af
Fix command line
2018-04-26 15:46:22 -07:00
James Rasell
b7c2ce2991
Update node-drain logging message to clearer for operators.
...
This change updates the console log message when performing a node
drain and particulary when a node has marked all allocs for
migration. Previously it logged 'drain complete' which was a little
confusing to operators as the node is not drained at this point.
Closes #4183
2018-04-24 07:50:01 +01:00
Nick Ethier
1f99c3a0f7
minor code review fixes to api/jobs
2018-04-17 10:18:36 -04:00
Nick Ethier
03a89060cf
api: add test for canonicalized jobs/parse
2018-04-16 19:21:09 -04:00
Nick Ethier
de4176606d
command/agent: add Canonicalize option to parse args
2018-04-16 19:21:09 -04:00
Nick Ethier
f2db03e56c
command/agent: add /v1/jobs/parse endpoint
...
The parse endpoint accepts a hcl jobspec body within a json object
and returns the parsed json object for the job. This allows users to
register jobs with the nomad json api without specifically needing
a nomad binary to parse their hcl encoded jobspec file.
2018-04-16 19:21:06 -04:00
Preetha
bdc17ebf10
Merge pull request #4139 from hashicorp/b-reschedule-invalid-system-jobs
...
Make system jobs fail validation if they contain a reschedule stanza
2018-04-11 20:01:19 -05:00
Preetha Appan
db549d1388
add canonicalize for reschedulepolicy to simplify validation logic
2018-04-11 18:47:27 -05:00
Preetha Appan
ae1419826e
Always merge with default reschedule policy if its not nil
2018-04-11 15:26:01 -05:00
Preetha Appan
a7b7b662ed
Make system jobs fail validation if they contain a reschedule stanza
2018-04-11 14:56:20 -05:00
Charlie Voiselle
ba88f00ccb
Changed "til" to "until"
...
Should be "till" or "until"; chose "until" because it is unambiguous as to meaning.
2018-04-11 12:36:28 -05:00
Alex Dadgar
3c51d5a5ea
Fix ineffectual assignment
2018-04-05 11:29:39 -07:00
Alex Dadgar
f1d3b47499
Don't assume the read index won't be zero if no jobs have been registered
2018-04-03 18:24:59 -07:00
Michael Schurter
7046db8818
cli: remove unreachable drain message
2018-03-30 14:15:12 -07:00
Michael Schurter
f912cd4272
cli: log if a node goes down during draining
2018-03-30 14:02:42 -07:00
Michael Schurter
06874d8b3d
drain: fix monitor node index handling
2018-03-30 12:43:53 -07:00
Michael Schurter
7199a2b960
cli: differentiate normal output vs info
2018-03-30 11:42:11 -07:00
Michael Schurter
0260bda046
cli: add color to drain output
2018-03-30 11:15:12 -07:00
Michael Schurter
6e10e0f84e
drain: fix cli blocking when allocs already stopped
2018-03-30 10:18:14 -07:00
Michael Schurter
ee3eddbac3
drain: block cli until all allocs stop
...
Before the drain CLI would block until the node was marked as completing
drain operations. While technically correct, it could lead operators (or
more likely: scripts) to shutdown drained nodes before all of its
allocations had *actually* terminated.
This change makes the CLI block until all allocations have terminated
(unless ignoring system jobs).
2018-03-29 10:56:09 -07:00
Alex Dadgar
de4b3772f1
Create evals for system jobs when drain is unset
...
This PR creates evals for system jobs when:
* Drain is unset and mark eligible is true
* Eligibility is restored to the node
2018-03-27 15:53:24 -07:00
Chelsea Holland Komlo
003bc209b9
use time.Time for node events for compatibility
2018-03-27 15:43:57 -04:00
Preetha Appan
33e170c15d
s/linear/constant/g
2018-03-26 14:45:09 -05:00
Alex Dadgar
d7f246efe1
Drain cli, api, http
2018-03-21 20:27:32 -07:00
Michael Schurter
a854c7bdae
docs: improve DrainRequest.MarkEligible comment
2018-03-21 16:55:22 -07:00
Michael Schurter
a7ab75d853
test: index no longer guaranteed on job list
...
Also switch to require and add t.Helper to appropriate funcs.
2018-03-21 16:55:22 -07:00
Michael Schurter
1cc012966b
api: fix tests to expect default migrate strategy
2018-03-21 16:51:45 -07:00
Michael Schurter
2832853bfa
Add DesiredTransition.ShouldMigrate to api pkg
2018-03-21 16:51:45 -07:00
Alex Dadgar
92b636dd32
Fix deadline handling
2018-03-21 16:51:44 -07:00
Alex Dadgar
7b2bad8c5e
Toggle Drain allows resetting eligibility
...
This PR allows marking a node as eligible for scheduling while toggling
drain. By default the `nomad node drain -disable` commmand will mark it
as eligible but the drainer will maintain in-eligibility.
2018-03-21 16:51:44 -07:00
Alex Dadgar
02019f216a
Correct defaulting
2018-03-21 16:51:44 -07:00
Alex Dadgar
a37329189a
Improve DeadlineTime helper
2018-03-21 16:51:44 -07:00