Commit graph

68 commits

Author SHA1 Message Date
Michael Schurter 8a6b1acaa6 drain: fix node drain monitoring
The whole approach to monitoring drains has ordering issues and lacks
state to output useful error messages.

AFAICT to get the tests passing reliably I needed to change the behavior
of monitoring.

Parts of these tests are skipped in CI, and they should be rewritten as
e2e tests.
2019-01-08 09:35:16 -08:00
Mahmood Ali fa9b9028a5 Use max 3 precision in displaying floats
When formating floats in `nomad node status`, use a maximum precision of
3.
2018-12-10 12:18:24 -05:00
Mahmood Ali 14668f48d1 device attributes in nomad node status -verbose
This reports device attributes like the following:

```
$ nomad node status -self -verbose
ID          = f7adb958-29e1-2a5a-2303-9d61ffaab33a
Name        = mars.local
Class       = <none>
DC          = dc1
Drain       = false
Eligibility = eligible
Status      = ready
Uptime      = 12h40m13s

Drivers
Driver       Detected  Healthy  Message                               Time
docker       true      true     healthy                               2018-12-10T11:47:19-05:00
...

Attributes
cpu.arch                      = amd64
cpu.frequency                 = 2200
cpu.modelname                 = Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
cpu.numcores                  = 12
...

Device Group Attributes
Device Group = nomad/file/mock
block_device = sda1
filesystem   = ext4
size         = 63.2 GB

Meta
```
2018-12-10 12:18:24 -05:00
Mahmood Ali 93e8fc53f9 device stats summary in node status
Sample output with a mock device:

```
Host Resource Utilization
CPU             Memory          Disk
2651/26400 MHz  9.6 GiB/16 GiB  98 GiB/234 GiB

Device Resource Utilization
nomad/file/mock[README.md]    511 bytes
nomad/file/mock[e2e.go]       239 bytes
nomad/file/mock[e2e_test.go]  128 bytes

Allocations
No allocations placed
```
2018-11-14 22:13:23 -05:00
Mahmood Ali 7d3e34fab5 Add NodeResource Device types in api package 2018-11-14 14:42:36 -05:00
Mahmood Ali 63acda956c Add Client Device Stats structs in api package 2018-11-14 14:41:19 -05:00
Alex Dadgar c176a83388 rename api TotalShares -> CpuShares 2018-10-16 17:25:55 -07:00
Alex Dadgar a78cefec18 use int64 2018-10-16 15:34:32 -07:00
Preetha Appan 7c0d8c646c
Change CPU/Disk/MemoryMB to int everywhere in new resource structs 2018-10-16 16:21:42 -05:00
Alex Dadgar 5c8697667e Node reserved resources 2018-09-29 18:44:55 -07:00
Alex Dadgar 3183153315 Node resources on client 2018-09-29 17:23:41 -07:00
Alex Dadgar 72effb8632 code review 2018-06-06 14:52:26 -07:00
Alex Dadgar f4fccd7ed2 Monitoring non-draining node exits 2018-06-05 17:58:44 -07:00
Preetha Appan 4f835790d7
Set node eligibility to true when old client calls disable 2018-05-30 16:54:07 -05:00
Nick Ethier 4b64db3a0f
api: emit different monitor message if node's drain strategy is never set 2018-05-24 06:39:09 -04:00
Chelsea Holland Komlo d51611040f Add driver health information to node list stub 2018-05-09 11:21:54 -04:00
James Rasell b7c2ce2991
Update node-drain logging message to clearer for operators.
This change updates the console log message when performing a node
drain and particulary when a node has marked all allocs for
migration. Previously it logged 'drain complete' which was a little
confusing to operators as the node is not drained at this point.

Closes #4183
2018-04-24 07:50:01 +01:00
Michael Schurter 7046db8818 cli: remove unreachable drain message 2018-03-30 14:15:12 -07:00
Michael Schurter f912cd4272 cli: log if a node goes down during draining 2018-03-30 14:02:42 -07:00
Michael Schurter 06874d8b3d drain: fix monitor node index handling 2018-03-30 12:43:53 -07:00
Michael Schurter 7199a2b960 cli: differentiate normal output vs info 2018-03-30 11:42:11 -07:00
Michael Schurter 0260bda046 cli: add color to drain output 2018-03-30 11:15:12 -07:00
Michael Schurter 6e10e0f84e drain: fix cli blocking when allocs already stopped 2018-03-30 10:18:14 -07:00
Michael Schurter ee3eddbac3 drain: block cli until all allocs stop
Before the drain CLI would block until the node was marked as completing
drain operations. While technically correct, it could lead operators (or
more likely: scripts) to shutdown drained nodes before all of its
allocations had *actually* terminated.

This change makes the CLI block until all allocations have terminated
(unless ignoring system jobs).
2018-03-29 10:56:09 -07:00
Alex Dadgar de4b3772f1 Create evals for system jobs when drain is unset
This PR creates evals for system jobs when:

* Drain is unset and mark eligible is true
* Eligibility is restored to the node
2018-03-27 15:53:24 -07:00
Chelsea Holland Komlo 003bc209b9 use time.Time for node events for compatibility 2018-03-27 15:43:57 -04:00
Michael Schurter a854c7bdae docs: improve DrainRequest.MarkEligible comment 2018-03-21 16:55:22 -07:00
Alex Dadgar 92b636dd32 Fix deadline handling 2018-03-21 16:51:44 -07:00
Alex Dadgar 7b2bad8c5e Toggle Drain allows resetting eligibility
This PR allows marking a node as eligible for scheduling while toggling
drain. By default the `nomad node drain -disable` commmand will mark it
as eligible but the drainer will maintain in-eligibility.
2018-03-21 16:51:44 -07:00
Alex Dadgar a37329189a Improve DeadlineTime helper 2018-03-21 16:51:44 -07:00
Alex Dadgar d47c68f764 Add eligibility to node view 2018-03-21 16:51:44 -07:00
Alex Dadgar 8289cc3c6f HTTP and API 2018-03-21 16:51:44 -07:00
Alex Dadgar 010228577e Drain cli, api, http 2018-03-21 16:51:43 -07:00
Chelsea Holland Komlo 0bde357731 add concept of health checks to fingerprinters and nodes
fix up feedback from code review

add driver info for all drivers to node
2018-03-21 15:15:25 -04:00
Alex Dadgar de6ebb6e6c small cleanup 2018-03-13 18:08:22 -07:00
Alex Dadgar 63e14b7d63 nodeevents -> events 2018-03-13 18:08:22 -07:00
Chelsea Holland Komlo d30c269fbe code review feedback 2018-03-13 18:05:40 -07:00
Chelsea Holland Komlo 00d9923454 Ensure node updates don't strip node events
Add node events to CLI
2018-03-13 18:05:40 -07:00
Alex Dadgar aa98f8ba7b Enhance API pkg to utilize Server's Client Tunnel
This PR enhances the API package by having client only RPCs route
through the server when they are low cost and for filesystem access to
first attempt a direct connection to the node and then falling back to
a server routed request.
2018-02-15 13:59:03 -08:00
James Rasell 45e8f977f7
Update node-status verbose command to include node address.
This change updates the `nomad node-status -verbose` command to
also include the addreess of the node. This is helpful for cluster
administrators to quickly discover information and access nodes
when required.
2017-12-21 08:58:35 +00:00
Alex Dadgar 4173834231 Enable more linters 2017-09-26 15:26:33 -07:00
Michael Schurter b145e04d5d Refactor GetNodeClient weirdness
- No need to for a pointer to a pointer
- Properly set and use QueryOptions.Region
2017-08-28 14:41:21 -07:00
Michael Schurter 7363b50666 Fix TLS support in api pkg / cli
Fixes #3013

It's a little weird that Client now has a method for returning a
NewClient, but it's a convenient way to dedupe the logic to
connect-directly-to-a-node which is nontrivial and had sutble
differences between locations.
2017-08-28 11:46:28 -07:00
James Rasell 0d120228ea Updates based on feedback provided by dadgar. 2017-08-16 22:19:31 +01:00
“James d6d721d7c1 Add the Nomad agent version to the node-status CLI putput. 2017-08-10 08:27:26 +01:00
Diptanu Choudhury bb664835c2 Added the API for GC of allocations and nodes 2017-01-12 16:18:29 -08:00
Diptanu Choudhury 84722234b4 Fixed a bunch of TLS related failures 2016-10-26 14:08:46 -07:00
Diptanu Choudhury 067fcda3fe Making the cli use TLS if the client has enabled TLS 2016-10-26 11:13:53 -07:00
Diptanu Choudhury 3c64fa562b Added the status updated at timestamp 2016-07-27 11:50:06 -07:00
Alex Dadgar fdda90229f only support latest and remove ring buffer 2016-06-12 09:32:38 -07:00