Commit graph

2952 commits

Author SHA1 Message Date
Chelsea Komlo 4338360da9
Merge pull request #4065 from hashicorp/emit-node-event-on-first-health-change
Emit first node event after initialization on health status change
2018-03-29 11:23:25 -04:00
Chelsea Holland Komlo 2174ede6b9 add clarifying comment 2018-03-29 10:58:39 -04:00
Michael Schurter 3a79c32677
Merge pull request #4059 from hashicorp/b-drain-health-svc-only
only service allocs should have health watched
2018-03-28 16:49:22 -07:00
Michael Schurter 5eb0cb7176 only service allocs should have health watched 2018-03-28 16:20:11 -07:00
Chelsea Holland Komlo e3319afee1 emit first node event 2018-03-28 17:26:53 -04:00
Chelsea Komlo 7812ac5abf
Merge pull request #4057 from hashicorp/specify-docker-msg
Specify docker name in driver health messages
2018-03-28 13:32:36 -04:00
Preetha 177d2d6010
Merge pull request #4052 from hashicorp/f-specify-total-memory
Allow to specify total memory on agent configuration
2018-03-28 12:28:41 -05:00
Chelsea Holland Komlo efc03e252c specify driver health messages 2018-03-28 11:35:21 -04:00
Preetha Appan 329428b49f
Code review feedback and unit test 2018-03-28 10:07:15 -05:00
Charlie Voiselle ea10588227 rkt: logging enhancements (#4044)
* Added extra debug logging; extended timeout; added jitter.

* small log changes

* increase timeout

* remove unneccessary uuid
2018-03-27 17:30:06 -07:00
Michael Schurter fcaee471a0 client: always mark exited sys/svc allocs as failed
When restarts.attempts=0 was set in a jobspec a system or service alloc
that exited with 0 status would be marked as `completed` instead of
`failed`. Since system and service jobs are intended to run until
stopped or updated, they should always be marked as failed when they
exit even in cases where the exit code is 0.
2018-03-27 14:30:19 -07:00
Mildred Ki'Lya 1017cbe8ab
Allow to specify total memory on agent configuration
Allow to set the total memory of an agent in its configuration file. This
can be used in case the automatic detection doesn't work or in specific
environments when memory overcommit (using swap for example) can be
desirable.
2018-03-27 15:46:18 -05:00
Chelsea Holland Komlo 003bc209b9 use time.Time for node events for compatibility 2018-03-27 15:43:57 -04:00
Alex Dadgar 432784dae3 Fix alloc watcher snapshot streaming 2018-03-27 11:14:53 -07:00
Alex Dadgar 05449fea09 drop stats fetching log 2018-03-23 12:01:50 -07:00
Chelsea Komlo 5f0c382021
Merge pull request #4030 from hashicorp/health-check-ux
UX improvments to driver health checks
2018-03-23 09:46:50 -04:00
Alex Dadgar da27fc3880 Driver Info output 2018-03-22 17:18:32 -07:00
Chelsea Holland Komlo e9005d8cfb ux improvments to driver health checks 2018-03-22 18:38:29 -04:00
Michael Schurter a318684738
Merge pull request #4022 from hashicorp/f-more-executor-logging
executor: increase level for helpful log lines
2018-03-22 15:21:20 -07:00
Michael Schurter a4f346abeb remove spurious TODOs and FIXMEs 2018-03-21 16:55:22 -07:00
Michael Schurter 8b346c6176 test: try to prevent flakiness on travis 2018-03-21 16:51:45 -07:00
Michael Schurter 1b7ac447e9 alloc_runner: watch health for deployed batch jobs 2018-03-21 16:51:45 -07:00
Michael Schurter 62960ed7bd client: don't monitor health of non-service jobs
Also fix system job draining; won't work without deadline fixes
2018-03-21 16:51:44 -07:00
Alex Dadgar a37329189a Improve DeadlineTime helper 2018-03-21 16:51:44 -07:00
Alex Dadgar db4a634072 RPC, FSM, State Store for marking DesiredTransistion
fix build tag
2018-03-21 16:49:48 -07:00
Michael Schurter bb0ff44fb4 mock_driver: improve Kill() logging 2018-03-21 16:49:48 -07:00
Michael Schurter c0542474db drain: initial drainv2 structs and impl 2018-03-21 16:49:48 -07:00
Chelsea Holland Komlo f329e45e03 always set initial health status for every driver 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo bbaffe3eca set driver to unhealthy once if it cannot be detected in periodic check 2018-03-21 15:15:26 -04:00
Alex Dadgar 5df4b3728d Docker driver doesn't return errors but injects into the DriverInfo 2018-03-21 15:15:26 -04:00
Alex Dadgar 4365bb7f59 Only run health check if driver is detected 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo f801709a0a fix issue when updating node events 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 285729aee2 function rename and re-arrange functions in fingerprint_manager 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 60f12d206f improve comments; update watchDriver 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 739784736a remove unused function 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo d92703617c simplify logic
bump log level
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 86b7b3d2d9 fix up health check logic comparison; add node events to client driver checks 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 53a5bc2bb3 Code review feedback 2018-03-21 15:15:26 -04:00
Alex Dadgar 34dc58421c notes from walk through 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 44b6951dda improve tests 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo d740a6a46e refresh driver information for non-health checking drivers periodically 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo d8f68e5ef8 fix up codereview feedback 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo d5f6c940c4 fix up racy tests 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 0425be8f48 updating comments; locking concurrent node access 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo c50d02ae93 go style; update comments 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 3aa726baab fix scheduler driver name; create node structs file 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 3cba95e8a7 allow nomad to schedule based on the status of a client driver health check
Slight updates for go style
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 0bde357731 add concept of health checks to fingerprinters and nodes
fix up feedback from code review

add driver info for all drivers to node
2018-03-21 15:15:25 -04:00
Michael Schurter 1022170bf3 executor: increase level for helpful log lines
Should help with debugging issues like #3971
2018-03-21 11:53:58 -07:00
Michael Schurter 1044bc0feb
Merge pull request #3984 from hashicorp/f-loosen-consul-skipverify
Replace Consul TLSSkipVerify handling
2018-03-16 11:21:28 -07:00