Commit Graph

3051 Commits

Author SHA1 Message Date
Chelsea Komlo 7812ac5abf
Merge pull request #4057 from hashicorp/specify-docker-msg
Specify docker name in driver health messages
2018-03-28 13:32:36 -04:00
Preetha 177d2d6010
Merge pull request #4052 from hashicorp/f-specify-total-memory
Allow to specify total memory on agent configuration
2018-03-28 12:28:41 -05:00
Chelsea Holland Komlo efc03e252c specify driver health messages 2018-03-28 11:35:21 -04:00
Preetha Appan 329428b49f
Code review feedback and unit test 2018-03-28 10:07:15 -05:00
Charlie Voiselle ea10588227 rkt: logging enhancements (#4044)
* Added extra debug logging; extended timeout; added jitter.

* small log changes

* increase timeout

* remove unneccessary uuid
2018-03-27 17:30:06 -07:00
Michael Schurter fcaee471a0 client: always mark exited sys/svc allocs as failed
When restarts.attempts=0 was set in a jobspec a system or service alloc
that exited with 0 status would be marked as `completed` instead of
`failed`. Since system and service jobs are intended to run until
stopped or updated, they should always be marked as failed when they
exit even in cases where the exit code is 0.
2018-03-27 14:30:19 -07:00
Mildred Ki'Lya 1017cbe8ab
Allow to specify total memory on agent configuration
Allow to set the total memory of an agent in its configuration file. This
can be used in case the automatic detection doesn't work or in specific
environments when memory overcommit (using swap for example) can be
desirable.
2018-03-27 15:46:18 -05:00
Chelsea Holland Komlo 003bc209b9 use time.Time for node events for compatibility 2018-03-27 15:43:57 -04:00
Alex Dadgar 432784dae3 Fix alloc watcher snapshot streaming 2018-03-27 11:14:53 -07:00
Alex Dadgar 05449fea09 drop stats fetching log 2018-03-23 12:01:50 -07:00
Chelsea Komlo 5f0c382021
Merge pull request #4030 from hashicorp/health-check-ux
UX improvments to driver health checks
2018-03-23 09:46:50 -04:00
Alex Dadgar da27fc3880 Driver Info output 2018-03-22 17:18:32 -07:00
Chelsea Holland Komlo e9005d8cfb ux improvments to driver health checks 2018-03-22 18:38:29 -04:00
Michael Schurter a318684738
Merge pull request #4022 from hashicorp/f-more-executor-logging
executor: increase level for helpful log lines
2018-03-22 15:21:20 -07:00
Michael Schurter a4f346abeb remove spurious TODOs and FIXMEs 2018-03-21 16:55:22 -07:00
Michael Schurter 8b346c6176 test: try to prevent flakiness on travis 2018-03-21 16:51:45 -07:00
Michael Schurter 1b7ac447e9 alloc_runner: watch health for deployed batch jobs 2018-03-21 16:51:45 -07:00
Michael Schurter 62960ed7bd client: don't monitor health of non-service jobs
Also fix system job draining; won't work without deadline fixes
2018-03-21 16:51:44 -07:00
Alex Dadgar a37329189a Improve DeadlineTime helper 2018-03-21 16:51:44 -07:00
Alex Dadgar db4a634072 RPC, FSM, State Store for marking DesiredTransistion
fix build tag
2018-03-21 16:49:48 -07:00
Michael Schurter bb0ff44fb4 mock_driver: improve Kill() logging 2018-03-21 16:49:48 -07:00
Michael Schurter c0542474db drain: initial drainv2 structs and impl 2018-03-21 16:49:48 -07:00
Chelsea Holland Komlo f329e45e03 always set initial health status for every driver 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo bbaffe3eca set driver to unhealthy once if it cannot be detected in periodic check 2018-03-21 15:15:26 -04:00
Alex Dadgar 5df4b3728d Docker driver doesn't return errors but injects into the DriverInfo 2018-03-21 15:15:26 -04:00
Alex Dadgar 4365bb7f59 Only run health check if driver is detected 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo f801709a0a fix issue when updating node events 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 285729aee2 function rename and re-arrange functions in fingerprint_manager 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 60f12d206f improve comments; update watchDriver 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 739784736a remove unused function 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo d92703617c simplify logic
bump log level
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 86b7b3d2d9 fix up health check logic comparison; add node events to client driver checks 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 53a5bc2bb3 Code review feedback 2018-03-21 15:15:26 -04:00
Alex Dadgar 34dc58421c notes from walk through 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 44b6951dda improve tests 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo d740a6a46e refresh driver information for non-health checking drivers periodically 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo d8f68e5ef8 fix up codereview feedback 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo d5f6c940c4 fix up racy tests 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 0425be8f48 updating comments; locking concurrent node access 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo c50d02ae93 go style; update comments 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 3aa726baab fix scheduler driver name; create node structs file 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 3cba95e8a7 allow nomad to schedule based on the status of a client driver health check
Slight updates for go style
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 0bde357731 add concept of health checks to fingerprinters and nodes
fix up feedback from code review

add driver info for all drivers to node
2018-03-21 15:15:25 -04:00
Michael Schurter 1022170bf3 executor: increase level for helpful log lines
Should help with debugging issues like #3971
2018-03-21 11:53:58 -07:00
Marcin Matlaszek 6019a88824
Make raw_exec processes cleanup function more precise. 2018-03-20 13:40:21 +01:00
Marcin Matlaszek bb36c122e2
Fix errors when trying to kill whole process group. 2018-03-20 13:40:21 +01:00
Marcin Matlaszek 86d650d7b0
Make starting & cleaning process group Windows compatible. 2018-03-20 13:40:21 +01:00
Marcin Matlaszek 79c139f2ef
Create new process group on process startup.
Clean up by sending SIGKILL to the whole process group.
2018-03-20 13:40:21 +01:00
Michael Schurter 1044bc0feb
Merge pull request #3984 from hashicorp/f-loosen-consul-skipverify
Replace Consul TLSSkipVerify handling
2018-03-16 11:21:28 -07:00
Michael Schurter 32ee5e0d53
Merge pull request #3990 from hashicorp/f-rkt-groups
rkt: allow specifying --group
2018-03-16 11:19:53 -07:00
Michael Schurter bd78cfb039 rkt: allow specifying --group 2018-03-16 11:08:22 -07:00
Michael Schurter fb10ec9c01 docker: make volume errors recoverable
The interface+mock just to test this one little error handling may seem
like overkill but there was just no other way to write an automated test
around this logic as there's no way to simluate this error with stock
Docker.
2018-03-15 17:52:43 -07:00
Michael Schurter 0971114f0c Replace Consul TLSSkipVerify handling
Instead of checking Consul's version on startup to see if it supports
TLSSkipVerify, assume that it does and only log in the job service
handler if we discover Consul does not support TLSSkipVerify.

The old code would break TLSSkipVerify support if Nomad started before
Consul (such as on system boot) as TLSSkipVerify would default to false
if Consul wasn't running. Since TLSSkipVerify has been supported since
Consul 0.7.2, it's safe to relax our handling.
2018-03-14 17:43:06 -07:00
Preetha Appan 3c38eededd
Fix spelling in comment 2018-03-14 15:54:25 -05:00
Alex Dadgar bef4a8ee09 fix clearing node events 2018-03-14 09:48:59 -07:00
Chelsea Komlo 810eedfa2a
Merge pull request #3945 from hashicorp/f-add-node-events
Add node events
2018-03-14 08:42:55 -04:00
Preetha 360d6e5a92
Merge pull request #3968 from hashicorp/f-nicer-vault-error
Make server side error messages from vault more clearer
2018-03-13 20:49:39 -05:00
Alex Dadgar de6ebb6e6c small cleanup 2018-03-13 18:08:22 -07:00
Chelsea Holland Komlo b41501e442 code review feedback 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo 1488b076d1 code review feedback 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo a8655320fd fix up go check warnings 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo 0934769b04 add client side emitting of node events
Changelog
2018-03-13 18:08:21 -07:00
Preetha Appan 914eaed64f
Address some code review comments 2018-03-13 18:19:16 -05:00
Preetha Appan 09c231ce43
Return the err from server correctly 2018-03-13 18:10:14 -05:00
Preetha Appan 9618f52746
Remove error wrapping and make vault connection server side errors clearer. 2018-03-13 17:09:03 -05:00
Michael Schurter 79df90acb0
Merge pull request #3958 from simplesurance/swappiness
fix: disable swap for executor_linux allocations
2018-03-13 10:10:22 -07:00
Fabian Holler e6af051c93 fix: disable swap for executor_linux allocations
A comment in the nomad source code states that swapping for
executor_linux allocations is disabled but it wasn't.

Nomad wrote -1 to the memsw.limit_in_bytes cgroup file to disable
swapping.
This has the following problems:

1.) Writing -1 to the file does not disable swapping. It sets
    the limit for memory and swap to unlimited.
2.) On common Linux distributions like Ubuntu 16.04 LTS the
    memsw.limit_in_bytes cgroup file does not exist by default.
    The memsw.limit_in_bytes file only exist if the Linux kernel is
    build with CONFIG_MEMCG_SWAP=yes and either
    CONFIG_MEMCG_SWAP_ENABLED=yes or when the kernel parameter
    swapaccount=1 is passed during boot.
    Most Linux distributions disable swap accounting by default because
    of higher memory usage.
    Nomad silently ignores if writing to the memsw.limit_in_bytes file
    fails. The allocation succeeds, no message is logged to notify the
    user.

To ensure that disabling swap works on common Linux kernels, disable
swapping by writing 0 to the memory.swappiness file.
Using the memory.swappiness file only requires that the kernel is
compiled with CONFIG_MEMCG=yes. This is the default in common Linux
kernels.
2018-03-13 10:52:50 +01:00
Alex Dadgar 4844317cc2
Merge pull request #3890 from hashicorp/b-heartbeat
Heartbeat improvements and handling failures during establishing leadership
2018-03-12 14:41:59 -07:00
Michael Schurter 7dd7fbcda2 non-Existent -> nonexistent
Reverting from #3963

https://www.merriam-webster.com/dictionary/existent
2018-03-12 11:59:33 -07:00
Josh Soref 18c5659474 spelling: version 2018-03-11 19:13:25 +00:00
Josh Soref 6222bd564e spelling: verify 2018-03-11 19:13:32 +00:00
Josh Soref 1359fd2c3d spelling: unexpected 2018-03-11 19:08:07 +00:00
Josh Soref 173ce63fe9 spelling: transition 2018-03-11 19:06:05 +00:00
Josh Soref 782c704de6 spelling: thresholds 2018-03-11 19:03:47 +00:00
Josh Soref ac6d3767da spelling: terminated 2018-03-11 19:01:49 +00:00
Josh Soref 2dda6abab9 spelling: templates 2018-03-11 19:01:39 +00:00
Josh Soref 8978caea28 spelling: shutdown 2018-03-11 18:55:49 +00:00
Josh Soref 8d191c9273 spelling: severity 2018-03-11 18:53:52 +00:00
Josh Soref a79eccaa58 spelling: service 2018-03-11 18:53:47 +00:00
Josh Soref 8149694f3a spelling: server 2018-03-11 18:55:30 +00:00
Josh Soref 3787d8141e spelling: serialize 2018-03-11 18:53:39 +00:00
Josh Soref e37626561c spelling: semantics 2018-03-11 19:00:26 +00:00
Josh Soref e4639ac62f spelling: secrets 2018-03-11 18:53:26 +00:00
Josh Soref cec45c6bc8 spelling: safety 2018-03-11 18:52:54 +00:00
Josh Soref de9d0c7180 spelling: retrieved 2018-03-11 18:51:40 +00:00
Josh Soref e949d23e1b spelling: resource 2018-03-11 18:51:03 +00:00
Josh Soref 82221f9a2b spelling: represents 2018-03-11 18:42:29 +00:00
Josh Soref 1c3b60ae70 spelling: replace 2018-03-11 18:41:53 +00:00
Josh Soref b47ab9ab8c spelling: removes 2018-03-11 18:41:43 +00:00
Josh Soref db166c6cf6 spelling: remnants 2018-03-11 18:41:26 +00:00
Josh Soref 258d76ec13 spelling: registry 2018-03-11 18:41:13 +00:00
Josh Soref 7ad77f568b spelling: purposes 2018-03-11 18:39:35 +00:00
Josh Soref 6fa892a463 spelling: propagated 2018-03-11 18:39:26 +00:00
Josh Soref 1a8204fa11 spelling: previous 2018-03-11 18:38:23 +00:00
Josh Soref f764e5552a spelling: periodically 2018-03-11 18:36:59 +00:00
Josh Soref 96e47bd4c1 spelling: parallelism 2018-03-11 18:35:54 +00:00
Josh Soref 3c1ce6d16d spelling: otherwise 2018-03-11 18:34:27 +00:00
Josh Soref 96dba3e267 spelling: mount 2018-03-11 18:27:18 +00:00
Josh Soref 13e5fb8221 spelling: malicious 2018-03-11 18:26:25 +00:00
Josh Soref 1ef6d6319e spelling: labels 2018-03-11 18:21:44 +00:00
Josh Soref b6ec60fb5f spelling: isolation 2018-03-11 18:19:02 +00:00
Josh Soref 337ac13f0a spelling: interpolation 2018-03-11 18:16:36 +00:00
Josh Soref 75d1240446 spelling: interface 2018-03-11 18:15:37 +00:00
Josh Soref c1a0ae3161 spelling: inspect 2018-03-11 18:15:27 +00:00
Josh Soref 2a1cf2f216 spelling: initialization 2018-03-11 18:18:37 +00:00
Josh Soref b293b48287 spelling: idempotent 2018-03-11 18:14:50 +00:00
Josh Soref 52b83328fc spelling: heartbeating 2018-03-11 18:12:19 +00:00
Josh Soref 3ad579930e spelling: fingerprint 2018-03-11 18:07:37 +00:00
Josh Soref 7f6e4012a0 spelling: existent 2018-03-11 18:30:37 +00:00
Josh Soref 7cd95f6eb3 spelling: executor 2018-03-11 18:05:31 +00:00
Josh Soref b9ce8b9e37 spelling: each 2018-03-11 17:56:19 +00:00
Josh Soref 0fc23b0ba3 spelling: down 2018-03-11 17:55:47 +00:00
Josh Soref e8478c4065 spelling: documentation 2018-03-11 17:55:21 +00:00
Josh Soref 4241ffc5ab spelling: disable 2018-03-11 17:55:12 +00:00
Josh Soref 858b9e809f spelling: directory 2018-03-11 17:55:06 +00:00
Josh Soref 09970343b5 spelling: destruction 2018-03-11 17:54:39 +00:00
Josh Soref 2f135f0ed7 spelling: destroy 2018-03-11 17:54:13 +00:00
Josh Soref 97dc9a00c0 spelling: default 2018-03-11 17:52:58 +00:00
Josh Soref aaa6e104ed spelling: could 2018-03-11 17:51:47 +00:00
Josh Soref c9b86bbc2f spelling: controls 2018-03-11 17:50:39 +00:00
Josh Soref f2a7c95379 spelling: constraints 2018-03-11 17:50:28 +00:00
Josh Soref cb1303e47a spelling: conjunction 2018-03-11 17:48:37 +00:00
Josh Soref 42fa13bbc6 spelling: cancelled 2018-03-11 17:45:47 +00:00
Josh Soref 7077386916 spelling: cancelable 2018-03-11 17:45:34 +00:00
Josh Soref a70fe97556 spelling: assert 2018-03-11 17:41:33 +00:00
Josh Soref 58b794875f spelling: artifact 2018-03-11 17:41:02 +00:00
Josh Soref e78cf9c81a spelling: already 2018-03-11 17:39:04 +00:00
Josh Soref b8b46d3f74 spelling: allocation 2018-03-11 17:37:22 +00:00
Josh Soref e87b0a4d86 spelling: alloc 2018-03-11 17:36:34 +00:00
Josh Soref b67449796a spelling: added 2018-03-11 17:34:28 +00:00
Chelsea Komlo bd88877249
Merge pull request #3909 from hashicorp/b-node-attributes-concurrent-access
Fingerprinters accessing node information should be thread safe
2018-03-06 11:57:46 -05:00
Chelsea Komlo 7c7e2f4d0b
Merge pull request #3873 from hashicorp/r-edge-trigger-node-watcher
Edge trigger node updates
2018-03-01 15:18:59 -05:00
Chelsea Holland Komlo 122d1c4e4a simplify retry logic 2018-03-01 09:48:26 -05:00
Michael Schurter 557a70f78d
Merge pull request #3917 from jaininshah9/master
changing the formula to correctly pass the CPUQota to docker
2018-02-28 20:00:37 -08:00
Jainin Shah 39e1fc06e5 adding comments to the change 2018-02-28 16:19:51 -08:00
Preetha Appan eaedffc7f7
Fix go vet errors 2018-02-28 12:21:27 -06:00
Chelsea Holland Komlo 355805db56 reset timer after updating node copy 2018-02-27 17:18:10 -05:00
Jainin Shah 6eb7da002f changing the formula to correctly pass the CPUQota to docker 2018-02-27 12:32:23 -08:00
Chelsea Holland Komlo a72aaaf47f add network resources equal method, use time ticker
remove impossible test case
2018-02-27 12:42:53 -05:00
Chelsea Holland Komlo e736e31820 use time ticker, update how network resources are compared 2018-02-26 18:47:11 -05:00
Chelsea Holland Komlo 5059065b52 improved testing; node networks comparison 2018-02-26 15:55:38 -05:00
Chelsea Holland Komlo b7bcd0b59f fingerprinters accessing node information should be thread safe 2018-02-26 15:25:54 -05:00
Chelsea Holland Komlo 1f31b39fe8 code review fixups 2018-02-26 12:36:30 -05:00
Chelsea Holland Komlo ed8c8afbcd edge trigger node update
test update config copy trigger
2018-02-26 12:36:04 -05:00
Alex Dadgar 49a47483d1 Registering back to initializing
Fix a bug in which if the node attributes/meta changed, we would
re-register the node in status initializing. This would incorrectly
trigger the client to log that it missed its heartbeat.

It would change the status of the Node to initializing until the next
heartbeat occured.
2018-02-16 17:49:31 -08:00
Alex Dadgar eff4455c68 Fix original client server list behavior 2018-02-15 16:04:53 -08:00
Alex Dadgar 0ebf7f3b7f remove tmp file 2018-02-15 15:51:27 -08:00
Alex Dadgar f9cf642436 Client tls 2018-02-15 15:22:57 -08:00
Alex Dadgar 0e85ae77b4 fix flaky gc tests 2018-02-15 13:59:03 -08:00
Alex Dadgar 38b695b69c feedback and rebasing 2018-02-15 13:59:03 -08:00