Commit Graph

3372 Commits

Author SHA1 Message Date
Preetha 6254d75eee
Merge pull request #4101 from hashicorp/b-rescheduling-edge-fixes
Fixes edge cases around timing/ task finish time being set more than once
2018-04-04 16:18:21 -05:00
Preetha Appan 12ba4c45da
remove outdated commented out test code 2018-04-04 15:03:24 -05:00
Preetha Appan 6363a6fb4d
Remove old comment 2018-04-04 15:01:48 -05:00
Preetha Appan 5e4525bd30
Moves setting finishedAt to the right place and adds two unit tests. 2018-04-04 14:38:15 -05:00
Alex Dadgar 86c32358d4
Spelling error 2018-04-03 18:30:01 -07:00
Alex Dadgar 01a6beafbf RPC Retry Watcher 2018-04-03 18:05:28 -07:00
Preetha Appan e6bbce3fa0
Add comment 2018-04-03 19:49:03 -05:00
Alex Dadgar ec844f19d9 randomize servers 2018-04-03 17:46:13 -07:00
Preetha Appan 00537c739b
Fixes edge cases around timing and task finish time being set more than once 2018-04-03 16:34:59 -05:00
Alex Dadgar 58a3ec3fb2 Improve Vault error handling 2018-04-03 14:29:22 -07:00
Alex Dadgar 86f9044676 remove generated files 2018-03-30 16:52:49 -07:00
Alex Dadgar af81349dbe Generated files 2018-03-30 16:14:40 -07:00
Michael Schurter 257ba5937d test: don't rely on alloc runner update count
We were incorrectly relying on the count of alloc updates in a number of
tests. Since alloc updates are async, their number is non-determinstic
and largely meaningless.

This should fix quite a few flaky tests in Travis and prevent future
mistaken assumptions in tests.
2018-03-30 09:34:33 -07:00
Michael Schurter 62e9553333
Merge pull request #4069 from hashicorp/f-hashealth
add HasHealth helper for nil checks
2018-03-29 17:03:20 -07:00
Alex Dadgar beee130a6e Always capture the finish time 2018-03-29 11:27:22 -07:00
Michael Schurter 91b5bb58d9 add HasHealth helper for nil checks
We performed the DeploymentStatus nil checks a couple different ways, so
hopefully this helper will consoldiate them and make it more clear what
the code is doing.
2018-03-29 09:29:19 -07:00
Chelsea Komlo 4338360da9
Merge pull request #4065 from hashicorp/emit-node-event-on-first-health-change
Emit first node event after initialization on health status change
2018-03-29 11:23:25 -04:00
Chelsea Holland Komlo 2174ede6b9 add clarifying comment 2018-03-29 10:58:39 -04:00
Michael Schurter 3a79c32677
Merge pull request #4059 from hashicorp/b-drain-health-svc-only
only service allocs should have health watched
2018-03-28 16:49:22 -07:00
Michael Schurter 5eb0cb7176 only service allocs should have health watched 2018-03-28 16:20:11 -07:00
Chelsea Holland Komlo e3319afee1 emit first node event 2018-03-28 17:26:53 -04:00
Chelsea Komlo 7812ac5abf
Merge pull request #4057 from hashicorp/specify-docker-msg
Specify docker name in driver health messages
2018-03-28 13:32:36 -04:00
Preetha 177d2d6010
Merge pull request #4052 from hashicorp/f-specify-total-memory
Allow to specify total memory on agent configuration
2018-03-28 12:28:41 -05:00
Chelsea Holland Komlo efc03e252c specify driver health messages 2018-03-28 11:35:21 -04:00
Preetha Appan 329428b49f
Code review feedback and unit test 2018-03-28 10:07:15 -05:00
Charlie Voiselle ea10588227 rkt: logging enhancements (#4044)
* Added extra debug logging; extended timeout; added jitter.

* small log changes

* increase timeout

* remove unneccessary uuid
2018-03-27 17:30:06 -07:00
Michael Schurter fcaee471a0 client: always mark exited sys/svc allocs as failed
When restarts.attempts=0 was set in a jobspec a system or service alloc
that exited with 0 status would be marked as `completed` instead of
`failed`. Since system and service jobs are intended to run until
stopped or updated, they should always be marked as failed when they
exit even in cases where the exit code is 0.
2018-03-27 14:30:19 -07:00
Mildred Ki'Lya 1017cbe8ab
Allow to specify total memory on agent configuration
Allow to set the total memory of an agent in its configuration file. This
can be used in case the automatic detection doesn't work or in specific
environments when memory overcommit (using swap for example) can be
desirable.
2018-03-27 15:46:18 -05:00
Chelsea Holland Komlo 003bc209b9 use time.Time for node events for compatibility 2018-03-27 15:43:57 -04:00
Alex Dadgar 432784dae3 Fix alloc watcher snapshot streaming 2018-03-27 11:14:53 -07:00
Alex Dadgar 05449fea09 drop stats fetching log 2018-03-23 12:01:50 -07:00
Chelsea Komlo 5f0c382021
Merge pull request #4030 from hashicorp/health-check-ux
UX improvments to driver health checks
2018-03-23 09:46:50 -04:00
Alex Dadgar da27fc3880 Driver Info output 2018-03-22 17:18:32 -07:00
Chelsea Holland Komlo e9005d8cfb ux improvments to driver health checks 2018-03-22 18:38:29 -04:00
Michael Schurter a318684738
Merge pull request #4022 from hashicorp/f-more-executor-logging
executor: increase level for helpful log lines
2018-03-22 15:21:20 -07:00
Michael Schurter a4f346abeb remove spurious TODOs and FIXMEs 2018-03-21 16:55:22 -07:00
Michael Schurter 8b346c6176 test: try to prevent flakiness on travis 2018-03-21 16:51:45 -07:00
Michael Schurter 1b7ac447e9 alloc_runner: watch health for deployed batch jobs 2018-03-21 16:51:45 -07:00
Michael Schurter 62960ed7bd client: don't monitor health of non-service jobs
Also fix system job draining; won't work without deadline fixes
2018-03-21 16:51:44 -07:00
Alex Dadgar a37329189a Improve DeadlineTime helper 2018-03-21 16:51:44 -07:00
Alex Dadgar db4a634072 RPC, FSM, State Store for marking DesiredTransistion
fix build tag
2018-03-21 16:49:48 -07:00
Michael Schurter bb0ff44fb4 mock_driver: improve Kill() logging 2018-03-21 16:49:48 -07:00
Michael Schurter c0542474db drain: initial drainv2 structs and impl 2018-03-21 16:49:48 -07:00
Chelsea Holland Komlo f329e45e03 always set initial health status for every driver 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo bbaffe3eca set driver to unhealthy once if it cannot be detected in periodic check 2018-03-21 15:15:26 -04:00
Alex Dadgar 5df4b3728d Docker driver doesn't return errors but injects into the DriverInfo 2018-03-21 15:15:26 -04:00
Alex Dadgar 4365bb7f59 Only run health check if driver is detected 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo f801709a0a fix issue when updating node events 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 285729aee2 function rename and re-arrange functions in fingerprint_manager 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 60f12d206f improve comments; update watchDriver 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 739784736a remove unused function 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo d92703617c simplify logic
bump log level
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 86b7b3d2d9 fix up health check logic comparison; add node events to client driver checks 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 53a5bc2bb3 Code review feedback 2018-03-21 15:15:26 -04:00
Alex Dadgar 34dc58421c notes from walk through 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 44b6951dda improve tests 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo d740a6a46e refresh driver information for non-health checking drivers periodically 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo d8f68e5ef8 fix up codereview feedback 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo d5f6c940c4 fix up racy tests 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 0425be8f48 updating comments; locking concurrent node access 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo c50d02ae93 go style; update comments 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 3aa726baab fix scheduler driver name; create node structs file 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 3cba95e8a7 allow nomad to schedule based on the status of a client driver health check
Slight updates for go style
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 0bde357731 add concept of health checks to fingerprinters and nodes
fix up feedback from code review

add driver info for all drivers to node
2018-03-21 15:15:25 -04:00
Michael Schurter 1022170bf3 executor: increase level for helpful log lines
Should help with debugging issues like #3971
2018-03-21 11:53:58 -07:00
Marcin Matlaszek 6019a88824
Make raw_exec processes cleanup function more precise. 2018-03-20 13:40:21 +01:00
Marcin Matlaszek bb36c122e2
Fix errors when trying to kill whole process group. 2018-03-20 13:40:21 +01:00
Marcin Matlaszek 86d650d7b0
Make starting & cleaning process group Windows compatible. 2018-03-20 13:40:21 +01:00
Marcin Matlaszek 79c139f2ef
Create new process group on process startup.
Clean up by sending SIGKILL to the whole process group.
2018-03-20 13:40:21 +01:00
Michael Schurter 1044bc0feb
Merge pull request #3984 from hashicorp/f-loosen-consul-skipverify
Replace Consul TLSSkipVerify handling
2018-03-16 11:21:28 -07:00
Michael Schurter 32ee5e0d53
Merge pull request #3990 from hashicorp/f-rkt-groups
rkt: allow specifying --group
2018-03-16 11:19:53 -07:00
Michael Schurter bd78cfb039 rkt: allow specifying --group 2018-03-16 11:08:22 -07:00
Michael Schurter fb10ec9c01 docker: make volume errors recoverable
The interface+mock just to test this one little error handling may seem
like overkill but there was just no other way to write an automated test
around this logic as there's no way to simluate this error with stock
Docker.
2018-03-15 17:52:43 -07:00
Michael Schurter 0971114f0c Replace Consul TLSSkipVerify handling
Instead of checking Consul's version on startup to see if it supports
TLSSkipVerify, assume that it does and only log in the job service
handler if we discover Consul does not support TLSSkipVerify.

The old code would break TLSSkipVerify support if Nomad started before
Consul (such as on system boot) as TLSSkipVerify would default to false
if Consul wasn't running. Since TLSSkipVerify has been supported since
Consul 0.7.2, it's safe to relax our handling.
2018-03-14 17:43:06 -07:00
Preetha Appan 3c38eededd
Fix spelling in comment 2018-03-14 15:54:25 -05:00
Alex Dadgar bef4a8ee09 fix clearing node events 2018-03-14 09:48:59 -07:00
Chelsea Komlo 810eedfa2a
Merge pull request #3945 from hashicorp/f-add-node-events
Add node events
2018-03-14 08:42:55 -04:00
Preetha 360d6e5a92
Merge pull request #3968 from hashicorp/f-nicer-vault-error
Make server side error messages from vault more clearer
2018-03-13 20:49:39 -05:00
Alex Dadgar de6ebb6e6c small cleanup 2018-03-13 18:08:22 -07:00
Chelsea Holland Komlo b41501e442 code review feedback 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo 1488b076d1 code review feedback 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo a8655320fd fix up go check warnings 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo 0934769b04 add client side emitting of node events
Changelog
2018-03-13 18:08:21 -07:00
Preetha Appan 914eaed64f
Address some code review comments 2018-03-13 18:19:16 -05:00
Preetha Appan 09c231ce43
Return the err from server correctly 2018-03-13 18:10:14 -05:00
Preetha Appan 9618f52746
Remove error wrapping and make vault connection server side errors clearer. 2018-03-13 17:09:03 -05:00
Michael Schurter 79df90acb0
Merge pull request #3958 from simplesurance/swappiness
fix: disable swap for executor_linux allocations
2018-03-13 10:10:22 -07:00
Fabian Holler e6af051c93 fix: disable swap for executor_linux allocations
A comment in the nomad source code states that swapping for
executor_linux allocations is disabled but it wasn't.

Nomad wrote -1 to the memsw.limit_in_bytes cgroup file to disable
swapping.
This has the following problems:

1.) Writing -1 to the file does not disable swapping. It sets
    the limit for memory and swap to unlimited.
2.) On common Linux distributions like Ubuntu 16.04 LTS the
    memsw.limit_in_bytes cgroup file does not exist by default.
    The memsw.limit_in_bytes file only exist if the Linux kernel is
    build with CONFIG_MEMCG_SWAP=yes and either
    CONFIG_MEMCG_SWAP_ENABLED=yes or when the kernel parameter
    swapaccount=1 is passed during boot.
    Most Linux distributions disable swap accounting by default because
    of higher memory usage.
    Nomad silently ignores if writing to the memsw.limit_in_bytes file
    fails. The allocation succeeds, no message is logged to notify the
    user.

To ensure that disabling swap works on common Linux kernels, disable
swapping by writing 0 to the memory.swappiness file.
Using the memory.swappiness file only requires that the kernel is
compiled with CONFIG_MEMCG=yes. This is the default in common Linux
kernels.
2018-03-13 10:52:50 +01:00
Alex Dadgar 4844317cc2
Merge pull request #3890 from hashicorp/b-heartbeat
Heartbeat improvements and handling failures during establishing leadership
2018-03-12 14:41:59 -07:00
Michael Schurter 7dd7fbcda2 non-Existent -> nonexistent
Reverting from #3963

https://www.merriam-webster.com/dictionary/existent
2018-03-12 11:59:33 -07:00
Josh Soref 18c5659474 spelling: version 2018-03-11 19:13:25 +00:00
Josh Soref 6222bd564e spelling: verify 2018-03-11 19:13:32 +00:00
Josh Soref 1359fd2c3d spelling: unexpected 2018-03-11 19:08:07 +00:00
Josh Soref 173ce63fe9 spelling: transition 2018-03-11 19:06:05 +00:00
Josh Soref 782c704de6 spelling: thresholds 2018-03-11 19:03:47 +00:00
Josh Soref ac6d3767da spelling: terminated 2018-03-11 19:01:49 +00:00
Josh Soref 2dda6abab9 spelling: templates 2018-03-11 19:01:39 +00:00
Josh Soref 8978caea28 spelling: shutdown 2018-03-11 18:55:49 +00:00
Josh Soref 8d191c9273 spelling: severity 2018-03-11 18:53:52 +00:00
Josh Soref a79eccaa58 spelling: service 2018-03-11 18:53:47 +00:00
Josh Soref 8149694f3a spelling: server 2018-03-11 18:55:30 +00:00
Josh Soref 3787d8141e spelling: serialize 2018-03-11 18:53:39 +00:00
Josh Soref e37626561c spelling: semantics 2018-03-11 19:00:26 +00:00
Josh Soref e4639ac62f spelling: secrets 2018-03-11 18:53:26 +00:00
Josh Soref cec45c6bc8 spelling: safety 2018-03-11 18:52:54 +00:00
Josh Soref de9d0c7180 spelling: retrieved 2018-03-11 18:51:40 +00:00
Josh Soref e949d23e1b spelling: resource 2018-03-11 18:51:03 +00:00
Josh Soref 82221f9a2b spelling: represents 2018-03-11 18:42:29 +00:00
Josh Soref 1c3b60ae70 spelling: replace 2018-03-11 18:41:53 +00:00
Josh Soref b47ab9ab8c spelling: removes 2018-03-11 18:41:43 +00:00
Josh Soref db166c6cf6 spelling: remnants 2018-03-11 18:41:26 +00:00
Josh Soref 258d76ec13 spelling: registry 2018-03-11 18:41:13 +00:00
Josh Soref 7ad77f568b spelling: purposes 2018-03-11 18:39:35 +00:00
Josh Soref 6fa892a463 spelling: propagated 2018-03-11 18:39:26 +00:00
Josh Soref 1a8204fa11 spelling: previous 2018-03-11 18:38:23 +00:00
Josh Soref f764e5552a spelling: periodically 2018-03-11 18:36:59 +00:00
Josh Soref 96e47bd4c1 spelling: parallelism 2018-03-11 18:35:54 +00:00
Josh Soref 3c1ce6d16d spelling: otherwise 2018-03-11 18:34:27 +00:00
Josh Soref 96dba3e267 spelling: mount 2018-03-11 18:27:18 +00:00
Josh Soref 13e5fb8221 spelling: malicious 2018-03-11 18:26:25 +00:00
Josh Soref 1ef6d6319e spelling: labels 2018-03-11 18:21:44 +00:00
Josh Soref b6ec60fb5f spelling: isolation 2018-03-11 18:19:02 +00:00
Josh Soref 337ac13f0a spelling: interpolation 2018-03-11 18:16:36 +00:00
Josh Soref 75d1240446 spelling: interface 2018-03-11 18:15:37 +00:00
Josh Soref c1a0ae3161 spelling: inspect 2018-03-11 18:15:27 +00:00
Josh Soref 2a1cf2f216 spelling: initialization 2018-03-11 18:18:37 +00:00
Josh Soref b293b48287 spelling: idempotent 2018-03-11 18:14:50 +00:00
Josh Soref 52b83328fc spelling: heartbeating 2018-03-11 18:12:19 +00:00
Josh Soref 3ad579930e spelling: fingerprint 2018-03-11 18:07:37 +00:00
Josh Soref 7f6e4012a0 spelling: existent 2018-03-11 18:30:37 +00:00
Josh Soref 7cd95f6eb3 spelling: executor 2018-03-11 18:05:31 +00:00
Josh Soref b9ce8b9e37 spelling: each 2018-03-11 17:56:19 +00:00
Josh Soref 0fc23b0ba3 spelling: down 2018-03-11 17:55:47 +00:00
Josh Soref e8478c4065 spelling: documentation 2018-03-11 17:55:21 +00:00
Josh Soref 4241ffc5ab spelling: disable 2018-03-11 17:55:12 +00:00
Josh Soref 858b9e809f spelling: directory 2018-03-11 17:55:06 +00:00
Josh Soref 09970343b5 spelling: destruction 2018-03-11 17:54:39 +00:00
Josh Soref 2f135f0ed7 spelling: destroy 2018-03-11 17:54:13 +00:00
Josh Soref 97dc9a00c0 spelling: default 2018-03-11 17:52:58 +00:00
Josh Soref aaa6e104ed spelling: could 2018-03-11 17:51:47 +00:00
Josh Soref c9b86bbc2f spelling: controls 2018-03-11 17:50:39 +00:00
Josh Soref f2a7c95379 spelling: constraints 2018-03-11 17:50:28 +00:00
Josh Soref cb1303e47a spelling: conjunction 2018-03-11 17:48:37 +00:00
Josh Soref 42fa13bbc6 spelling: cancelled 2018-03-11 17:45:47 +00:00
Josh Soref 7077386916 spelling: cancelable 2018-03-11 17:45:34 +00:00
Josh Soref a70fe97556 spelling: assert 2018-03-11 17:41:33 +00:00
Josh Soref 58b794875f spelling: artifact 2018-03-11 17:41:02 +00:00
Josh Soref e78cf9c81a spelling: already 2018-03-11 17:39:04 +00:00
Josh Soref b8b46d3f74 spelling: allocation 2018-03-11 17:37:22 +00:00
Josh Soref e87b0a4d86 spelling: alloc 2018-03-11 17:36:34 +00:00
Josh Soref b67449796a spelling: added 2018-03-11 17:34:28 +00:00
Chelsea Komlo bd88877249
Merge pull request #3909 from hashicorp/b-node-attributes-concurrent-access
Fingerprinters accessing node information should be thread safe
2018-03-06 11:57:46 -05:00
Chelsea Komlo 7c7e2f4d0b
Merge pull request #3873 from hashicorp/r-edge-trigger-node-watcher
Edge trigger node updates
2018-03-01 15:18:59 -05:00
Chelsea Holland Komlo 122d1c4e4a simplify retry logic 2018-03-01 09:48:26 -05:00
Michael Schurter 557a70f78d
Merge pull request #3917 from jaininshah9/master
changing the formula to correctly pass the CPUQota to docker
2018-02-28 20:00:37 -08:00
Jainin Shah 39e1fc06e5 adding comments to the change 2018-02-28 16:19:51 -08:00
Preetha Appan eaedffc7f7
Fix go vet errors 2018-02-28 12:21:27 -06:00
Chelsea Holland Komlo 355805db56 reset timer after updating node copy 2018-02-27 17:18:10 -05:00
Jainin Shah 6eb7da002f changing the formula to correctly pass the CPUQota to docker 2018-02-27 12:32:23 -08:00
Chelsea Holland Komlo a72aaaf47f add network resources equal method, use time ticker
remove impossible test case
2018-02-27 12:42:53 -05:00
Chelsea Holland Komlo e736e31820 use time ticker, update how network resources are compared 2018-02-26 18:47:11 -05:00
Chelsea Holland Komlo 5059065b52 improved testing; node networks comparison 2018-02-26 15:55:38 -05:00
Chelsea Holland Komlo b7bcd0b59f fingerprinters accessing node information should be thread safe 2018-02-26 15:25:54 -05:00
Chelsea Holland Komlo 1f31b39fe8 code review fixups 2018-02-26 12:36:30 -05:00
Chelsea Holland Komlo ed8c8afbcd edge trigger node update
test update config copy trigger
2018-02-26 12:36:04 -05:00
Alex Dadgar 49a47483d1 Registering back to initializing
Fix a bug in which if the node attributes/meta changed, we would
re-register the node in status initializing. This would incorrectly
trigger the client to log that it missed its heartbeat.

It would change the status of the Node to initializing until the next
heartbeat occured.
2018-02-16 17:49:31 -08:00
Alex Dadgar eff4455c68 Fix original client server list behavior 2018-02-15 16:04:53 -08:00
Alex Dadgar 0ebf7f3b7f remove tmp file 2018-02-15 15:51:27 -08:00
Alex Dadgar f9cf642436 Client tls 2018-02-15 15:22:57 -08:00
Alex Dadgar 0e85ae77b4 fix flaky gc tests 2018-02-15 13:59:03 -08:00
Alex Dadgar 38b695b69c feedback and rebasing 2018-02-15 13:59:03 -08:00
Alex Dadgar 9117ef4650 HTTP agent 2018-02-15 13:59:03 -08:00
Alex Dadgar d7029965ca Server side impl + touch ups 2018-02-15 13:59:02 -08:00
Alex Dadgar ce0caccad2 client implementation of alloc gc and stats 2018-02-15 13:59:02 -08:00
Alex Dadgar e685211892 Code review feedback 2018-02-15 13:59:02 -08:00
Alex Dadgar a9c4f8a4c8 clarify force 2018-02-15 13:59:02 -08:00
Alex Dadgar dc75501c69 Respond to comments 2018-02-15 13:59:02 -08:00
Alex Dadgar cea77df6a7 Add Streaming RPC ack
This PR introduces an ack allowing the receiving end of the streaming
RPC to return any error that may have occured during the establishment
of the streaming RPC.
2018-02-15 13:59:02 -08:00
Alex Dadgar 2f9d33f479 vet 2018-02-15 13:59:02 -08:00
Alex Dadgar f5f43218f5 HTTP and tests 2018-02-15 13:59:02 -08:00
Alex Dadgar 6546b43a17 Client implementation of stream 2018-02-15 13:59:02 -08:00
Alex Dadgar 9a5569678c Client Stat/List impl 2018-02-15 13:59:02 -08:00
Alex Dadgar 8854b35b34 Agent logs 2018-02-15 13:59:02 -08:00
Alex Dadgar 857b0ab6c7 client tests 2018-02-15 13:59:02 -08:00
Alex Dadgar 69def2ff22 Server tests of logs 2018-02-15 13:59:02 -08:00
Alex Dadgar 9479cb7f25 Remove logging 2018-02-15 13:59:01 -08:00
Alex Dadgar 14f57024b7 test stream framer 2018-02-15 13:59:01 -08:00
Alex Dadgar ddd67f5f11 Server streaming 2018-02-15 13:59:01 -08:00
Alex Dadgar ca9379be09 Logs over RPC w/ lots to touch up 2018-02-15 13:59:01 -08:00
Alex Dadgar 2c0ad26374 New RPC Modes and basic setup for streaming RPC handlers 2018-02-15 13:59:01 -08:00
Alex Dadgar fea0e69d4f wip fs endpoint 2018-02-15 13:59:01 -08:00
Alex Dadgar b5037f20db Remove circular dependency 2018-02-15 13:59:01 -08:00
Alex Dadgar 9bc75f0ad4 Fix manager tests and make testagent recover from port conflicts 2018-02-15 13:59:01 -08:00
Alex Dadgar feb943c873 Fix lint/comments 2018-02-15 13:59:01 -08:00
Alex Dadgar ac67da3b06 Unjankify the pkg 2018-02-15 13:59:01 -08:00
Alex Dadgar 3f1f8604bb initial round of comment review 2018-02-15 13:59:01 -08:00
Alex Dadgar e03b074650 Plumb config 2018-02-15 13:59:01 -08:00
Alex Dadgar 05c4fe8675 Change defaults for min use duration 2018-02-15 13:59:01 -08:00
Alex Dadgar c8c1284bc3 SetServer command actually returns an error if given an invalid server 2018-02-15 13:59:01 -08:00
Alex Dadgar 3f786b904b use server manager 2018-02-15 13:59:01 -08:00
Alex Dadgar b24b05e025 Remove testing 2018-02-15 13:59:01 -08:00
Alex Dadgar 4e1cb1d96e Test RPC from server 2018-02-15 13:59:00 -08:00
Alex Dadgar 6dd1c9f49d Refactor 2018-02-15 13:59:00 -08:00
Alex Dadgar a6dfffa4fa Add testing interfaces 2018-02-15 13:59:00 -08:00
Alex Dadgar d918f9bd5c RPC Listener 2018-02-15 13:59:00 -08:00
Alex Dadgar 1472b943d6 Stats Endpoint 2018-02-15 13:59:00 -08:00
Chelsea Komlo 0c0b56a1a4
Merge pull request #3807 from hashicorp/f-client-add-fingerprint-manager
Add fingerprint manager to manage fingerprinting node
2018-02-13 11:22:50 -05:00
Chelsea Holland Komlo b321287712 extract test helper
lock concurrent accesses to node

comment exported method
2018-02-12 18:30:10 -05:00
Michael Schurter 101e85f078
Merge pull request #3819 from schmichael/qemu-graceful-shutdown-alpine
Test QEMU graceful shutdown
2018-02-12 12:32:14 -08:00
Michael Schurter ed6bce2ccf Improve test logging 2018-02-12 11:25:52 -08:00
Michael Schurter 06397ba59d
Merge pull request #3825 from jaininshah9/master
add a flag for cpu_hard_limit
2018-02-08 20:40:38 -08:00
Michael Schurter 6e6915e7f5 Merge branch 'master' into f-cpu_hard_limit 2018-02-08 20:14:29 -08:00
Alan Scherger eee7144643 drivers: use ctx.TaskEnv for mount points 2018-02-08 12:59:20 -06:00
Jainin Shah a4516aa71a removing underscore in variable name 2018-02-07 16:28:43 -08:00
Chelsea Holland Komlo 4a26959825 code review feedback 2018-02-07 18:10:55 -05:00
Chelsea Holland Komlo d626d24488 remove dependency on client for fingerprint manager 2018-02-07 18:10:45 -05:00
Chelsea Holland Komlo e012e5ab8a add fingerprint manager 2018-02-07 18:10:33 -05:00
Jainin Shah 8149587abe clearing the confusion between microsecond,nanosecond and millisecond 2018-02-06 19:11:39 -08:00
Jainin Shah d3087d6069 using d.node.Resources.CPU as suggested 2018-02-06 14:52:15 -08:00
Michael Schurter 279a3b3f28
Merge pull request #3790 from 42wim/dockerv6
Service registration for IPv6 docker addresses (Fixes #3785)
2018-02-05 17:07:53 -08:00
Michael Schurter 25f0ad050f docker: Skip IPv6 test if IPv6 disabled 2018-02-05 16:24:30 -08:00
Chelsea Komlo 42d20234a3
Merge pull request #3781 from hashicorp/f-client-fingerprint-refactor
Refactor client fingerprinters to return a diff of node attributes
2018-02-01 20:13:44 -05:00
Chelsea Holland Komlo b21233fe23 update log message 2018-02-01 19:46:57 -05:00
Chelsea Holland Komlo 6f9c0ab361 req/resp should be within config locks; rename for detected fingerprints
changelog
2018-02-01 19:00:39 -05:00
Wim a1a2ca8e33 Add AdvertiseIPv6Address test 2018-02-01 23:21:47 +01:00
Jainin Shah 94d0ce6006 wrapping the line to less than 80 characters 2018-02-01 14:16:38 -08:00
Jainin Shah 0d99f256de changes after running go fmt 2018-02-01 12:07:05 -08:00
Jainin Shah 04c14b3cb2 add a flag for cpu_hard_limit 2018-02-01 10:09:12 -08:00
Chelsea Holland Komlo d889e471a2 fix up linting 2018-02-01 12:26:38 -05:00
Chelsea Holland Komlo b54203eddc add detected to more drivers where the driver is found but unusable 2018-02-01 11:28:17 -05:00
Michael Schurter 0ac43a7622 Skip QEMU graceful shutdown test except on Travis
Hopefully we can reuse the SkipSlow helper elsewhere.
2018-01-31 15:47:26 -08:00
Chelsea Holland Komlo b8e8064835 code review fixup 2018-01-31 18:34:03 -05:00
Michael Schurter 24d060bbb4 Test graceful shutdown
Uses an Alpine image which supports ACPI poweroff signal handling.
Handling is only enabled after the VM has booted, so this test blocks
until sshd starts before issuing the command.
2018-01-31 15:05:02 -08:00
Wim db3bdfe898 * Change use_ipv6_address to advertise_ipv6_address.
* Set autoadvertise to true.
* Update documentation.
2018-02-01 00:01:25 +01:00
Chelsea Holland Komlo 7b53474a6e add applicable boolean to fingerprint response
public fields and remove getter functions
2018-01-31 13:21:45 -05:00
Michael Schurter cc54e36f91
Merge pull request #3798 from simar7/qemu-graceful-shutdown-bug
[QEMU] Fixing an unintentional variable shadowing
2018-01-30 17:43:44 -08:00
Michael Schurter c662cc0172
Merge pull request #3773 from mikemccracken/2018-01-18/destroy-container-on-err
lxc: cleanup partially configured containers after errors in Start
2018-01-30 14:52:29 -08:00
Chelsea Holland Komlo 9482c322b7 locks for fingerprint reads/writes 2018-01-30 11:32:45 -05:00
Wim 76f09db067 Service registration for IPv6 docker addresses 2018-01-30 17:07:47 +01:00
Alex Dadgar 3ad5916f72
Merge pull request #3799 from mikemccracken/2018-01-25/lxc-log-outside-container
lxc: move lxc log file out of container-visible alloc dir
2018-01-29 14:32:22 -08:00
Chelsea Holland Komlo 14147c8496 remove attributes from periodic fingerprints when state changes
write test for client periodic fingerprinters
2018-01-29 13:48:54 -05:00
Alex Dadgar 3d28774f74
Merge pull request #3802 from filipochnik/docker-readonly-rootfs
Add ReadonlyRootfs option to the Docker driver
2018-01-29 09:47:27 -08:00
Indradhanush Gupta 7db4ee1122 rkt_test.go: Remove underscore from variable names 2018-01-29 11:39:50 +01:00
Filip Ochnik 80a17ee8dd Add ReadonlyRootfs option to the Docker driver 2018-01-27 14:38:29 +01:00
Chelsea Holland Komlo 7c19de797c create safe getters and setters for fingerprint response 2018-01-26 11:22:05 -05:00
Chelsea Holland Komlo 896d6f8058 fixups from code review 2018-01-26 07:04:32 -05:00
Simarpreet Singh ac720b84f0
qemu: Make the driver debugging output more indicative
Signed-off-by: Simarpreet Singh <simar@linux.com>
2018-01-25 16:40:16 -08:00
Simarpreet Singh 8b058f7570
qemu: Fix unintentional shadowing of monitorPath variable
Signed-off-by: Simarpreet Singh <simar@linux.com>
2018-01-25 16:24:10 -08:00
Michael McCracken 09c9ca23f5 lxc: move lxc log file out of container-visible alloc dir
The LXC runtime's log file is currently written to TaskDir.LogDir,
which is mounted as alloc/logs inside the containers in the task
group.

This file is not intended to be visible to containers, and depending
on the log level, may have information about the host that a container
should not be allowed to see.

Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-25 14:41:37 -08:00
Michael McCracken 88e3063717 fix speling in log
Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-25 13:56:14 -08:00
Chelsea Holland Komlo 3d38868b88 add test case for available cgroups 2018-01-25 06:08:07 -05:00
Chelsea Holland Komlo 9a8344333b refactor Fingerprint to request/response construct 2018-01-24 11:54:02 -05:00
Michael McCracken f8fe2ea8cb review cleanup
don't export an internal function, and simplify some code

Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-23 15:03:09 -08:00
Alex Dadgar a43e0a7b08 Allow overriding an image's entrypoint in Docker
Fixes https://github.com/hashicorp/nomad/issues/2219
2018-01-23 14:05:00 -08:00
Alex Dadgar 98a03ad689
Merge pull request #3754 from filipochnik/docker-caps
Add an option to add and drop capabilities in the Docker driver
2018-01-23 12:02:50 -08:00
Chelsea Komlo d09cc2a69f
Merge pull request #3492 from hashicorp/f-client-tls-reload
Client/Server TLS dynamic reload
2018-01-23 05:51:32 -05:00
Filip Ochnik 4abd269a68
Merge branch 'master' into docker-caps 2018-01-21 12:18:22 +01:00
Filip Ochnik 558812350d Finish implementation of the capabilities whitelist 2018-01-21 12:14:24 +01:00
Michael McCracken 00dcfa6db9 lxc: cleanup partially configured containers after errors in Start
If there are any errors in container setup after c.Create() in
Start(), the container will be left around, with no way to clean it up
because the handle will not be created or returned from Start.

Added a wrapper that checks for errors and performs appropriate
cleanup. Returning a cleanup function from a wrapped function instead
of just doing the cleanup before returning the error helps to ensure
that future changes that might add or change error exits can't forget
to consider a cleanup function.

Adds a check to the invalid config test case to check that a container
created with an invalid config doesn't get left behind.

Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 16:03:03 -08:00
Michael Schurter 38182bebea Drop log level to TRACE
For people not using driver networks these log lines would just be
confusing.
2018-01-18 15:35:24 -08:00
Michael Schurter 9d410c88a7 Improve driver network logging 2018-01-18 15:35:24 -08:00
Michael Schurter 583e17fad5 Always advertise driver IP when in driver mode
Fixes #3681

When in drive address mode Nomad should always advertise the driver's IP
in Consul even when no network exists. This matches the 0.6 behavior.

When in host address mode Nomad advertises the alloc's network's IP if
one exists. Otherwise it lets Consul determine the IP.

I also added some much needed logging around Docker's network discovery.
2018-01-18 15:35:24 -08:00
Michael McCracken 70817f728c lxc_test: add test for contents of file in bind-mounted dir
Ensure that bind mounting via the volumes config really did work.

Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 05:36:45 -08:00
Michael McCracken fd44bdee37 Simplify with gofmt -s
Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 04:17:42 -08:00
Michael McCracken f176e02a64 lxc: add tests for volume support
Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 04:17:42 -08:00
Michael McCracken c78c00a2d2 lxc: Add config flag to disable volume support
Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 04:17:42 -08:00
Michael McCracken d694a8921f Add volumes config to LXC driver
Allow lxc driver to accept bind mount config similarly to the docker
driver.

Includes some static sanity checks in Validate step

Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 04:17:42 -08:00
Chelsea Holland Komlo 649f86f094 refactor creating a new tls configuration 2018-01-16 08:02:39 -05:00
Chelsea Holland Komlo 6c9f9c8ac3 adding additional test assertions; differentiate reloading agent and http server 2018-01-16 07:34:39 -05:00
Filip Ochnik 4eeb552a4f Add a sketch of capabilities whitelist logic for the Docker driver 2018-01-14 20:01:47 +01:00
Filip Ochnik 8ee3ce7a26 Add an option to add and drop capabilities in the Docker driver 2018-01-14 19:56:57 +01:00
Alex Dadgar bec9a72eec Remove networking from basic resources 2018-01-12 14:33:42 -08:00
Charlie Voiselle 867bb6f7f9 Found more priviledge.
priviledge -> privilege
2018-01-12 09:44:53 -05:00
Alex Dadgar 9e1e04c6f1
Merge pull request #3727 from filipochnik/fix-gh-2832
Recognize renewing non-renewable Vault lease as fatal
2018-01-10 11:47:10 -08:00
Michael Schurter 189ce7f991
Merge pull request #3723 from hashicorp/b-3702-chown-dirs
chown dirs when migrating ephemeral_disk data
2018-01-09 09:27:26 -08:00
Michael Schurter e6c27256b7 Test streamed directory ownership 2018-01-08 16:00:07 -08:00
Michael Schurter 2c79ffb213 chown dirs when migrating ephemeral_disk data
Fixes #3702

Added missing chown call and made it conditional on running as root and
not on Windows as we do with files.
2018-01-08 15:31:12 -08:00
Charlie Voiselle 1bb1ab5069 fix typo
Priviledge -> privilege
2018-01-08 15:56:07 -05:00
Chelsea Holland Komlo 214d128eb9 reload raft transport layer
fix up linting
2018-01-08 14:52:28 -05:00
Filip Ochnik d265e11c36 Recognize renewing non-renewable Vault lease as fatal 2018-01-08 20:32:31 +01:00
Chelsea Holland Komlo 0708d34135 call reload on agent, client, and server separately 2018-01-08 09:56:31 -05:00
Chelsea Holland Komlo 9741097406 reloading tls config should be atomic for clients/servers 2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo ae7fc4695e fixups from code review
Revert "close raft long-lived connections"

This reverts commit 3ffda28206fcb3d63ad117fd1d27ae6f832b6625.

reload raft connections on changing tls
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo acd3d1b162 fix up downgrading client to plaintext
add locks around changing server configuration
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo c0ad9a4627 add ability to upgrade/downgrade nomad agents tls configurations via sighup 2018-01-08 09:21:06 -05:00
Michael Schurter ef76c65da1 Lookup euid outside of loop 2017-12-13 11:50:12 -08:00
Michael Schurter 5032bf4f5a Skip tests that require root when not root
Also skip Chown on allocdir migration on Windows and when non-root.
Windows doesn't support it, and it will always fail as a non-root user.
2017-12-12 16:58:27 -08:00
Alex Dadgar f0b0697b57 Keyify struct 2017-12-11 17:23:14 -08:00
Michael Schurter c4d4ead199 Fix test broken by mock updates 2017-12-08 16:45:25 -08:00
Michael Schurter 4b20441eef Validate port label for host address mode
Also skip getting an address for script checks which don't use them.

Fixed a weird invalid reserved port in a TaskRunner test helper as well
as a problem with our mock Alloc/Job. Hopefully the latter doesn't cause
other tests to fail, but we were referencing an invalid PortLabel and
just not catching it before.
2017-12-08 12:03:43 -08:00
Michael Schurter 30dd570061 Fix interpolation bug with service/check updates
Previously if only an interpolated variable used in a service or check
was changed we interpolated the old and new services and checks with the
new variable, so nothing appeared to have changed.
2017-12-08 12:03:00 -08:00
Michael Schurter 4347026f83 Test Consul from TaskRunner thoroughly
Rely less on the mockConsulServiceClient because the real
consul.ServiceClient needs all the testing it can get!
2017-12-08 12:03:00 -08:00
Alex Dadgar a0d6b6a121
Merge pull request #3630 from hashicorp/b-periodic
Handle race between fingerprinters and registration
2017-12-07 16:11:13 -08:00
Alex Dadgar 91ffbbb517 Review feedback 2017-12-07 16:10:57 -08:00
Chelsea Komlo c8e0cb3044
Merge pull request #3591 from hashicorp/b-1755-stop
Allow controlling the stop signal for drivers
2017-12-07 17:06:43 -05:00
Alex Dadgar 02baa6c52b Handle race between fingerprinters and registration 2017-12-07 13:09:37 -08:00
Chelsea Holland Komlo 61fa8ad4ba code review fixes 2017-12-07 13:46:25 -05:00
Chelsea Holland Komlo 77ab41124b set default kill signal on executor shutdown 2017-12-07 11:40:15 -05:00
Chelsea Holland Komlo 6cae8fe6e6 extend configurable kill signal to java driver 2017-12-07 11:40:10 -05:00
Alex Dadgar 4409fdacc0 Drop trace logging 2017-12-06 18:02:24 -08:00
Alex Dadgar cd9a7f14b8 Add logging around heartbeats 2017-12-06 17:57:50 -08:00
Chelsea Holland Komlo 350319239c change location of default kill signal 2017-12-06 17:48:25 -05:00
Chelsea Holland Komlo 7dfb64f941 extract signal helper into utils 2017-12-06 14:36:44 -05:00
Chelsea Holland Komlo b08611cfac move kill_signal to task level, extend to docker 2017-12-06 14:36:39 -05:00
Chelsea Holland Komlo 80de7d5ebd allow controlling the stop signal in exec/raw_exec 2017-12-06 11:28:45 -05:00
Chelsea Komlo 9ae849e09c
Merge pull request #3612 from hashicorp/docker-rkt-user
Set user for rkt tasks
2017-12-05 17:45:08 -05:00
Michael Schurter b66aa5b7f6
Merge pull request #3563 from hashicorp/b-snapshot-atomic
Atomic Snapshotting / Sticky Volume Migration
2017-12-05 09:16:33 -08:00
Chelsea Holland Komlo 4463dc607e fix up test 2017-12-05 10:12:40 -05:00
Chelsea Holland Komlo 7284f2385a remove unused user option 2017-12-04 18:01:31 -05:00
Michael Schurter 6ccc4219d3
Merge pull request #3615 from hashicorp/b-rkt-host-ports
rkt: Don't require port_map with host networking
2017-12-04 14:49:42 -08:00
Chelsea Holland Komlo 7c74968452 add ability to specify user for rkt 2017-12-04 14:21:48 -05:00
Michael Schurter 2bf1d6d85e rkt: Don't require port_map with host networking
Also don't try to return a DriverNetwork with host networking. None will
ever exist as that's the point of host networking: rkt won't create a
network namespace.
2017-12-01 17:23:25 -08:00
Chelsea Holland Komlo 4ee2122536 get KillTimeout in seconds, not nanoseconds 2017-12-01 10:43:00 -05:00
Michael Schurter 5e975bbd0f Add comment and normalize err check ordering
as per PR comments
2017-11-29 17:26:11 -08:00
Michael Schurter d996c3a231 Check for error file when receiving snapshots 2017-11-29 17:26:11 -08:00
Michael Schurter ca946679f6 Destroy partially migrated alloc dirs
Test that snapshot errors don't return a valid tar currently fails.
2017-11-29 17:26:11 -08:00
Michael Schurter 23c66e37c5 Handle errors during snapshotting
If an alloc dir is being GC'd (removed) during snapshotting the walk
func will be passed an error. Previously we didn't check for an error so
a panic would occur when we'd try to use a nil `fileInfo`.
2017-11-29 17:26:11 -08:00
Chelsea Holland Komlo 2208964948 Support StopTimeout for Docker tasksw
Update github.com/fsouza/go-dockerclient
2017-11-29 14:33:05 -05:00
Preetha Appan 6ad65c51e6 Missed assert in one place 2017-11-20 13:04:38 -06:00
Preetha Appan 747bd59daa Better error validation, and added test case for invalid sysctl inputs 2017-11-20 12:07:18 -06:00
Preetha Appan c68973747b Address some review comments 2017-11-20 11:15:09 -06:00
Preetha Appan 39ef9ee76d Fix gofmt warnings 2017-11-18 09:23:09 -06:00
Preetha Appan e53dd15f58 Fix test compilation after rebase 2017-11-17 17:46:04 -06:00
Samuel BERTHE 0fca2e19c8 review(docker driver): sysctls -> sysctl + ulimits -> ulimit 2017-11-17 16:30:45 -06:00
Samuel BERTHE 6c93922cb7 Oops 2017-11-17 16:14:14 -06:00
Samuel BERTHE c8363bc44b 💄 2017-11-17 16:03:22 -06:00
Samuel BERTHE 281ab90484 test(docker driver): testing sysctls and ulimits 2017-11-17 16:03:22 -06:00
Samuel BERTHE b9a10ff7fa feat(docker driver): adds sysctls and ulimits configs 2017-11-17 16:03:22 -06:00
Alex Dadgar 69d3bf7392
Merge pull request #3559 from hashicorp/b-metrics
Don't emit metrics for non-running tasks
2017-11-17 10:33:23 -08:00
Michael Schurter 3845c8d200
Merge pull request #3562 from hashicorp/b-3561-rkt-rm
Remove rkt pods when exiting
2017-11-16 17:30:21 -08:00
Michael Schurter 737fb45640
Merge pull request #3551 from hashicorp/b-3419-docker-409-bug
Fix Docker name conflict bug by updating dockerclient
2017-11-16 16:38:54 -08:00
Michael Schurter 437fce9954 Improve rktRemove error message 2017-11-16 15:45:14 -08:00
Michael Schurter 3ceec0caab Remove rkt pods when exiting
Fixes #3561
2017-11-16 14:33:44 -08:00
Charlie Voiselle 7a231897a5
Merge pull request #3556 from angrycub/f-fingerprint-log-level
Dropped loglevel for AWS fingerprinter env read misses to DEBUG
2017-11-16 16:27:25 -05:00
Charlie Voiselle 969ddf9c2a Lowered to DEBUG from AD feedback 2017-11-16 14:13:03 -05:00
Alex Dadgar 05b1588cea Only publish metric when the task is running and dev mode publishes metrics 2017-11-15 13:21:06 -08:00
Alex Dadgar 07963f0b6d
Merge pull request #3546 from hashicorp/f-heuristic
Better interface selection heuristic
2017-11-15 12:51:21 -08:00
Alex Dadgar 97ec3974a9 Use interface attached to default route 2017-11-15 11:32:32 -08:00
Michael Schurter f86f0bd9ea Handle leader task being dead in RestoreState
Fixes the panic mentioned in
https://github.com/hashicorp/nomad/issues/3420#issuecomment-341666932

While a leader task dying serially stops all follower tasks, the
synchronizing of state is asynchrnous. Nomad can shutdown before all
follower tasks have updated their state to dead thus saving the state
necessary to hit this panic: *have a non-terminal alloc with a dead
leader.*

The actual fix is a simple nil check to not assume non-terminal allocs
leader's have a TaskRunner.
2017-11-15 10:36:13 -08:00
Charlie Voiselle 1197637251 Dropped loglevel for AWS fingerprinter env reads
Certain environments use WARN for serious logging; however, it's very
possible to have machines without some of the fingerprinted keys
(public-ipv4 and public-hostname specifcally).  Setting log level to
INFO seems more consistent with this possibility.
2017-11-15 18:20:59 +00:00
Chelsea Komlo 2dfda33703 Nomad agent reload TLS configuration on SIGHUP (#3479)
* Allow server TLS configuration to be reloaded via SIGHUP

* dynamic tls reloading for nomad agents

* code cleanup and refactoring

* ensure keyloader is initialized, add comments

* allow downgrading from TLS

* initalize keyloader if necessary

* integration test for tls reload

* fix up test to assert success on reloaded TLS configuration

* failure in loading a new TLS config should remain at current

Reload only the config if agent is already using TLS

* reload agent configuration before specific server/client

lock keyloader before loading/caching a new certificate

* introduce a get-or-set method for keyloader

* fixups from code review

* fix up linting errors

* fixups from code review

* add lock for config updates; improve copy of tls config

* GetCertificate only reloads certificates dynamically for the server

* config updates/copies should be on agent

* improve http integration test

* simplify agent reloading storing a local copy of config

* reuse the same keyloader when reloading

* Test that server and client get reloaded but keep keyloader

* Keyloader exposes GetClientCertificate as well for outgoing connections

* Fix spelling

* correct changelog style
2017-11-14 17:53:23 -08:00
Michael Schurter 3023336b39 Add a test demonstrating the bug
Fails on Docker 17.09, passes on Docker 17.06 and earlier
2017-11-14 15:25:52 -08:00
Alex Dadgar ee31e15f51 Better interface selection heuristic
This PR introduces a better interface selection heuristic such that we
select interfaces with globally routable unicast addresses over link
local addresses.

Fixes https://github.com/hashicorp/nomad/issues/3487
2017-11-13 15:13:43 -08:00
Preetha Appan 926c9ed997 Make device mounting unit test verify configuration via docker inspect 2017-11-13 09:56:54 -06:00
Preetha Appan dc2d5fb5a4 Unit test (linux only) that tests mounting a device in the docker driver 2017-11-13 09:56:54 -06:00
Preetha Appan 4834710e45 Add default value for cgroup permissions for device if not set 2017-11-13 09:56:54 -06:00
Preetha Appan 9cdee6991c Remove unnecessary check since validate method already checks this 2017-11-13 09:56:54 -06:00
Preetha Appan 110c1fd4f0 Add support for passing device into docker driver 2017-11-13 09:56:54 -06:00
Alex Dadgar d1358ec1b6 alway load all templates 2017-11-10 12:35:51 -08:00
Alex Dadgar a3ea0c17a0 Handle multiple environment templates
Fixes https://github.com/hashicorp/nomad/issues/3498
2017-11-10 11:08:19 -08:00
Alex Dadgar b3edc12dd9
Merge pull request #3411 from cheeseprocedure/f-qemu-graceful-shutdown
Qemu driver: graceful shutdown feature
2017-11-03 16:41:34 -07:00
Michael Schurter 690b8f4cfb Remove noisy log line
Didn't mean to commit this
2017-11-03 16:00:30 -07:00
Matt Mercer 11e2870875 Qemu driver: clean up logging; fail unsupported features on Windows 2017-11-03 15:40:20 -07:00
Alex Dadgar 6034916ad1 fix spelling mistake 2017-11-03 15:04:59 -07:00
Alex Dadgar a23033932a
Merge pull request #3459 from multani/docker-oom-notification
docker: log that a container has been killed by the OOM killer
2017-11-03 13:24:03 -07:00
Matt Mercer cef9ba9770 Qemu driver: tweaks in response to PR feedback
Remove attribute for long qemu monitor path; misc cleanup; update tests
2017-11-03 11:28:56 -07:00
Preetha Appan 0eaef09675 Remove event GenericSource, and address other code review comments. Also added deprecation info in comments. 2017-11-03 10:10:06 -05:00
Preetha Appan 5f09c968b3 Move logic for determinic event display message to task_runner, added two new fields DisplayMessage and Details. 2017-11-03 09:13:01 -05:00
Alex Dadgar b4af10edde Alloc Runner doesn't panic on restoration. 2017-11-02 16:14:13 -07:00
Alex Dadgar abd28cbd7d
Merge pull request #3493 from hashicorp/f-remove-atlas
Remove Atlas and Scada from codebase
2017-11-02 16:00:44 -07:00
Michael Schurter eedbe8efbb
Merge pull request #3490 from hashicorp/f-gc-logging
Make unable-to-gc log level adaptive
2017-11-02 14:32:40 -07:00
Diptanu Choudhury cb68889652 Added the node_id as a tag 2017-11-02 13:29:10 -07:00
Alex Dadgar 701f462d33 remove atlas 2017-11-02 11:27:21 -07:00
Michael Schurter fc33c945be Make unable-to-gc log level adaptive
WARNing when someone has over 50 non-terminal allocs was just too
confusing.

Tested manually with `gc_max_allocs = 10` and bumping a job from `count
= 19` to `count = 21`:

```
2017/11/02 17:54:21.076132 [INFO] client.gc: garbage collection due to number of allocations (19) is over the limit (10) skipped because no terminal allocations
...
2017/11/02 17:54:48.634529 [WARN] client.gc: garbage collection due to number of allocations (21) is over the limit (10) skipped because no terminal allocations
```
2017-11-02 10:57:42 -07:00
Diptanu Choudhury 8a9d0d40b1 Added support for tagged metrics 2017-11-02 10:07:57 -07:00
Diptanu Choudhury 5f522c6de3 Incrementing the start counter when we are actually starting a container 2017-11-02 09:51:20 -07:00
Diptanu Choudhury 44535e5d10 Recording counter for dead allocs properly 2017-11-02 09:51:20 -07:00
Diptanu Choudhury 0b34e811b7 Added metrics to track task/alloc start/restarts/dead events 2017-11-02 09:51:20 -07:00
Matt Mercer 00f90323c2 Qemu driver: defer cleanup sooner 2017-11-01 17:37:43 -07:00
Matt Mercer 43256af5f3 Qemu driver: clean up test logging; retry integration test for longer 2017-11-01 17:21:56 -07:00
Matt Mercer b1145705d3 Use strings.Replace() instead of custom function 2017-11-01 15:31:35 -07:00
Matt Mercer d51d174fa0 Qemu driver: basic testing of graceful shutdown feature 2017-11-01 15:31:30 -07:00
Matt Mercer c26013ea0b Qemu driver: include PIDs in log output 2017-11-01 15:31:24 -07:00
Matt Mercer 38d9a391aa Qemu driver: ensure proper cleanup of resources 2017-11-01 15:31:20 -07:00
Matt Mercer 46f7e2fa4c Qemu driver: minor logging fixes 2017-11-01 15:31:14 -07:00
Matt Mercer 4afb9dfa2d Standardize driver.qemu logging prefix 2017-11-01 15:30:44 -07:00
Matt Mercer 5127e75569 Qemu driver: add graceful shutdown feature 2017-11-01 15:30:36 -07:00
Michael Schurter 1769db98b7 Fix regression by returning error on unknown alloc 2017-11-01 15:16:38 -05:00
Michael Schurter 9f26b9a403 Fix race in test 2017-11-01 15:16:38 -05:00
Michael Schurter 73e9b57908 Trigger GCs after alloc changes
GC much more aggressively by triggering GCs when allocations become
terminal as well as after new allocations are added.
2017-11-01 15:16:38 -05:00
Michael Schurter 2a81160dcd Fix GC'd alloc tracking
The Client.allocs map now contains all AllocRunners again, not just
un-GC'd AllocRunners. Client.allocs is only pruned when the server GCs
allocs.

Also stops logging "marked for GC" twice.
2017-11-01 15:16:38 -05:00
Alex Dadgar c710550551 fix test 2017-10-30 12:35:31 -07:00
Alex Dadgar 4831380e57 Node access is done using locked Node copy
Fixes https://github.com/hashicorp/nomad/issues/3454

Reliably reproduced the data race before by having a fingerprinter
change the nodes attributes every millisecond and syncing at the same
rate. With fix, did not ever panic.
2017-10-27 13:27:24 -07:00
Jonathan Ballet 5429d1c656 docker: changed OOM killed error message 2017-10-27 20:30:52 +02:00
Jonathan Ballet 12615bde9c docker: log that a container has been killed by the OOM killer
Fix: #2203 (at least for Docker tasks)
2017-10-27 18:05:27 +02:00
Alex Dadgar f117eb28c7 go style vars 2017-10-25 10:49:34 -07:00
Alex Dadgar 3f8495dd0e fix two flaky tests 2017-10-23 18:15:52 -07:00
Alex Dadgar cb0d0ef009 move to consul freeport implementation 2017-10-23 16:51:40 -07:00
Alex Dadgar dbc014b360 Standardize retrieving a free port into a helper package 2017-10-23 16:48:20 -07:00
Alex Dadgar 4a69e1ad15 don't double parallel 2017-10-23 16:48:06 -07:00
Alex Dadgar 96ca2bbe4c respond to comments 2017-10-23 15:50:27 -07:00
Alex Dadgar 99c81b5848 Skip if no docker 2017-10-19 16:55:10 -07:00
Alex Dadgar 593536664e fix flaky java tests 2017-10-19 16:49:57 -07:00
Alex Dadgar 4bc452b479 Undo darwin user setting 2017-10-19 16:49:57 -07:00
Alex Dadgar c7c6964313 Run as user on mac 2017-10-19 16:49:57 -07:00
Alex Dadgar 55a1dffa2f sudo docker works 2017-10-19 16:49:57 -07:00
Alex Dadgar 805e7b3b62 docker tests 2017-10-19 16:49:57 -07:00
Michael Schurter 797f49702e Add logging around moby/moby#32648 bug 2017-10-18 10:44:03 -07:00
Michael Schurter 22ac450b2f Properly fail rkt fingerprinting on old vesions 2017-10-16 13:58:58 -07:00
Michael Schurter d7732c1a58 Squelch repeated rkt version warnings 2017-10-16 12:09:47 -07:00
Michael Schurter b5fd075d74 Test fixes from #3383 2017-10-13 15:45:35 -07:00
Michael Schurter b63eee17e9 Merge pull request #3383 from hashicorp/b-migrate-token
base64 migrate token
2017-10-13 13:46:54 -07:00
Michael Schurter dfd2967cdb Merge pull request #3376 from hashicorp/f-node-acls
Allow Node.SecretID for Node.GetNode and Allocs.GetAlloc
2017-10-13 11:51:48 -07:00
Michael Schurter 15b991e039 base64 migrate token
HTTP header values must be ASCII.

Also constant time compare tokens and test the generate and compare
helper functions.
2017-10-13 10:59:13 -07:00
Alex Dadgar 85178d6048 rkt remove allocid 2017-10-13 10:07:50 -07:00
Adam Stankiewicz cefbc72b49
Remove AllocID from ExecutorContext 2017-10-13 17:07:49 +02:00
Michael Schurter 4a70d4356a Alloc watcher must send Node.SecretID as AuthToken
An auth token is required if ACLs are enabled
2017-10-12 16:38:02 -07:00
Michael Schurter 84d8a51be1 SecretID -> AuthToken 2017-10-12 15:16:33 -07:00
Michael Schurter 59ff94cd71 Don't panic on unexpeced Consul response
Fixes #3326
2017-10-11 18:25:54 -07:00
Chelsea Holland Komlo e1c4701a43 fix up build warnings 2017-10-11 17:11:57 -07:00
Chelsea Holland Komlo b018ca4d46 fixing up code review comments 2017-10-11 17:09:20 -07:00
Chelsea Holland Komlo a77e462465 add tests for functionality 2017-10-11 17:09:20 -07:00
Chelsea Holland Komlo 410adaf726 Add functionality for authenticated volumes 2017-10-11 17:09:20 -07:00
Alex Dadgar 6d3d0a9391 Nomad UI Command 2017-10-09 23:01:55 -07:00
Michael Schurter f788974f8a Merge pull request #3288 from simar7/qemu-improvements
qemu: Add bound checks for memory assignment
2017-10-02 14:47:05 -07:00
Simarpreet Singh d801584c46
qemu: Fix lower memory bound to 128M
Signed-off-by: Simarpreet Singh <simar@linux.com>
2017-10-02 14:29:44 -07:00
Simarpreet Singh 10d7d6dab0
gofmt: format qemu.go and qemu_test.go
Signed-off-by: Simarpreet Singh <simar@linux.com>
2017-10-02 13:16:48 -07:00
Michael Schurter a66c53d45a Remove `structs` import from `api`
Goes a step further and removes structs import from api's tests as well
by moving GenerateUUID to its own package.
2017-09-29 10:36:08 -07:00
Michael Schurter 77f1fe40e7 Properly autodetect Docker IP in Windows
Our Docker network plugin autodetection code was erroneously treating
Window's default network `nat` as a plugin and defaulting to it instead
of the host.

Fixes #3218
2017-09-27 16:49:23 -07:00
Michael Schurter a8a87af7ed Only build rkt driver on linux
Build stub for non-linux targets
2017-09-27 14:21:45 -07:00
Simarpreet Singh 3d99e71de8
qemu: Add bound checks for memory assignment
Signed-off-by: Simarpreet Singh <simar@linux.com>
2017-09-26 21:07:48 -07:00
Michael Schurter d7229ce6c5 Merge pull request #3256 from dalegaard/master
Enable rkt driver to use address_mode = 'driver'
2017-09-26 18:04:37 -05:00
Alex Dadgar 4173834231 Enable more linters 2017-09-26 15:26:33 -07:00
Lasse Dalegaard 9f584d1114 Ignore rkt network failure if container died early
If the container dies before the network can be read, we now ignore the
error coming out of the network information polling loop. Nomad will
restart the task regardless, so we might be masking the actual error.

The polling loop for the rkt network information, inside the `Start`
method, was getting a bit unwieldy. It's been refactored out so it's not
a seperate function.
2017-09-27 00:15:27 +02:00
Lasse Dalegaard b43ec57c02 Make rkt port mapping test not exit immediately
The rkt port mapping test currently starts redis with --version, which
obviously makes redis exit again almost immediately. This means that the
container exists before the network status can be queried, and so the
test fails.
2017-09-26 23:10:24 +02:00
Lasse Dalegaard 17d155d316 Improve rkt driver network status poll loop
The network status poll loop will now report any networks it ignored, as
well as a no-networks situations.
2017-09-26 21:49:45 +02:00
Lasse Dalegaard bafd32fda0 Refactor rkt network status loop
The network status poll loop for the rkt drivers `Start` method was a
bit messy, and could not display the last encountered error. Here we
clean it up.
2017-09-26 21:27:12 +02:00
Lasse Dalegaard 5e9e2b07bd Small logging fix in rkt/driver 2017-09-26 19:36:13 +02:00
Lasse Dalegaard 3d25fd3b00 Bump minimum rkt version to 1.27.0.
The changes introduces in #3256 require at least rkt 1.27.0 because of
a bug in the JSON output of `rkt status` in previous versions.

Here we upgrade all references to rkt's minimum version, and also make
travis and vagrant use this version when running tests.

Finally we add a CHANGELOG notice.
2017-09-26 19:15:43 +02:00
Lasse Dalegaard f55f2b8f24 Turn rkt network status failure into Start failure
If the rkt driver cannot get the network status, for a task with a
configured port mapping, it will now fail the Start() call and kill the
task instead of simply logging. This matches the Docker behavior.

If no port map is specified, the warnings will be logged but the task
will be allowed to start.
2017-09-26 10:20:57 +02:00
Lasse Dalegaard 55a2e60e1a Test for rkt driver setting DriverNetwork
To test that the rkt driver correctly sets a DriverNetwork, at least
when a port mapping is requested, we amend the
TestRktDriver_PortsMapping test with a small check.
2017-09-26 09:10:50 +02:00
Lasse Dalegaard 2d307d5beb Discard errors from rkt status and cat-manifest
Since we don't actually show these errors anywhere, just discard them
right away.
2017-09-26 09:05:47 +02:00
Chelsea Holland Komlo b26454cf99 Move setGaugeForAllocationStats to emitClientMetrics 2017-09-25 16:05:49 +00:00
Lasse Dalegaard cbcbe0da2e Expose rkt DriverNetwork
Currently the rkt driver does not expose a DriverNetwork instance after
starting the container, which means that address_mode = 'driver' does
not work.

To get the container network information, we can call `rkt status` on
the UUID of the container and grab the container IP from there.

For the port map, we need to grab the pod manifest as it will tell us
which ports the container exposes. We then cross-reference the
configured port name with the container port names, and use that to
create a correct port mapping.

To avoid doing a (bad) reimplementation of the appc schema(which rkt
uses for its manifest) and rkt apis, we pull those in as vendored
dependencies. The versions used are the same ones that rkt use in their
glide dependency configuration for version 1.28.0.
2017-09-21 00:34:22 +02:00
Lasse Dalegaard 7ac599d509 Use rkt prepare + run-prepared instead of run.
The rkt driver currently executes run and asks that the pod UUID is
written to a file that is then polled for changes for up to five
seconds. Many container fetches will take longer than this, so this
method will often not be able to track the pod UUID reliably.

To avoid this problem, rkt allows pods to be first prepared, which will
return their UUID, and then run as a second invocation.

Here we convert the rkt driver's Start method to use this method
instead. This way, the UUID will always be tracked correctly.
2017-09-21 00:17:31 +02:00
Michael Schurter f92ffe5af5 Merge pull request #3105 from hashicorp/f-876-restart-unhealthy
Restart unhealthy tasks
2017-09-17 19:38:32 -07:00
epipho a16c97394f Fix incorrect docker stats 2017-09-16 00:43:03 -04:00
Michael Schurter 67a4a169a9 Name const after what it represents 2017-09-15 14:57:18 -07:00
Michael Schurter 79a7bf3d7c Cleanup and test restart failure code 2017-09-15 14:54:37 -07:00
Michael Schurter 06ca379da0 Add comments 2017-09-15 14:34:36 -07:00
Michael Schurter 4dbaa52aba Fold SetFailure into SetRestartTriggered 2017-09-14 16:48:39 -07:00
Michael Schurter ed77c0944b DRY up restart handling a bit.
All 3 error/failure cases share restart logic, but 2 of them have
special cased conditions.
2017-09-14 16:48:39 -07:00
Michael Schurter 73fb71ca10 RestartDelay isn't needed as checks are re-added on restarts
@dadgar made the excellent observation in #3105 that TaskRunner removes
and re-registers checks on restarts. This means checkWatcher doesn't
need to do *any* internal restart tracking. Individual checks can just
remove themselves and be re-added when the task restarts.
2017-09-14 16:48:39 -07:00
Michael Schurter 06dd86adbd Remove unused lastStart field 2017-09-14 16:47:41 -07:00
Michael Schurter 0447f79288 Removed partially implemented allocLock 2017-09-14 16:47:41 -07:00
Michael Schurter ade29ecbed Improve check watcher logging and add tests
Also expose a mock Consul Agent to allow testing ServiceClient and
checkWatcher from TaskRunner without actually talking to a real Consul.
2017-09-14 16:47:41 -07:00
Michael Schurter a137676358 Add comments and move delay calc to TaskRunner 2017-09-14 16:46:54 -07:00
Michael Schurter 8a87475498 Use existing restart policy infrastructure 2017-09-14 16:46:54 -07:00
Michael Schurter 22690c5f4c Add check watcher for restarting unhealthy tasks 2017-09-14 16:46:54 -07:00
Alex Dadgar d306da846c changelog and feedback 2017-09-14 14:08:58 -07:00