Commit graph

2582 commits

Author SHA1 Message Date
Michael Schurter 4dbaa52aba Fold SetFailure into SetRestartTriggered 2017-09-14 16:48:39 -07:00
Michael Schurter ed77c0944b DRY up restart handling a bit.
All 3 error/failure cases share restart logic, but 2 of them have
special cased conditions.
2017-09-14 16:48:39 -07:00
Michael Schurter 73fb71ca10 RestartDelay isn't needed as checks are re-added on restarts
@dadgar made the excellent observation in #3105 that TaskRunner removes
and re-registers checks on restarts. This means checkWatcher doesn't
need to do *any* internal restart tracking. Individual checks can just
remove themselves and be re-added when the task restarts.
2017-09-14 16:48:39 -07:00
Michael Schurter 06dd86adbd Remove unused lastStart field 2017-09-14 16:47:41 -07:00
Michael Schurter 0447f79288 Removed partially implemented allocLock 2017-09-14 16:47:41 -07:00
Michael Schurter ade29ecbed Improve check watcher logging and add tests
Also expose a mock Consul Agent to allow testing ServiceClient and
checkWatcher from TaskRunner without actually talking to a real Consul.
2017-09-14 16:47:41 -07:00
Michael Schurter a137676358 Add comments and move delay calc to TaskRunner 2017-09-14 16:46:54 -07:00
Michael Schurter 8a87475498 Use existing restart policy infrastructure 2017-09-14 16:46:54 -07:00
Michael Schurter 22690c5f4c Add check watcher for restarting unhealthy tasks 2017-09-14 16:46:54 -07:00
Alex Dadgar d306da846c changelog and feedback 2017-09-14 14:08:58 -07:00
Alex Dadgar 07ed83fdd5 Non-locked accessors to common Node fields
This PR removes locking around commonly accessed node attributes that do
not need to be locked. The locking could cause nodes to TTL as the
heartbeat code path was acquiring a lock that could be held for an
excessively long time. An example of this is when Vault is inaccessible,
since the fingerprint is run with a lock held but the Vault
fingerprinter makes the API calls with a large timeout.

Fixes https://github.com/hashicorp/nomad/issues/2689
2017-09-14 14:08:26 -07:00
Chelsea Komlo 536d38454b Merge pull request #3191 from hashicorp/b-tagged-metrics-panic
Fix panic in emitting tagged allocation metrics
2017-09-11 14:28:50 -04:00
Armon Dadgar d4aed839d2 Merge pull request #3185 from hashicorp/f-acl-reset
Add ability to reset ACL bootstrap process
2017-09-11 10:47:17 -07:00
Armon Dadgar 3d5ecaafff Address @dadgar feedback 2017-09-11 10:30:59 -07:00
Alex Dadgar b3958faa14 Merge pull request #3187 from hashicorp/b-windows-docker
Fix MemorySwappiness on Windows Docker
2017-09-11 09:56:49 -07:00
Alex Dadgar 1cd8f7523f Merge pull request #3184 from hashicorp/b-docker-logging
Fix docker user specified syslogging
2017-09-11 09:31:33 -07:00
Chelsea Holland Komlo 848af92183 fix panic in emitting tagged metrics 2017-09-11 15:32:37 +00:00
Alex Dadgar d3a9463358 Fix MemorySwappiness on Windows Docker
Fixes https://github.com/hashicorp/nomad/issues/3181
2017-09-10 17:46:45 -07:00
Alex Dadgar 3ec7946b3e Fix invalid CPU stats on Windows
This PR fixes an issue introduced in Nomad 0.6.0 due to
https://github.com/shirou/gopsutil/issues/420. The issue arose because
the Windows stats from gopsutil report CPU usage in percentages where
we expected ticks.
2017-09-10 15:30:48 -07:00
Alex Dadgar 637ae9580a Fix docker user specified syslogging 2017-09-10 14:57:48 -07:00
James Nugent 448145872f client: Guard against "NaN" values from floats
This commit protects against finding `0.NaN` tokens in JSON streams
because of infinity representation on serialization.
2017-09-08 16:21:07 -05:00
Alex Dadgar 31f9e099d9 Merge pull request #3148 from clinta/purge-stopped
Always purge stopped containers
2017-09-05 17:18:05 -07:00
Alex Dadgar 6fdaf38389 Fix repo name passed to docker credential helpers
This PR fixes the server url passed to docker credential helpers and
fixes stderr capture.

Fixes https://github.com/hashicorp/nomad/issues/2957
2017-09-05 16:43:21 -07:00
Alex Dadgar 21564c7c04 Parse Docker mounts correctly (#3163)
* Parse Docker mounts correctly

This PR fixes the parsing of Docker mounts and adds testing to ensure no
regressions.

Fixes https://github.com/hashicorp/nomad/issues/3156

* Review feedback
2017-09-05 14:02:57 -07:00
Chelsea Holland Komlo 0ef43c3c5f final code review fixups 2017-09-05 18:47:44 +00:00
Chelsea Holland Komlo dea1fa089b fix up travis test failure via race condition 2017-09-05 15:04:59 +00:00
Chelsea Holland Komlo a8cbd0b559 fixups from code review 2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo f72e4aad13 labels depend on full setup of client beforehand 2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo 87a814397d refactor to use baseLabels 2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo b2953d905a pass in commonly used values 2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo c634043069 create base labels to be used in every metric 2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo f5ea83da8d emit metrics using labels, add option for backwards compatibility 2017-09-05 14:12:57 +00:00
Chelsea Holland Komlo 0175f80775 add metrics options to client config 2017-09-05 14:12:57 +00:00
Armon Dadgar b8bf35f087 ACL RPCs allow stale reads for scalability 2017-09-04 13:07:44 -07:00
Armon Dadgar f31cd6a618 client: fixing policy resolution after ACL endpoint enforcement 2017-09-04 13:05:53 -07:00
Armon Dadgar ddcc5f89bc Add ErrPermissionDenied, rename TokenNotFound 2017-09-04 13:05:53 -07:00
Armon Dadgar 76a03f2d8e Address @dadgar feedback 2017-09-04 13:05:53 -07:00
Armon Dadgar e3f32ca6f1 client: adding token resolution logic 2017-09-04 13:05:36 -07:00
Armon Dadgar 688897561b client: adding token cache for ACL resolution 2017-09-04 13:05:36 -07:00
Armon Dadgar c2e72e8a9c client: create ACL and Policy cache 2017-09-04 13:05:35 -07:00
Armon Dadgar 792f176a44 agent: thread ACL config to client 2017-09-04 13:04:45 -07:00
Clint Armstrong b5c2636313 Always purge stopped containers 2017-08-31 14:28:48 -04:00
Clint Armstrong 7e35ab6abb fix logging re-init 2017-08-30 12:36:31 -04:00
Michael Schurter 78823d559b Squelch logspam when unable to get disk usage stats
To reproduce logspam:

```
$ docker plugin install --grant-all-permissions vieux/sshfs
$ nomad agent -dev
...
2017/08/25 17:09:03.282868 [WARN] client: error fetching host disk usage stats for /var/lib/docker/plugins/a8b4a69b07e5180f828d19e1e9e102ccc0e26f9c9939eaef85357260c30b20a7/rootfs/mnt/volumes: permission denied
... repeats every collection period ...
```
2017-08-28 12:04:32 -07:00
Alex Dadgar 876732833f Merge pull request #3073 from clinta/docker-500
Allow retry of 500 API errors to be handled by restart policies
2017-08-24 16:57:36 -07:00
Alex Dadgar fd7d614ae4 Handle interfaces that only have link-local addrs
This PR changes the fingerprint handling of network interfaces that only
contain link local addresses. The new behavior is to prefer globally
routable addresses and if none are detected, to fall back to link local
addresses if the operator hasn't disallowed it. This gives us pre 0.6
behavior for interfaces with only link local addresses but 0.6+ behavior
for IPv6 interfaces that will always have a link-local address.

Fixes https://github.com/hashicorp/nomad/issues/3005

/cc diptanuc
2017-08-23 15:32:22 -07:00
Alex Dadgar 211a793530 resolve feedback 2017-08-23 14:17:00 -07:00
Alex Dadgar 653733e093 Clean up docker mounts 2017-08-22 14:12:44 -07:00
Clint Armstrong ae230395ba Allow retry of 500 API errors to be handled by restart policies 2017-08-22 14:04:46 -04:00
Michael Schurter 51a27cc83d Merge pull request #3031 from hashicorp/f-2924-consul-headers
Add Header and Method support for HTTP checks
2017-08-18 13:35:08 -07:00
Michael Schurter 7ebd429a86 Merge mistake made go fmt fail 2017-08-18 13:19:44 -07:00
Michael Schurter 5c015da3cb Merge pull request #3021 from clinta/docker-mount2
Expose docker mount options
2017-08-17 16:57:09 -07:00
Michael Schurter ff3944a981 Update and test service/check interpolation 2017-08-17 16:49:14 -07:00
Michael Schurter b4813747d0 Merge pull request #3043 from hashicorp/f-2441-shutdown-delay
Add optional shutdown delay to tasks
2017-08-17 14:37:48 -07:00
Michael Schurter c709251ed6 Lower ShutdownDelay for non-Travis testing 2017-08-17 14:23:42 -07:00
Michael Schurter b33b2fb4c0 Lower shutdown delay in test 2017-08-17 13:57:22 -07:00
Michael Schurter 0726ca75e3 Make shutdown delay log DEBUG, not INFO 2017-08-17 11:28:33 -07:00
Clint Armstrong f0460156ae restrict mount to volume type 2017-08-17 09:52:13 -04:00
Michael Schurter d529b422b2 Add optional shutdown delay to tasks
Fixes #2441

Defaults to 0 (no delay) for backward compat and because this feature
should be opt-in.
2017-08-16 17:59:46 -07:00
Alex Dadgar d6187cd3e8 Fix tests 2017-08-16 16:26:52 -07:00
Alex Dadgar 1a86aecf55 Add version package
This PR adds a version package and consolidates version strings into a
Version struct.
2017-08-16 15:44:21 -07:00
Alex Dadgar 3d69961c3a Must be root for TestAllocDir_CreateDir 2017-08-16 10:46:14 -07:00
Alex Dadgar 7dd86b5dfe Merge pull request #3025 from hashicorp/f-health-events
Emit task events explaining alloc health
2017-08-15 12:23:46 -07:00
Alex Dadgar bb165b97ef comments 2017-08-15 12:23:29 -07:00
Michael Schurter 1126268a81 Fix formatting 2017-08-15 10:37:02 -07:00
Michael Schurter 74d5c272c6 Cleanup comments and return val 2017-08-14 16:59:03 -07:00
Michael Schurter 46b7fd45d7 spelling 2017-08-14 16:55:59 -07:00
Michael Schurter de8ea243b6 Return move errors from local Migrate like remote
Since alloc runner just logs these errors and continues, there's no
reason not to return them.
2017-08-14 16:48:56 -07:00
Michael Schurter 7342e23669 Move migrating state into prevAllocWatcher 2017-08-14 16:02:28 -07:00
Alex Dadgar fdc0115427 test 2017-08-12 14:42:53 -07:00
Alex Dadgar 56801349eb Refactor health watcher and emit events 2017-08-12 14:23:36 -07:00
Michael Schurter 4601419d63 Soft fail on migration errors 2017-08-11 16:50:30 -07:00
Michael Schurter 3dbd764969 Exit if alloc listener closes
Add test for that case, add comments, remove debug logging
2017-08-11 16:22:02 -07:00
Michael Schurter b7915bdac7 Update tests for new blocking/migrating code 2017-08-11 16:21:57 -07:00
Michael Schurter ad6cec9e82 Set failed status instead of panic'ing
Fixup some TODOs and formatting left from new prevAllocWatcher code.
2017-08-11 16:21:35 -07:00
Michael Schurter e41a654917 switch from alloc blocker to new interface
interface has 3 implementations:

1. local for blocking and moving data locally
2. remote for blocking and moving data from another node
3. noop for allocs that don't need to block
2017-08-11 16:21:35 -07:00
Michael Schurter ee04717a0b initial attempt at refactoring blocked/migrating 2017-08-11 16:21:35 -07:00
Michael Schurter ec6e6e6c66 Only set alloc status if it's not already terminal 2017-08-11 16:21:35 -07:00
Alex Dadgar 0d5127d5fc Merge pull request #3011 from hashicorp/b-cv-fix-TestEnvAWSFingerprint_aws
Updated AWS fingerprint test for ami-id
2017-08-11 10:58:22 -07:00
Alex Dadgar 2fdfd9af4a Merge pull request #2992 from decoomanj/master
Added dnsoptions to the docker driver
2017-08-11 10:12:36 -07:00
Charlie Voiselle 507c75bd16 Updated AWS fingerprint test for ami-id
In https://github.com/hashicorp/nomad/pull/2999, I changed ami-id
to non-unique.  This updates the test to reflect that.
2017-08-11 12:54:27 -04:00
Jan De Cooman 8b88d56c01 updated message in test 2017-08-11 09:24:15 +02:00
Alex Dadgar 1b061b8f47 Unmount task directories when alloc is terminal
This PR unmounts directories from tasks when the alloc is terminal
rather than when it is garbage collected.

/cc @angrycub
2017-08-10 13:28:17 -07:00
Alex Dadgar 6e20acb503 Merge pull request #2984 from hashicorp/b-tags
Fix alloc health with checks using interpolation
2017-08-10 13:07:25 -07:00
Alex Dadgar 6b238edc22 Merge pull request #3001 from hashicorp/f-template-events
Template emits events explaining why it is blocked
2017-08-10 13:00:58 -07:00
Alex Dadgar bd9f63d20e address comments 2017-08-10 13:00:06 -07:00
Clint Armstrong 9063b500e0 expose mount options to nomad 2017-08-10 12:37:17 -04:00
Alex Dadgar 83ba2f1814 Template emits events explaining why it is blocked
This PR does the following:
* Adds a mechanism to emit events in the TaskRunner
* Vendors a new version of Consul-Template that allows extraction of
missing dependencies
* Adds logic to our consul_template.go to determine missing events and
emit them in a batched fashion.
* Refactors the consul_template code to split the run method and take in
a config struct rather than many parameters.

Fixes https://github.com/hashicorp/nomad/issues/2578
2017-08-09 18:01:27 -07:00
Charlie Voiselle ae466eaaa7 AMI ID is potentially non-unique
Changed the keys map to reflect that.
2017-08-09 12:53:54 -04:00
Jan De Cooman 633bcee661 fixed typo 2017-08-09 14:44:38 +02:00
Jan De Cooman 804fc0d06f added dnsoptions to the docker driver 2017-08-09 13:30:06 +02:00
Alex Dadgar aba107be99 Merge pull request #2979 from lfarnell/cleanup
Code cleanup
2017-08-08 10:21:15 -07:00
Alex Dadgar 4f6f6a13c8 Emit generic task events 2017-08-07 21:26:04 -07:00
Alex Dadgar 79d25b7db9 Merge pull request #2947 from hashicorp/f-vault-grace
Allow template to set Vault grace
2017-08-07 16:29:53 -07:00
Alex Dadgar 93b9a1bf20 Rename runnerConfig 2017-08-07 16:29:42 -07:00
Alex Dadgar d86b3977b9 Fix alloc health with checks using interpolation
Fixes an issue in which the allocation health watcher was checking
allocation health based on un-interpolated services and checks. Changes
the interface for retrieving check information from Consul so that it
retrieves all registered services and checks by allocation. In the
future this will allow us to output nicer messages.

Fixes https://github.com/hashicorp/nomad/issues/2969
2017-08-07 16:27:08 -07:00
Luke Farnell f0ced87b95 fixed all spelling mistakes for goreport 2017-08-07 17:13:05 -04:00
Michael Schurter c76b3b54b9 Merge branch 'master' into fix-pending-state 2017-08-03 17:27:03 -07:00
Alex Dadgar 067a638478 Allow template to set Vault grace
This PR allows a template to specify the Vault grace duration.

Fixes https://github.com/hashicorp/nomad/issues/2922
2017-08-01 14:14:08 -07:00
Alex Dadgar 562ea52c8e vendor vault api 2017-08-01 09:30:55 -07:00