Michael Schurter
4dbaa52aba
Fold SetFailure into SetRestartTriggered
2017-09-14 16:48:39 -07:00
Michael Schurter
ed77c0944b
DRY up restart handling a bit.
...
All 3 error/failure cases share restart logic, but 2 of them have
special cased conditions.
2017-09-14 16:48:39 -07:00
Michael Schurter
73fb71ca10
RestartDelay isn't needed as checks are re-added on restarts
...
@dadgar made the excellent observation in #3105 that TaskRunner removes
and re-registers checks on restarts. This means checkWatcher doesn't
need to do *any* internal restart tracking. Individual checks can just
remove themselves and be re-added when the task restarts.
2017-09-14 16:48:39 -07:00
Michael Schurter
06dd86adbd
Remove unused lastStart field
2017-09-14 16:47:41 -07:00
Michael Schurter
0447f79288
Removed partially implemented allocLock
2017-09-14 16:47:41 -07:00
Michael Schurter
ade29ecbed
Improve check watcher logging and add tests
...
Also expose a mock Consul Agent to allow testing ServiceClient and
checkWatcher from TaskRunner without actually talking to a real Consul.
2017-09-14 16:47:41 -07:00
Michael Schurter
a137676358
Add comments and move delay calc to TaskRunner
2017-09-14 16:46:54 -07:00
Michael Schurter
8a87475498
Use existing restart policy infrastructure
2017-09-14 16:46:54 -07:00
Michael Schurter
22690c5f4c
Add check watcher for restarting unhealthy tasks
2017-09-14 16:46:54 -07:00
Alex Dadgar
d306da846c
changelog and feedback
2017-09-14 14:08:58 -07:00
Alex Dadgar
07ed83fdd5
Non-locked accessors to common Node fields
...
This PR removes locking around commonly accessed node attributes that do
not need to be locked. The locking could cause nodes to TTL as the
heartbeat code path was acquiring a lock that could be held for an
excessively long time. An example of this is when Vault is inaccessible,
since the fingerprint is run with a lock held but the Vault
fingerprinter makes the API calls with a large timeout.
Fixes https://github.com/hashicorp/nomad/issues/2689
2017-09-14 14:08:26 -07:00
Chelsea Komlo
536d38454b
Merge pull request #3191 from hashicorp/b-tagged-metrics-panic
...
Fix panic in emitting tagged allocation metrics
2017-09-11 14:28:50 -04:00
Armon Dadgar
d4aed839d2
Merge pull request #3185 from hashicorp/f-acl-reset
...
Add ability to reset ACL bootstrap process
2017-09-11 10:47:17 -07:00
Armon Dadgar
3d5ecaafff
Address @dadgar feedback
2017-09-11 10:30:59 -07:00
Alex Dadgar
b3958faa14
Merge pull request #3187 from hashicorp/b-windows-docker
...
Fix MemorySwappiness on Windows Docker
2017-09-11 09:56:49 -07:00
Alex Dadgar
1cd8f7523f
Merge pull request #3184 from hashicorp/b-docker-logging
...
Fix docker user specified syslogging
2017-09-11 09:31:33 -07:00
Chelsea Holland Komlo
848af92183
fix panic in emitting tagged metrics
2017-09-11 15:32:37 +00:00
Alex Dadgar
d3a9463358
Fix MemorySwappiness on Windows Docker
...
Fixes https://github.com/hashicorp/nomad/issues/3181
2017-09-10 17:46:45 -07:00
Alex Dadgar
3ec7946b3e
Fix invalid CPU stats on Windows
...
This PR fixes an issue introduced in Nomad 0.6.0 due to
https://github.com/shirou/gopsutil/issues/420 . The issue arised from the
fact that the Windows stats from gopsutil reports CPUs in
percentages where we expected ticks.
2017-09-10 15:30:48 -07:00
Alex Dadgar
637ae9580a
Fix docker user specified syslogging
2017-09-10 14:57:48 -07:00
James Nugent
448145872f
client: Guard against "NaN" values from floats
...
This commit protects against finding `0.NaN` tokens in JSON streams
because of infinity representation on serialization.
2017-09-08 16:21:07 -05:00
Alex Dadgar
31f9e099d9
Merge pull request #3148 from clinta/purge-stopped
...
Always purge stopped containers
2017-09-05 17:18:05 -07:00
Alex Dadgar
6fdaf38389
Fix repo name passed to docker credential helpers
...
This PR fixes the server url passed to docker credential helpers and
fixes stderr capture.
Fixes https://github.com/hashicorp/nomad/issues/2957
2017-09-05 16:43:21 -07:00
Alex Dadgar
21564c7c04
Parse Docker mounts correctly ( #3163 )
...
* Parse Docker mounts correctly
This PR fixes the parsing of Docker mounts and adds testing to ensure no
regressions.
Fixes https://github.com/hashicorp/nomad/issues/3156
* Review feedback
2017-09-05 14:02:57 -07:00
Chelsea Holland Komlo
0ef43c3c5f
final code review fixups
2017-09-05 18:47:44 +00:00
Chelsea Holland Komlo
dea1fa089b
fix up travis test failure via race condition
2017-09-05 15:04:59 +00:00
Chelsea Holland Komlo
a8cbd0b559
fixups from code review
2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
f72e4aad13
labels depend on full setup of client beforehand
2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
87a814397d
refactor to use baseLabels
2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
b2953d905a
pass in commonly used values
2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
c634043069
create base labels to be used in every metric
2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
f5ea83da8d
emit metrics using labels, add option for backwards compatibility
2017-09-05 14:12:57 +00:00
Chelsea Holland Komlo
0175f80775
add metrics options to client config
2017-09-05 14:12:57 +00:00
Armon Dadgar
b8bf35f087
ACL RPCs allow stale reads for scalability
2017-09-04 13:07:44 -07:00
Armon Dadgar
f31cd6a618
client: fixing policy resolution after ACL endpoint enforcement
2017-09-04 13:05:53 -07:00
Armon Dadgar
ddcc5f89bc
Add ErrPermissionDenied, rename TokenNotFound
2017-09-04 13:05:53 -07:00
Armon Dadgar
76a03f2d8e
Address @dadgar feedback
2017-09-04 13:05:53 -07:00
Armon Dadgar
e3f32ca6f1
client: adding token resolution logic
2017-09-04 13:05:36 -07:00
Armon Dadgar
688897561b
client: adding token cache for ACL resolution
2017-09-04 13:05:36 -07:00
Armon Dadgar
c2e72e8a9c
client: create ACL and Policy cache
2017-09-04 13:05:35 -07:00
Armon Dadgar
792f176a44
agent: thread ACL config to client
2017-09-04 13:04:45 -07:00
Clint Armstrong
b5c2636313
Always purge stopped containers
2017-08-31 14:28:48 -04:00
Clint Armstrong
7e35ab6abb
fix logging re-init
2017-08-30 12:36:31 -04:00
Michael Schurter
78823d559b
Squelch logspam when unable to get disk usage stats
...
To reproduce logspam:
```
$ docker plugin install --grant-all-permissions vieux/sshfs
$ nomad agent -dev
...
2017/08/25 17:09:03.282868 [WARN] client: error fetching host disk usage stats for /var/lib/docker/plugins/a8b4a69b07e5180f828d19e1e9e102ccc0e26f9c9939eaef85357260c30b20a7/rootfs/mnt/volumes: permission denied
... repeats every collection period ...
```
2017-08-28 12:04:32 -07:00
Alex Dadgar
876732833f
Merge pull request #3073 from clinta/docker-500
...
Allow retry of 500 API errors to be handled by restart policies
2017-08-24 16:57:36 -07:00
Alex Dadgar
fd7d614ae4
Handle interfaces that only have link-local addrs
...
This PR changes the fingerprint handling of network interfaces that only
contain link local addresses. The new behavior is to prefer globally
routable addresses and if none are detected, to fall back to link local
addresses if the operator hasn't disallowed it. This gives us pre 0.6
behavior for interfaces with only link local addresses but 0.6+ behavior
for IPv6 interfaces that will always have a link-local address.
Fixes https://github.com/hashicorp/nomad/issues/3005
/cc diptanuc
2017-08-23 15:32:22 -07:00
Alex Dadgar
211a793530
resolve feedback
2017-08-23 14:17:00 -07:00
Alex Dadgar
653733e093
Clean up docker mounts
2017-08-22 14:12:44 -07:00
Clint Armstrong
ae230395ba
Allow retry of 500 API errors to be handled by restart policies
2017-08-22 14:04:46 -04:00
Michael Schurter
51a27cc83d
Merge pull request #3031 from hashicorp/f-2924-consul-headers
...
Add Header and Method support for HTTP checks
2017-08-18 13:35:08 -07:00
Michael Schurter
7ebd429a86
Merge mistake made go fmt fail
2017-08-18 13:19:44 -07:00
Michael Schurter
5c015da3cb
Merge pull request #3021 from clinta/docker-mount2
...
Expose docker mount options
2017-08-17 16:57:09 -07:00
Michael Schurter
ff3944a981
Update and test service/check interpolation
2017-08-17 16:49:14 -07:00
Michael Schurter
b4813747d0
Merge pull request #3043 from hashicorp/f-2441-shutdown-delay
...
Add optional shutdown delay to tasks
2017-08-17 14:37:48 -07:00
Michael Schurter
c709251ed6
Lower ShutdownDelay for non-Travis testing
2017-08-17 14:23:42 -07:00
Michael Schurter
b33b2fb4c0
Lower shutdown delay in test
2017-08-17 13:57:22 -07:00
Michael Schurter
0726ca75e3
Make shutdown delay log DEBUG, not INFO
2017-08-17 11:28:33 -07:00
Clint Armstrong
f0460156ae
restrict mount to volume type
2017-08-17 09:52:13 -04:00
Michael Schurter
d529b422b2
Add optional shutdown delay to tasks
...
Fixes #2441
Defaults to 0 (no delay) for backward compat and because this feature
should be opt-in.
2017-08-16 17:59:46 -07:00
Alex Dadgar
d6187cd3e8
Fix tests
2017-08-16 16:26:52 -07:00
Alex Dadgar
1a86aecf55
Add version package
...
This PR adds a version package and consolidates version strings into a
Version struct.
2017-08-16 15:44:21 -07:00
Alex Dadgar
3d69961c3a
Must be root for TestAllocDir_CreateDir
2017-08-16 10:46:14 -07:00
Alex Dadgar
7dd86b5dfe
Merge pull request #3025 from hashicorp/f-health-events
...
Emit task events explaining alloc health
2017-08-15 12:23:46 -07:00
Alex Dadgar
bb165b97ef
comments
2017-08-15 12:23:29 -07:00
Michael Schurter
1126268a81
Fix formatting
2017-08-15 10:37:02 -07:00
Michael Schurter
74d5c272c6
Cleanup comments and return val
2017-08-14 16:59:03 -07:00
Michael Schurter
46b7fd45d7
spelling
2017-08-14 16:55:59 -07:00
Michael Schurter
de8ea243b6
Return move errors from local Migrate like remote
...
Since alloc runner just logs these errors and continues there's no
reason not to return it.
2017-08-14 16:48:56 -07:00
Michael Schurter
7342e23669
Move migrating state into prevAllocWatcher
2017-08-14 16:02:28 -07:00
Alex Dadgar
fdc0115427
test
2017-08-12 14:42:53 -07:00
Alex Dadgar
56801349eb
Refactor health watcher and emit events
2017-08-12 14:23:36 -07:00
Michael Schurter
4601419d63
Soft fail on migration errors
2017-08-11 16:50:30 -07:00
Michael Schurter
3dbd764969
Exit if alloc listener closes
...
Add test for that case, add comments, remove debug logging
2017-08-11 16:22:02 -07:00
Michael Schurter
b7915bdac7
Update tests for new blocking/migrating code
2017-08-11 16:21:57 -07:00
Michael Schurter
ad6cec9e82
Set failed status instead of panic'ing
...
Fixup some TODOs and formatting left from new prevAllocWatcher code.
2017-08-11 16:21:35 -07:00
Michael Schurter
e41a654917
switch from alloc blocker to new interface
...
interface has 3 implementations:
1. local for blocking and moving data locally
2. remote for blocking and moving data from another node
3. noop for allocs that don't need to block
2017-08-11 16:21:35 -07:00
Michael Schurter
ee04717a0b
initial attempt at refactoring blocked/migrating
2017-08-11 16:21:35 -07:00
Michael Schurter
ec6e6e6c66
Only set alloc status if it's not already terminal
2017-08-11 16:21:35 -07:00
Alex Dadgar
0d5127d5fc
Merge pull request #3011 from hashicorp/b-cv-fix-TestEnvAWSFingerprint_aws
...
Updated AWS fingerprint test for ami-id
2017-08-11 10:58:22 -07:00
Alex Dadgar
2fdfd9af4a
Merge pull request #2992 from decoomanj/master
...
Added dnsoptions to the docker driver
2017-08-11 10:12:36 -07:00
Charlie Voiselle
507c75bd16
Updated AWS fingerprint test for ami-id
...
In https://github.com/hashicorp/nomad/pull/2999 , I changed ami-id
to non-unique. This updates the test to reflect that.
2017-08-11 12:54:27 -04:00
Jan De Cooman
8b88d56c01
updated message in test
2017-08-11 09:24:15 +02:00
Alex Dadgar
1b061b8f47
Unmount task directories when alloc is terminal
...
This PR unmounts directories from tasks when the alloc is terminal
rather than when it is garbage collected.
/cc @angrycub
2017-08-10 13:28:17 -07:00
Alex Dadgar
6e20acb503
Merge pull request #2984 from hashicorp/b-tags
...
Fix alloc health with checks using interpolation
2017-08-10 13:07:25 -07:00
Alex Dadgar
6b238edc22
Merge pull request #3001 from hashicorp/f-template-events
...
Template emits events explaining why it is blocked
2017-08-10 13:00:58 -07:00
Alex Dadgar
bd9f63d20e
address comments
2017-08-10 13:00:06 -07:00
Clint Armstrong
9063b500e0
expose mount options to nomad
2017-08-10 12:37:17 -04:00
Alex Dadgar
83ba2f1814
Template emits events explaining why it is blocked
...
This PR does the following:
* Adds a mechanism to emit events in the TaskRunner
* Vendors a new version of Consul-Template that allows extraction of
missing dependencies
* Adds logic to our consul_template.go to determine missing events and
emit them in a batched fashion.
* Refactors the consul_template code to split the run method and take in
a config struct rather than many parameters.
Fixes https://github.com/hashicorp/nomad/issues/2578
2017-08-09 18:01:27 -07:00
Charlie Voiselle
ae466eaaa7
AMI ID is potentally non-unique
...
Changed the keys map to reflect that.
2017-08-09 12:53:54 -04:00
Jan De Cooman
633bcee661
fixed typo
2017-08-09 14:44:38 +02:00
Jan De Cooman
804fc0d06f
added dnsoptions to the docker driver
2017-08-09 13:30:06 +02:00
Alex Dadgar
aba107be99
Merge pull request #2979 from lfarnell/cleanup
...
Code cleanup
2017-08-08 10:21:15 -07:00
Alex Dadgar
4f6f6a13c8
Emit generic task events
2017-08-07 21:26:04 -07:00
Alex Dadgar
79d25b7db9
Merge pull request #2947 from hashicorp/f-vault-grace
...
Allow template to set Vault grace
2017-08-07 16:29:53 -07:00
Alex Dadgar
93b9a1bf20
Rename runnerConfig
2017-08-07 16:29:42 -07:00
Alex Dadgar
d86b3977b9
Fix alloc health with checks using interpolation
...
Fixes an issue in which the allocation health watcher was checking for
allocations health based on un-interpolated services and checks. Change
the interface for retrieving check information from Consul to retrieving
all registered services and checks by allocation. In the future this
will allow us to output nicer messages.
Fixes https://github.com/hashicorp/nomad/issues/2969
2017-08-07 16:27:08 -07:00
Luke Farnell
f0ced87b95
fixed all spelling mistakes for goreport
2017-08-07 17:13:05 -04:00
Michael Schurter
c76b3b54b9
Merge branch 'master' into fix-pending-state
2017-08-03 17:27:03 -07:00
Alex Dadgar
067a638478
Allow template to set Vault grace
...
This PR allows a template to specify the Vault grace duration.
Fixes https://github.com/hashicorp/nomad/issues/2922
2017-08-01 14:14:08 -07:00
Alex Dadgar
562ea52c8e
vendor vault api
2017-08-01 09:30:55 -07:00