Chelsea Holland Komlo
ede4f518bd
add setters for access to the fingerprint manager's node
...
refactor extracting driver info
2018-04-10 15:29:51 -04:00
Chelsea Holland Komlo
f479da19f5
guard against overwriting health status
2018-04-10 15:29:51 -04:00
Chelsea Holland Komlo
ece1618815
immediately set healthy to false when driver moves to undetected
2018-04-10 15:29:51 -04:00
Chelsea Komlo
d3bd8fb96e
Merge pull request #4109 from hashicorp/f-shorten-docker-health-timeout
...
Shorten docker health timeout
2018-04-09 15:38:39 -04:00
Chelsea Holland Komlo
ea4b65dd41
only initialize docker clients if they are nil
2018-04-09 14:13:07 -04:00
Chelsea Holland Komlo
288c7a33a1
refacotoring simplification from code review
2018-04-09 10:34:17 -04:00
Chelsea Holland Komlo
6e3b056c37
only run health check if driver moves from undetected to detected
2018-04-09 10:10:43 -04:00
Alex Dadgar
ae1f76477e
Start rebalance after discovering new servers
2018-04-05 15:41:59 -07:00
Alex Dadgar
929b6823a3
Merge pull request #4106 from hashicorp/b-servers
...
Improved Client handling of failed RPCs
2018-04-05 13:48:50 -07:00
Alex Dadgar
be2513e0f9
more jitter
2018-04-05 13:48:33 -07:00
Chelsea Holland Komlo
d3637825ef
group similar functions; update comments
...
health check timeout should be 1 minute
2018-04-05 16:19:02 -04:00
Chelsea Holland Komlo
e8743f1f7b
remove do once block when creating a new docker client
...
only set cached connections upon no error
2018-04-05 16:19:02 -04:00
Chelsea Holland Komlo
d0d793fc23
use client with shorter timeouts for health checks
2018-04-05 16:19:02 -04:00
Chelsea Holland Komlo
5d1b2b77cb
refactor docker clients method to be able to extend to creating new clients
2018-04-05 16:19:02 -04:00
Alex Dadgar
bd3345942c
Handle no leader and faster retries near limit
...
Handle the ErrNoLeader case and apply slower retries. Also when we have
missed the heartbeat retry aggressively, backing off after we have
missed for more than 30 seconds.
2018-04-05 11:22:47 -07:00
Alex Dadgar
279b5c22e5
Scale heartbeat retrying based on remaining heartbeat time
2018-04-05 10:58:13 -07:00
Alex Dadgar
7941f4eb2d
Fire retry only when consul discovers new servers
2018-04-05 10:40:17 -07:00
Preetha
6254d75eee
Merge pull request #4101 from hashicorp/b-rescheduling-edge-fixes
...
Fixes edge cases around timing/ task finish time being set more than once
2018-04-04 16:18:21 -05:00
Preetha Appan
12ba4c45da
remove outdated commented out test code
2018-04-04 15:03:24 -05:00
Preetha Appan
6363a6fb4d
Remove old comment
2018-04-04 15:01:48 -05:00
Preetha Appan
5e4525bd30
Moves setting finishedAt to the right place and adds two unit tests.
2018-04-04 14:38:15 -05:00
Alex Dadgar
86c32358d4
Spelling error
2018-04-03 18:30:01 -07:00
Alex Dadgar
01a6beafbf
RPC Retry Watcher
2018-04-03 18:05:28 -07:00
Preetha Appan
e6bbce3fa0
Add comment
2018-04-03 19:49:03 -05:00
Alex Dadgar
ec844f19d9
randomize servers
2018-04-03 17:46:13 -07:00
Preetha Appan
00537c739b
Fixes edge cases around timing and task finish time being set more than once
2018-04-03 16:34:59 -05:00
Alex Dadgar
58a3ec3fb2
Improve Vault error handling
2018-04-03 14:29:22 -07:00
Alex Dadgar
86f9044676
remove generated files
2018-03-30 16:52:49 -07:00
Alex Dadgar
af81349dbe
Generated files
2018-03-30 16:14:40 -07:00
Michael Schurter
257ba5937d
test: don't rely on alloc runner update count
...
We were incorrectly relying on the count of alloc updates in a number of
tests. Since alloc updates are async, their number is non-determinstic
and largely meaningless.
This should fix quite a few flaky tests in Travis and prevent future
mistaken assumptions in tests.
2018-03-30 09:34:33 -07:00
Michael Schurter
62e9553333
Merge pull request #4069 from hashicorp/f-hashealth
...
add HasHealth helper for nil checks
2018-03-29 17:03:20 -07:00
Alex Dadgar
beee130a6e
Always capture the finish time
2018-03-29 11:27:22 -07:00
Michael Schurter
91b5bb58d9
add HasHealth helper for nil checks
...
We performed the DeploymentStatus nil checks a couple different ways, so
hopefully this helper will consoldiate them and make it more clear what
the code is doing.
2018-03-29 09:29:19 -07:00
Chelsea Komlo
4338360da9
Merge pull request #4065 from hashicorp/emit-node-event-on-first-health-change
...
Emit first node event after initialization on health status change
2018-03-29 11:23:25 -04:00
Chelsea Holland Komlo
2174ede6b9
add clarifying comment
2018-03-29 10:58:39 -04:00
Michael Schurter
3a79c32677
Merge pull request #4059 from hashicorp/b-drain-health-svc-only
...
only service allocs should have health watched
2018-03-28 16:49:22 -07:00
Michael Schurter
5eb0cb7176
only service allocs should have health watched
2018-03-28 16:20:11 -07:00
Chelsea Holland Komlo
e3319afee1
emit first node event
2018-03-28 17:26:53 -04:00
Chelsea Komlo
7812ac5abf
Merge pull request #4057 from hashicorp/specify-docker-msg
...
Specify docker name in driver health messages
2018-03-28 13:32:36 -04:00
Preetha
177d2d6010
Merge pull request #4052 from hashicorp/f-specify-total-memory
...
Allow to specify total memory on agent configuration
2018-03-28 12:28:41 -05:00
Chelsea Holland Komlo
efc03e252c
specify driver health messages
2018-03-28 11:35:21 -04:00
Preetha Appan
329428b49f
Code review feedback and unit test
2018-03-28 10:07:15 -05:00
Charlie Voiselle
ea10588227
rkt: logging enhancements ( #4044 )
...
* Added extra debug logging; extended timeout; added jitter.
* small log changes
* increase timeout
* remove unneccessary uuid
2018-03-27 17:30:06 -07:00
Michael Schurter
fcaee471a0
client: always mark exited sys/svc allocs as failed
...
When restarts.attempts=0 was set in a jobspec a system or service alloc
that exited with 0 status would be marked as `completed` instead of
`failed`. Since system and service jobs are intended to run until
stopped or updated, they should always be marked as failed when they
exit even in cases where the exit code is 0.
2018-03-27 14:30:19 -07:00
Mildred Ki'Lya
1017cbe8ab
Allow to specify total memory on agent configuration
...
Allow to set the total memory of an agent in its configuration file. This
can be used in case the automatic detection doesn't work or in specific
environments when memory overcommit (using swap for example) can be
desirable.
2018-03-27 15:46:18 -05:00
Chelsea Holland Komlo
003bc209b9
use time.Time for node events for compatibility
2018-03-27 15:43:57 -04:00
Alex Dadgar
432784dae3
Fix alloc watcher snapshot streaming
2018-03-27 11:14:53 -07:00
Alex Dadgar
05449fea09
drop stats fetching log
2018-03-23 12:01:50 -07:00
Chelsea Komlo
5f0c382021
Merge pull request #4030 from hashicorp/health-check-ux
...
UX improvments to driver health checks
2018-03-23 09:46:50 -04:00
Alex Dadgar
da27fc3880
Driver Info output
2018-03-22 17:18:32 -07:00