open-nomad

Author	SHA1	Message	Date
Chelsea Holland Komlo	f7ef13cc64	fingerprinter should set health check status if health check is not periodic	2018-04-10 15:29:51 -04:00
Chelsea Holland Komlo	ede4f518bd	add setters for access to the fingerprint manager's node refactor extracting driver info	2018-04-10 15:29:51 -04:00
Chelsea Holland Komlo	f479da19f5	guard against overwriting health status	2018-04-10 15:29:51 -04:00
Chelsea Holland Komlo	ece1618815	immediately set healthy to false when driver moves to undetected	2018-04-10 15:29:51 -04:00
Chelsea Komlo	d3bd8fb96e	Merge pull request #4109 from hashicorp/f-shorten-docker-health-timeout Shorten docker health timeout	2018-04-09 15:38:39 -04:00
Chelsea Holland Komlo	ea4b65dd41	only initialize docker clients if they are nil	2018-04-09 14:13:07 -04:00
Chelsea Holland Komlo	288c7a33a1	refactoring simplification from code review	2018-04-09 10:34:17 -04:00
Chelsea Holland Komlo	6e3b056c37	only run health check if driver moves from undetected to detected	2018-04-09 10:10:43 -04:00
Alex Dadgar	ae1f76477e	Start rebalance after discovering new servers	2018-04-05 15:41:59 -07:00
Alex Dadgar	929b6823a3	Merge pull request #4106 from hashicorp/b-servers Improved Client handling of failed RPCs	2018-04-05 13:48:50 -07:00
Alex Dadgar	be2513e0f9	more jitter	2018-04-05 13:48:33 -07:00
Chelsea Holland Komlo	d3637825ef	group similar functions; update comments health check timeout should be 1 minute	2018-04-05 16:19:02 -04:00
Chelsea Holland Komlo	e8743f1f7b	remove do once block when creating a new docker client only set cached connections upon no error	2018-04-05 16:19:02 -04:00
Chelsea Holland Komlo	d0d793fc23	use client with shorter timeouts for health checks	2018-04-05 16:19:02 -04:00
Chelsea Holland Komlo	5d1b2b77cb	refactor docker clients method to be able to extend to creating new clients	2018-04-05 16:19:02 -04:00
Alex Dadgar	bd3345942c	Handle no leader and faster retries near limit Handle the ErrNoLeader case and apply slower retries. Also when we have missed the heartbeat retry aggressively, backing off after we have missed for more than 30 seconds.	2018-04-05 11:22:47 -07:00
Alex Dadgar	279b5c22e5	Scale heartbeat retrying based on remaining heartbeat time	2018-04-05 10:58:13 -07:00
Alex Dadgar	7941f4eb2d	Fire retry only when consul discovers new servers	2018-04-05 10:40:17 -07:00
Preetha	6254d75eee	Merge pull request #4101 from hashicorp/b-rescheduling-edge-fixes Fixes edge cases around timing/ task finish time being set more than once	2018-04-04 16:18:21 -05:00
Preetha Appan	12ba4c45da	remove outdated commented out test code	2018-04-04 15:03:24 -05:00
Preetha Appan	6363a6fb4d	Remove old comment	2018-04-04 15:01:48 -05:00
Preetha Appan	5e4525bd30	Moves setting finishedAt to the right place and adds two unit tests.	2018-04-04 14:38:15 -05:00
Alex Dadgar	86c32358d4	Spelling error	2018-04-03 18:30:01 -07:00
Alex Dadgar	01a6beafbf	RPC Retry Watcher	2018-04-03 18:05:28 -07:00
Preetha Appan	e6bbce3fa0	Add comment	2018-04-03 19:49:03 -05:00
Alex Dadgar	ec844f19d9	randomize servers	2018-04-03 17:46:13 -07:00
Preetha Appan	00537c739b	Fixes edge cases around timing and task finish time being set more than once	2018-04-03 16:34:59 -05:00
Alex Dadgar	58a3ec3fb2	Improve Vault error handling	2018-04-03 14:29:22 -07:00
Alex Dadgar	86f9044676	remove generated files	2018-03-30 16:52:49 -07:00
Alex Dadgar	af81349dbe	Generated files	2018-03-30 16:14:40 -07:00
Michael Schurter	257ba5937d	test: don't rely on alloc runner update count We were incorrectly relying on the count of alloc updates in a number of tests. Since alloc updates are async, their number is non-deterministic and largely meaningless. This should fix quite a few flaky tests in Travis and prevent future mistaken assumptions in tests.	2018-03-30 09:34:33 -07:00
Michael Schurter	62e9553333	Merge pull request #4069 from hashicorp/f-hashealth add HasHealth helper for nil checks	2018-03-29 17:03:20 -07:00
Alex Dadgar	beee130a6e	Always capture the finish time	2018-03-29 11:27:22 -07:00
Michael Schurter	91b5bb58d9	add HasHealth helper for nil checks We performed the DeploymentStatus nil checks a couple of different ways, so hopefully this helper will consolidate them and make it more clear what the code is doing.	2018-03-29 09:29:19 -07:00
Chelsea Komlo	4338360da9	Merge pull request #4065 from hashicorp/emit-node-event-on-first-health-change Emit first node event after initialization on health status change	2018-03-29 11:23:25 -04:00
Chelsea Holland Komlo	2174ede6b9	add clarifying comment	2018-03-29 10:58:39 -04:00
Michael Schurter	3a79c32677	Merge pull request #4059 from hashicorp/b-drain-health-svc-only only service allocs should have health watched	2018-03-28 16:49:22 -07:00
Michael Schurter	5eb0cb7176	only service allocs should have health watched	2018-03-28 16:20:11 -07:00
Chelsea Holland Komlo	e3319afee1	emit first node event	2018-03-28 17:26:53 -04:00
Chelsea Komlo	7812ac5abf	Merge pull request #4057 from hashicorp/specify-docker-msg Specify docker name in driver health messages	2018-03-28 13:32:36 -04:00
Preetha	177d2d6010	Merge pull request #4052 from hashicorp/f-specify-total-memory Allow to specify total memory on agent configuration	2018-03-28 12:28:41 -05:00
Chelsea Holland Komlo	efc03e252c	specify driver health messages	2018-03-28 11:35:21 -04:00
Preetha Appan	329428b49f	Code review feedback and unit test	2018-03-28 10:07:15 -05:00
Charlie Voiselle	ea10588227	rkt: logging enhancements (#4044) * Added extra debug logging; extended timeout; added jitter. * small log changes * increase timeout * remove unnecessary uuid	2018-03-27 17:30:06 -07:00
Michael Schurter	fcaee471a0	client: always mark exited sys/svc allocs as failed When restarts.attempts=0 was set in a jobspec a system or service alloc that exited with 0 status would be marked as `completed` instead of `failed`. Since system and service jobs are intended to run until stopped or updated, they should always be marked as failed when they exit even in cases where the exit code is 0.	2018-03-27 14:30:19 -07:00
Mildred Ki'Lya	1017cbe8ab	Allow to specify total memory on agent configuration Allow to set the total memory of an agent in its configuration file. This can be used in case the automatic detection doesn't work or in specific environments when memory overcommit (using swap for example) can be desirable.	2018-03-27 15:46:18 -05:00
Chelsea Holland Komlo	003bc209b9	use time.Time for node events for compatibility	2018-03-27 15:43:57 -04:00
Alex Dadgar	432784dae3	Fix alloc watcher snapshot streaming	2018-03-27 11:14:53 -07:00
Alex Dadgar	05449fea09	drop stats fetching log	2018-03-23 12:01:50 -07:00
Chelsea Komlo	5f0c382021	Merge pull request #4030 from hashicorp/health-check-ux UX improvements to driver health checks	2018-03-23 09:46:50 -04:00

2986 commits