open-nomad

Commit Graph

Author	SHA1	Message	Date
Michael Schurter	c1b8bef813	Use broadcast send retry logic everywhere	2017-07-18 14:36:32 -07:00
Alex Dadgar	d2381c9263	Merge pull request #2853 from hashicorp/b-watcher Improve alloc health watcher	2017-07-18 14:12:28 -07:00
Alex Dadgar	bd43bd509c	Save deployment status	2017-07-18 12:37:52 -07:00
Alex Dadgar	41f67e3535	Small fixes	2017-07-18 12:19:57 -07:00
Michael Schurter	c24e73ede7	Fix deadlock caused by syncing during destroy When replacing an alloc the new alloc is blocked until the old alloc is destroyed. This could cause a deadlock: 1. Destroying the old alloc includes a final sync of its status 2. Syncing status causes a GC 3. A GC looks for terminal allocs to cleanup 4. The GC waits for an alloc to stop completely before GC'ing If the GC chooses the currently-being-destroyed-alloc to GC, the GC deadlocks. If `client.max_parallel` deadlocks happen the GC is wedged until the Nomad process is restarted. Performing the final sync asynchronously is an ugly hack but prevents the deadlock by allowing the final sync to occur after the alloc runner has shutdown and been destroyed.	2017-07-18 11:12:56 -07:00
Michael Schurter	420be86e39	Test AllocDir.Copy	2017-07-17 15:46:54 -07:00
Michael Schurter	cdb2e96d99	Add AllocRunner.allocID for ease-of-use Since the AllocRunner.alloc struct can be mutated, most of AllocRunner needs to acquire a lock to get the alloc's ID. Log lines always need to include the alloc ID, so we often skipped acquiring a lock just to grab the ID and accepted the race. Let's make the race detector a little happier by storing the ID in a single assignment field.	2017-07-17 15:46:54 -07:00
Michael Schurter	181fda825a	Fix log level	2017-07-17 15:46:54 -07:00
Michael Schurter	98f6e7f10f	Don't fail if task dirs don't exist on creation Task dir metadata is created in AllocRunner.Run which may not run before an alloc is sync'd and Nomad exits. There's no reason not to just create task dir metadata on restore if it doesn't exist.	2017-07-17 15:46:54 -07:00
Michael Schurter	51515cbe0c	Ensure allocDir is never nil and persisted safely Fixes #2834	2017-07-17 15:46:54 -07:00
Alex Dadgar	0821ee67f5	Fix alloc broadcaster panic on double close	2017-07-17 14:09:05 -07:00
Michael Schurter	0a6bf87365	Fix nil panic in Docker error condition Fixes #2835 Yet another bug caused by overwriting container and then trying to reference container.ID in the err handling block. Did a quick audit of docker.go and it seems to be the last offender. See #2804 for previous bug.	2017-07-14 10:48:19 -07:00
Alex Dadgar	05894f4611	Small fixes	2017-07-07 17:34:50 -07:00
Michael Schurter	fecb16cfb2	Merge pull request #2793 from hashicorp/b-2776-ct-vault-servername Propagate vault.tls_server_name to consul-template	2017-07-07 16:44:19 -07:00
Michael Schurter	95a9a5da71	Merge pull request #2787 from hashicorp/f-docker-test-mac Test #2652 - Docker MAC Address option	2017-07-07 16:22:10 -07:00
Michael Schurter	4be4df21c9	Merge pull request #2797 from hashicorp/f-2785-docker-bridge-ip Add driver.docker.bridge_ip node attribute	2017-07-07 16:20:20 -07:00
Michael Schurter	94389c3ecc	Remove debug logging	2017-07-07 16:19:42 -07:00
Michael Schurter	5e3e3818db	Merge pull request #2804 from hashicorp/b-2802-docker-panic Don't panic in container list/remove/inspect race	2017-07-07 15:35:51 -07:00
Michael Schurter	67a7b0eac9	Don't panic in container list/remove/inspect race Fixes #2802 While it's hard to reproduce the theoretical race is: 1. This goroutine calls ListContainers() 2. Another goroutine removes a container X 3. This goroutine attempts to InspectContainer(X) However, this bug could be hit in the much simpler case of InspectContainer() timing out. In those cases an error is returned and the old code attempted to wrap the error with the now-nil container.ID. Storing the container ID fixes that panic.	2017-07-07 15:10:59 -07:00
Alex Dadgar	bf97a2455c	Vet and small improvement on watcher failure detection	2017-07-07 14:53:01 -07:00
Alex Dadgar	45712c6ca3	test fixes	2017-07-07 14:11:27 -07:00
Alex Dadgar	ade9a7c768	@jippi Changed my mind! Good suggestion	2017-07-07 12:12:48 -07:00
Alex Dadgar	c063eba836	Warn log	2017-07-07 12:10:04 -07:00
Alex Dadgar	067ed86a47	Client watches for allocation health using task state and Consul checks This PR adds watching of allocation health at the client. The client can watch for health based on the tasks running on time and also based on the consul checks passing.	2017-07-07 12:10:04 -07:00
Alex Dadgar	001058227e	watcher per alloc	2017-07-07 12:07:08 -07:00
Alex Dadgar	2e2fd26bed	Update index	2017-07-07 12:07:08 -07:00
Alex Dadgar	ecee5e370e	initial watcher	2017-07-07 12:07:08 -07:00
Alex Dadgar	c77944ed29	assign names	2017-07-07 12:03:11 -07:00
Michael Schurter	084dd384c1	Add driver.docker.bridge_ip node attribute Fixes #2785	2017-07-07 10:14:10 -07:00
Michael Schurter	d38d48151a	Propagate vault.tls_server_name to consul-template Fixes #2776	2017-07-06 16:56:50 -07:00
Michael Schurter	39edf23fd5	Merge pull request #2786 from hashicorp/f-docker-auth-soft-fail Default to auth hard fail but optionally soft fail	2017-07-06 13:25:56 -07:00
Michael Schurter	bae1b7db2d	Test #2652 Also cleanup docker config opts docs	2017-07-06 12:46:25 -07:00
Michael Schurter	8f4353779a	Merge branch 'master' into master	2017-07-06 12:09:36 -07:00
Michael Schurter	2900f941b5	Default to auth hard fail but optionally soft fail	2017-07-06 11:35:34 -07:00
Michael Schurter	08b452adf5	Merge pull request #2781 from hashicorp/f-2678-getter-mode Add support for go-getter modes	2017-07-06 11:06:40 -07:00
Michael Schurter	b000bb8598	Merge pull request #2744 from aep/master Do not fail when no docker registry auth is available	2017-07-06 11:04:11 -07:00
Michael Schurter	0d3bdf7210	Add support for go-getter modes Fixes #2678	2017-07-06 10:45:44 -07:00
Michael Schurter	644f0cfaa4	Consistently quote alloc ids in client logs	2017-07-06 10:24:52 -07:00
Michael Schurter	4fd9ef6a8c	Tiny client race condition fix Plus some logging improvements that may help with #2563	2017-07-05 16:15:19 -07:00
Michael Schurter	8e2e26c607	rkt: use %s instead of %q when interpolating env Fixes #2686	2017-07-05 09:36:17 -07:00
Michael Schurter	b2382f99f2	0 compute == error	2017-07-03 14:51:02 -07:00
Michael Schurter	ecf090e980	Fix cpu_total_compute override	2017-07-03 14:51:02 -07:00
Michael Schurter	2d741c770b	Merge pull request #2732 from hashicorp/b-persist-alloc-updates Persist Alloc when EvalID changes	2017-07-03 14:46:43 -07:00
Michael Schurter	56a6f8ca8a	Merge pull request #2763 from hashicorp/f-bad-state-help Add more logging to restore state errors	2017-07-03 14:45:03 -07:00
Michael Schurter	9d4b0651ef	Merge pull request #2753 from hashicorp/b-leader-dies-first Destroy task group leader first	2017-07-03 14:38:04 -07:00
Michael Schurter	6e7cc3964e	Merge pull request #2709 from hashicorp/f-advertise-docker-ips Advertise driver-specific addresses	2017-07-03 14:04:12 -07:00
Michael Schurter	5ec52ec24a	Destroy task group leader first Before this commit all tasks in a task group were destroyed concurrently. This meant logging sidecars might be stopped before the leader task whose logs still need to be shipped. This commit blocks on the leader shutting down before signalling to followers to shutdown.	2017-07-03 13:56:56 -07:00
Michael Schurter	596727230b	Suggest wiping out alloc dir too	2017-07-03 12:29:21 -07:00
Michael Schurter	11f68bfca2	Add more logging to restore state errors	2017-07-03 11:58:41 -07:00
Arvid E. Picciani	aa4f029f10	Do not fail when no docker registry auth is available this amends the behaviour introduced with #2651 and allows pulling public images when docker.auth.helper is set	2017-06-27 11:11:18 +02:00

2394 Commits
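Commits 67a7b0eac9 and 0a6bf87365 describe the same bug pattern in the Docker driver. A variable holding the container is overwritten by the result of a Docker call, so when that call fails (a timeout, or a race with a removal), the error-handling block references `container.ID` on a now-nil value and panics. The fix described in those commits is to store the container ID before making the call.

The Go sketch below illustrates that pattern under stated assumptions. `Container`, `inspectContainer`, `refreshBuggy`, `refreshFixed` and `panics` are hypothetical stand-ins. They are not Nomad's `docker.go` code or any Docker client library's API.

```go
package main

import (
	"errors"
	"fmt"
)

// Container is a hypothetical stand-in for a Docker client's container
// type; only the ID field matters for this sketch.
type Container struct {
	ID string
}

// inspectContainer is a hypothetical stand-in for an inspect call. Like a
// real call that times out or races with a removal, it can return a nil
// *Container together with a non-nil error.
func inspectContainer(id string, fail bool) (*Container, error) {
	if fail {
		return nil, errors.New("inspect timed out")
	}
	return &Container{ID: id}, nil
}

// refreshBuggy reproduces the bug pattern: container is overwritten by the
// call's result, so the error path dereferences a nil pointer.
func refreshBuggy(container *Container, fail bool) (*Container, error) {
	var err error
	container, err = inspectContainer(container.ID, fail)
	if err != nil {
		// container is nil here, so container.ID panics.
		return nil, fmt.Errorf("failed to inspect container %s: %v", container.ID, err)
	}
	return container, nil
}

// refreshFixed stores the ID before the call and uses the stored copy in
// the error path, so a nil result can no longer cause a panic.
func refreshFixed(container *Container, fail bool) (*Container, error) {
	containerID := container.ID
	updated, err := inspectContainer(containerID, fail)
	if err != nil {
		return nil, fmt.Errorf("failed to inspect container %s: %v", containerID, err)
	}
	return updated, nil
}

// panics reports whether f panicked, recovering so the demo keeps running.
func panics(f func()) (didPanic bool) {
	defer func() {
		if recover() != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	c := &Container{ID: "abc123"}
	_, err := refreshFixed(c, true)
	fmt.Println(err) // failed to inspect container abc123: inspect timed out
	fmt.Println("buggy version panics:", panics(func() { refreshBuggy(c, true) })) // buggy version panics: true
}
```

The only difference between the two functions is that the ID is copied into its own variable before the result is assigned. The error path therefore depends on data that the failed call cannot clear.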