open-nomad

Commit Graph

Author	SHA1	Message	Date
Michael Schurter	4601419d63	Soft fail on migration errors	2017-08-11 16:50:30 -07:00
Michael Schurter	ad6cec9e82	Set failed status instead of panic'ing. Fixup some TODOs and formatting left from new prevAllocWatcher code.	2017-08-11 16:21:35 -07:00
Michael Schurter	e41a654917	switch from alloc blocker to new interface; interface has 3 implementations: 1. local for blocking and moving data locally; 2. remote for blocking and moving data from another node; 3. noop for allocs that don't need to block	2017-08-11 16:21:35 -07:00
Michael Schurter	ee04717a0b	initial attempt at refactoring blocked/migrating	2017-08-11 16:21:35 -07:00
Michael Schurter	ec6e6e6c66	Only set alloc status if it's not already terminal	2017-08-11 16:21:35 -07:00
Alex Dadgar	1b061b8f47	Unmount task directories when alloc is terminal. This PR unmounts directories from tasks when the alloc is terminal rather than when it is garbage collected. /cc @angrycub	2017-08-10 13:28:17 -07:00
Alex Dadgar	83ba2f1814	Template emits events explaining why it is blocked. This PR does the following: adds a mechanism to emit events in the TaskRunner; vendors a new version of Consul-Template that allows extraction of missing dependencies; adds logic to our consul_template.go to determine missing events and emit them in a batched fashion; refactors the consul_template code to split the run method and take in a config struct rather than many parameters. Fixes https://github.com/hashicorp/nomad/issues/2578	2017-08-09 18:01:27 -07:00
Alex Dadgar	4f6f6a13c8	Emit generic task events	2017-08-07 21:26:04 -07:00
Michael Schurter	c76b3b54b9	Merge branch 'master' into fix-pending-state	2017-08-03 17:27:03 -07:00
Michael Schurter	b01dd31f26	Don't attempt to restore tasks that never sync'd	2017-07-24 15:58:46 -07:00
Michael Schurter	9a7a1d8c13	Fix race by not accessing tr.task from ar	2017-07-21 16:16:53 -07:00
Michael Schurter	2e9a1e3fa6	Remove unneeded saveTaskRunnerState method. Collapse it into the one place it's called	2017-07-21 16:16:02 -07:00
Alex Dadgar	09c8ee621b	Destroy tasks that are part of terminal alloc	2017-07-20 12:02:04 -07:00
Alex Dadgar	64776b1370	Should not persist state after alloc_runner is garbage collected	2017-07-19 17:31:30 -07:00
Michael Schurter	c1b8bef813	Use broadcast send retry logic everywhere	2017-07-18 14:36:32 -07:00
Alex Dadgar	d2381c9263	Merge pull request #2853 from hashicorp/b-watcher: Improve alloc health watcher	2017-07-18 14:12:28 -07:00
Alex Dadgar	bd43bd509c	Save deployment status	2017-07-18 12:37:52 -07:00
Alex Dadgar	41f67e3535	Small fixes	2017-07-18 12:19:57 -07:00
Michael Schurter	c24e73ede7	Fix deadlock caused by syncing during destroy. When replacing an alloc the new alloc is blocked until the old alloc is destroyed. This could cause a deadlock: 1. Destroying the old alloc includes a final sync of its status; 2. Syncing status causes a GC; 3. A GC looks for terminal allocs to cleanup; 4. The GC waits for an alloc to stop completely before GC'ing. If the GC chooses the currently-being-destroyed-alloc to GC, the GC deadlocks. If `client.max_parallel` deadlocks happen the GC is wedged until the Nomad process is restarted. Performing the final sync asynchronously is an ugly hack but prevents the deadlock by allowing the final sync to occur after the alloc runner has shutdown and been destroyed.	2017-07-18 11:12:56 -07:00
Michael Schurter	cdb2e96d99	Add AllocRunner.allocID for ease-of-use. Since the AllocRunner.alloc struct can be mutated, most of AllocRunner needs to acquire a lock to get the alloc's ID. Log lines always need to include the alloc ID, so we often skipped acquiring a lock just to grab the ID and accepted the race. Let's make the race detector a little happier by storing the ID in a single assignment field.	2017-07-17 15:46:54 -07:00
Michael Schurter	181fda825a	Fix log level	2017-07-17 15:46:54 -07:00
Michael Schurter	98f6e7f10f	Don't fail if task dirs don't exist on creation. Task dir metadata is created in AllocRunner.Run which may not run before an alloc is sync'd and Nomad exits. There's no reason not to just create task dir metadata on restore if it doesn't exist.	2017-07-17 15:46:54 -07:00
Michael Schurter	51515cbe0c	Ensure allocDir is never nil and persisted safely. Fixes #2834	2017-07-17 15:46:54 -07:00
Michael Schurter	e9a416b731	Merge branch 'master' into fix-pending-state	2017-07-10 10:43:23 -07:00
unknown	26b16fa3ce	#2563 fixed pending state for allocations with terminal status	2017-07-09 16:18:06 +03:00
Alex Dadgar	bf97a2455c	Vet and small improvement on watcher failure detection	2017-07-07 14:53:01 -07:00
Alex Dadgar	ade9a7c768	@jippi Changed my mind! Good suggestion	2017-07-07 12:12:48 -07:00
Alex Dadgar	067ed86a47	Client watches for allocation health using task state and Consul checks. This PR adds watching of allocation health at the client. The client can watch for health based on the tasks running on time and also based on the consul checks passing.	2017-07-07 12:10:04 -07:00
Alex Dadgar	001058227e	watcher per alloc	2017-07-07 12:07:08 -07:00
Alex Dadgar	ecee5e370e	initial watcher	2017-07-07 12:07:08 -07:00
Michael Schurter	2d741c770b	Merge pull request #2732 from hashicorp/b-persist-alloc-updates: Persist Alloc when EvalID changes	2017-07-03 14:46:43 -07:00
Michael Schurter	5ec52ec24a	Destroy task group leader first. Before this commit all tasks in a task group were destroyed concurrently. This meant logging sidecars might be stopped before the leader task whose logs still need to be shipped. This commit blocks on the leader shutting down before signalling to followers to shutdown.	2017-07-03 13:56:56 -07:00
Michael Schurter	cff8546035	Fix spelling & re-add immutable state struct	2017-06-23 13:01:39 -07:00
Michael Schurter	d359d3b554	Rename immutable -> alloc. meh; naming is hard	2017-06-23 10:58:36 -07:00
Michael Schurter	af2fc0f1bc	Persist Alloc when EvalID changes	2017-06-22 17:33:12 -07:00
Alex Dadgar	ee8dd84965	Fix nil job on allocation. The way the copying was happening on the alloc_runner was by temporarily setting the alloc.Job to nil, copying and then restoring it. This created an issue in which when the alloc was shared (which it is in server/client mode and between alloc_runner/task_runner) there were race conditions that could create a panic. Fixes https://github.com/hashicorp/nomad/issues/2605	2017-05-17 14:07:06 -04:00
Alex Dadgar	3cd7e06fba	Fix test	2017-05-09 11:35:48 -07:00
Alex Dadgar	ba70cc4f01	Merge branch 'master' into f-bolt-db	2017-05-09 11:11:55 -07:00
Alex Dadgar	843bc26e5d	Respond to comments	2017-05-09 10:50:24 -07:00
Alex Dadgar	730e49a598	Helpful comment	2017-05-03 11:27:33 -07:00
Alex Dadgar	1d8444bc1e	Fix tests	2017-05-03 11:15:30 -07:00
Alex Dadgar	e00f9c9413	Restore state + upgrade path	2017-05-02 18:21:49 -07:00
Alex Dadgar	ec101b4760	Revert "metrics". This reverts commit 4d6a012c6fb6f1fba6c62985d091b1a20c3198e7.	2017-05-02 09:28:11 -07:00
Alex Dadgar	e010fdf8c0	metrics	2017-05-01 14:51:27 -07:00
Alex Dadgar	d779defe65	Use batching	2017-05-01 14:50:34 -07:00
Alex Dadgar	b94f855326	boltDB database for client state	2017-05-01 14:50:34 -07:00
Alex Dadgar	bddedd7aba	Don't deepcopy job when retrieving copy of Alloc. This PR removes deepcopying of the job attached to the allocation in the alloc runner. This operation is called very often, so removing reflect from the code path and the potentially large number of mallocs needed to create a job reduced memory and CPU pressure.	2017-05-01 14:50:34 -07:00
Michael Schurter	095d2ee340	Switch java/exec to use Exec in Executor	2017-04-21 16:25:49 -07:00
Michael Schurter	a305b68159	Restart tasks on upgrade with script checks and old executors	2017-04-21 16:25:49 -07:00
Michael Schurter	e204a287ed	Refactor Consul Syncer into new ServiceClient. Fixes #2478 #2474 #1995 #2294. The new client only handles agent and task service advertisement. Server discovery is mostly unchanged. The Nomad client agent now handles all Consul operations instead of the executor handling task related operations. When upgrading from an earlier version of Nomad existing executors will be told to deregister from Consul so that the Nomad agent can re-register the task's services and checks. Drivers - other than qemu - now support an Exec method for executing arbitrary commands in a task's environment. This is used to implement script checks. Interfaces are used extensively to avoid interacting with Consul in tests that don't assert any Consul related behavior.	2017-04-19 12:42:47 -07:00


167 Commits