open-nomad

Author	SHA1	Message	Date
Michael Schurter	9a7a1d8c13	Fix race by not accessing tr.task from ar	2017-07-21 16:16:53 -07:00
Michael Schurter	2e9a1e3fa6	Remove unneeded saveTaskRunnerState method Collapse it into the one place it's called	2017-07-21 16:16:02 -07:00
Alex Dadgar	09c8ee621b	Destroy tasks that are part of terminal alloc	2017-07-20 12:02:04 -07:00
Alex Dadgar	64776b1370	Should not persist state after alloc_runner is garbage collected	2017-07-19 17:31:30 -07:00
Michael Schurter	c1b8bef813	Use broadcast send retry logic everywhere	2017-07-18 14:36:32 -07:00
Alex Dadgar	d2381c9263	Merge pull request #2853 from hashicorp/b-watcher Improve alloc health watcher	2017-07-18 14:12:28 -07:00
Alex Dadgar	bd43bd509c	Save deployment status	2017-07-18 12:37:52 -07:00
Alex Dadgar	41f67e3535	Small fixes	2017-07-18 12:19:57 -07:00
Michael Schurter	c24e73ede7	Fix deadlock caused by syncing during destroy When replacing an alloc the new alloc is blocked until the old alloc is destroyed. This could cause a deadlock: 1. Destroying the old alloc includes a final sync of its status 2. Syncing status causes a GC 3. A GC looks for terminal allocs to cleanup 4. The GC waits for an alloc to stop completely before GC'ing If the GC chooses the currently-being-destroyed-alloc to GC, the GC deadlocks. If `client.max_parallel` deadlocks happen the GC is wedged until the Nomad process is restarted. Performing the final sync asynchronously is an ugly hack but prevents the deadlock by allowing the final sync to occur after the alloc runner has shutdown and been destroyed.	2017-07-18 11:12:56 -07:00
Michael Schurter	cdb2e96d99	Add AllocRunner.allocID for ease-of-use Since the AllocRunner.alloc struct can be mutated, most of AllocRunner needs to acquire a lock to get the alloc's ID. Log lines always need to include the alloc ID, so we often skipped acquiring a lock just to grab the ID and accepted the race. Let's make the race detector a little happier by storing the ID in a single assignment field.	2017-07-17 15:46:54 -07:00
Michael Schurter	181fda825a	Fix log level	2017-07-17 15:46:54 -07:00
Michael Schurter	98f6e7f10f	Don't fail if task dirs don't exist on creation Task dir metadata is created in AllocRunner.Run which may not run before an alloc is sync'd and Nomad exits. There's no reason not to just create task dir metadata on restore if it doesn't exist.	2017-07-17 15:46:54 -07:00
Michael Schurter	51515cbe0c	Ensure allocDir is never nil and persisted safely Fixes #2834	2017-07-17 15:46:54 -07:00
Alex Dadgar	bf97a2455c	Vet and small improvement on watcher failure detection	2017-07-07 14:53:01 -07:00
Alex Dadgar	ade9a7c768	@jippi Changed my mind! Good suggestion	2017-07-07 12:12:48 -07:00
Alex Dadgar	067ed86a47	Client watches for allocation health using task state and Consul checks This PR adds watching of allocation health at the client. The client can watch for health based on the tasks running on time and also based on the consul checks passing.	2017-07-07 12:10:04 -07:00
Alex Dadgar	001058227e	watcher per alloc	2017-07-07 12:07:08 -07:00
Alex Dadgar	ecee5e370e	initial watcher	2017-07-07 12:07:08 -07:00
Michael Schurter	2d741c770b	Merge pull request #2732 from hashicorp/b-persist-alloc-updates Persist Alloc when EvalID changes	2017-07-03 14:46:43 -07:00
Michael Schurter	5ec52ec24a	Destroy task group leader first Before this commit all tasks in a task group were destroyed concurrently. This meant logging sidecars might be stopped before the leader task whose logs still need to be shipped. This commit blocks on the leader shutting down before signalling to followers to shutdown.	2017-07-03 13:56:56 -07:00
Michael Schurter	cff8546035	Fix spelling & re-add immutable state struct	2017-06-23 13:01:39 -07:00
Michael Schurter	d359d3b554	Rename immutable -> alloc meh; naming is hard	2017-06-23 10:58:36 -07:00
Michael Schurter	af2fc0f1bc	Persist Alloc when EvalID changes	2017-06-22 17:33:12 -07:00
Alex Dadgar	ee8dd84965	Fix nil job on allocation The way the copying was happening on the alloc_runner was by temporarily setting the alloc.Job to nil, copying and then restoring it. This created an issue in which when the alloc was shared (which it is in server/client mode and between alloc_runner/task_runner) there were race conditions that could create a panic. Fixes https://github.com/hashicorp/nomad/issues/2605	2017-05-17 14:07:06 -04:00
Alex Dadgar	3cd7e06fba	Fix test	2017-05-09 11:35:48 -07:00
Alex Dadgar	ba70cc4f01	Merge branch 'master' into f-bolt-db	2017-05-09 11:11:55 -07:00
Alex Dadgar	843bc26e5d	Respond to comments	2017-05-09 10:50:24 -07:00
Alex Dadgar	730e49a598	Helpful comment	2017-05-03 11:27:33 -07:00
Alex Dadgar	1d8444bc1e	Fix tests	2017-05-03 11:15:30 -07:00
Alex Dadgar	e00f9c9413	Restore state + upgrade path	2017-05-02 18:21:49 -07:00
Alex Dadgar	ec101b4760	Revert "metrics" This reverts commit 4d6a012c6fb6f1fba6c62985d091b1a20c3198e7.	2017-05-02 09:28:11 -07:00
Alex Dadgar	e010fdf8c0	metrics	2017-05-01 14:51:27 -07:00
Alex Dadgar	d779defe65	Use batching	2017-05-01 14:50:34 -07:00
Alex Dadgar	b94f855326	boltDB database for client state	2017-05-01 14:50:34 -07:00
Alex Dadgar	bddedd7aba	Don't deepcopy job when retrieving copy of Alloc This PR removes deepcopying of the job attached to the allocation in the alloc runner. This operation is called very often, so removing reflect from the code path and the potentially large number of mallocs needed to create a job reduced memory and CPU pressure.	2017-05-01 14:50:34 -07:00
Michael Schurter	095d2ee340	Switch java/exec to use Exec in Executor	2017-04-21 16:25:49 -07:00
Michael Schurter	a305b68159	Restart tasks on upgrade with script checks and old executors	2017-04-21 16:25:49 -07:00
Michael Schurter	e204a287ed	Refactor Consul Syncer into new ServiceClient Fixes #2478 #2474 #1995 #2294 The new client only handles agent and task service advertisement. Server discovery is mostly unchanged. The Nomad client agent now handles all Consul operations instead of the executor handling task related operations. When upgrading from an earlier version of Nomad existing executors will be told to deregister from Consul so that the Nomad agent can re-register the task's services and checks. Drivers - other than qemu - now support an Exec method for executing arbitrary commands in a task's environment. This is used to implement script checks. Interfaces are used extensively to avoid interacting with Consul in tests that don't assert any Consul related behavior.	2017-04-19 12:42:47 -07:00
Alex Dadgar	61f4a2dac6	Sync allocation state before waiting for a destroy This change ensures that the client syncs allocation state with the servers before entering its wait loop for the allocation to be destroyed. Fixes https://github.com/hashicorp/nomad/issues/2563	2017-04-14 13:09:54 -07:00
Alex Dadgar	c52000f792	FinishedAt only records when the task has actually started	2017-03-31 17:06:05 -07:00
Alex Dadgar	81b78f77e1	Track task start/finish time & improve logs errors This PR adds tracking to when a task starts and finishes and the logs API takes advantage of this and returns better errors when asking for logs that do not exist.	2017-03-31 16:14:11 -07:00
Alex Dadgar	8238a8601e	Address comment	2017-03-09 21:05:34 -08:00
Alex Dadgar	9011a7984c	Add metrics to show allocations on the client This PR adds the following metrics to the client: client.allocations.migrating client.allocations.blocked client.allocations.pending client.allocations.running client.allocations.terminal Also adds some missing fields to the API version of the evaluation.	2017-03-09 12:37:41 -08:00
Alex Dadgar	5be806a3df	Fix vet script and fix vet problems This PR fixes our vet script and fixes all the missed vet changes. It also fixes pointers being printed in `nomad stop <job>` and `nomad node-status <node>`.	2017-02-27 16:00:19 -08:00
Alex Dadgar	238b4bcafd	Add Leader support to client	2017-02-10 17:55:19 -08:00
Michael Schurter	acd11f678d	Add COMPAT comment	2017-01-06 11:39:17 -08:00
Michael Schurter	86fcf96f72	Put a logger in AllocDir/TaskDir	2017-01-05 16:31:56 -08:00
Michael Schurter	5a6bd19eb7	Fix upgrade path for #2132 AllocRunner's state dropped the Context struct which needs to be converted to the new AllocDir+TaskDir structs in RestoreState. TaskRunner added a TaskDirBuilt flag, but it's safe to just let that default to `false` and rebuild all task dirs once on upgrade.	2017-01-05 16:31:55 -08:00
Michael Schurter	774afd8800	Fail fast on taskdir errors	2017-01-05 16:31:55 -08:00
Michael Schurter	3ea09ba16a	Move chroot building into TaskRunner * Refactor AllocDir to have a TaskDir struct per task. * Drivers expose filesystem isolation preference * Fix lxc mounting of `secrets/`	2017-01-05 16:31:49 -08:00
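
Commit 5ec52ec24a above describes stopping the task group leader before its followers, so that sidecars such as log shippers outlive the task they serve. The ordering can be sketched roughly as below. This is a minimal illustration, not Nomad's code: the `task` type, the `destroyAll` function, and the `stop` callback are hypothetical names introduced only for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// task is a hypothetical stand-in for a task runner; it is not Nomad's TaskRunner.
type task struct {
	name   string
	leader bool
}

// destroyAll stops the leader (if any) and waits for it to finish before
// stopping the followers concurrently. It returns the names of the tasks
// in the order their stop calls completed.
func destroyAll(tasks []task, stop func(task)) []string {
	var mu sync.Mutex
	var order []string
	record := func(t task) {
		stop(t)
		mu.Lock()
		order = append(order, t.name)
		mu.Unlock()
	}

	// Block on the leader first so followers are still running while it exits.
	for _, t := range tasks {
		if t.leader {
			record(t)
		}
	}

	// Only now signal the followers; they may shut down in parallel.
	var wg sync.WaitGroup
	for _, t := range tasks {
		if t.leader {
			continue
		}
		wg.Add(1)
		go func(t task) {
			defer wg.Done()
			record(t)
		}(t)
	}
	wg.Wait()
	return order
}

func main() {
	tasks := []task{{name: "log-shipper"}, {name: "web", leader: true}, {name: "metrics"}}
	order := destroyAll(tasks, func(task) {})
	fmt.Println("stopped first:", order[0])
}
```

In this sketch the followers' relative order is unspecified; only the leader-before-followers guarantee is enforced.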

155 commits