open-nomad

Commit Graph

Author	SHA1	Message	Date
Alex Dadgar	552604451c	rework where time gets set	2018-05-07 14:50:01 -05:00
Alex Dadgar	ee50789c22	Initial implementation	2018-05-07 14:50:01 -05:00
Preetha Appan	6363a6fb4d	Remove old comment	2018-04-04 15:01:48 -05:00
Preetha Appan	5e4525bd30	Moves setting finishedAt to the right place and adds two unit tests.	2018-04-04 14:38:15 -05:00
Preetha Appan	e6bbce3fa0	Add comment	2018-04-03 19:49:03 -05:00
Preetha Appan	00537c739b	Fixes edge cases around timing and task finish time being set more than once	2018-04-03 16:34:59 -05:00
Alex Dadgar	beee130a6e	Always capture the finish time	2018-03-29 11:27:22 -07:00
Josh Soref	09970343b5	spelling: destruction	2018-03-11 17:54:39 +00:00
Michael Schurter	ca946679f6	Destroy partially migrated alloc dirs Test that snapshot errors don't return a valid tar currently fails.	2017-11-29 17:26:11 -08:00
Michael Schurter	f86f0bd9ea	Handle leader task being dead in RestoreState Fixes the panic mentioned in https://github.com/hashicorp/nomad/issues/3420#issuecomment-341666932 While a leader task dying serially stops all follower tasks, the synchronizing of state is asynchronous. Nomad can shutdown before all follower tasks have updated their state to dead thus saving the state necessary to hit this panic: have a non-terminal alloc with a dead leader. The actual fix is a simple nil check to not assume non-terminal allocs leader's have a TaskRunner.	2017-11-15 10:36:13 -08:00
Alex Dadgar	b4af10edde	Alloc Runner doesn't panic on restoration.	2017-11-02 16:14:13 -07:00
Diptanu Choudhury	cb68889652	Added the node_id as a tag	2017-11-02 13:29:10 -07:00
Diptanu Choudhury	8a9d0d40b1	Added support for tagged metrics	2017-11-02 10:07:57 -07:00
Diptanu Choudhury	5f522c6de3	Incrementing the start counter when we are actually starting a container	2017-11-02 09:51:20 -07:00
Diptanu Choudhury	44535e5d10	Recording counter for dead allocs properly	2017-11-02 09:51:20 -07:00
Diptanu Choudhury	0b34e811b7	Added metrics to track task/alloc start/restarts/dead events	2017-11-02 09:51:20 -07:00
Michael Schurter	73e9b57908	Trigger GCs after alloc changes GC much more aggressively by triggering GCs when allocations become terminal as well as after new allocations are added.	2017-11-01 15:16:38 -05:00
Michael Schurter	2a81160dcd	Fix GC'd alloc tracking The Client.allocs map now contains all AllocRunners again, not just un-GC'd AllocRunners. Client.allocs is only pruned when the server GCs allocs. Also stops logging "marked for GC" twice.	2017-11-01 15:16:38 -05:00
Alex Dadgar	4173834231	Enable more linters	2017-09-26 15:26:33 -07:00
Michael Schurter	8a87475498	Use existing restart policy infrastructure	2017-09-14 16:46:54 -07:00
Alex Dadgar	1a86aecf55	Add version package This PR adds a version package and consolidates version strings into a Version struct.	2017-08-16 15:44:21 -07:00
Michael Schurter	7342e23669	Move migrating state into prevAllocWatcher	2017-08-14 16:02:28 -07:00
Michael Schurter	4601419d63	Soft fail on migration errors	2017-08-11 16:50:30 -07:00
Michael Schurter	ad6cec9e82	Set failed status instead of panic'ing Fixup some TODOs and formatting left from new prevAllocWatcher code.	2017-08-11 16:21:35 -07:00
Michael Schurter	e41a654917	switch from alloc blocker to new interface interface has 3 implementations: 1. local for blocking and moving data locally 2. remote for blocking and moving data from another node 3. noop for allocs that don't need to block	2017-08-11 16:21:35 -07:00
Michael Schurter	ee04717a0b	initial attempt at refactoring blocked/migrating	2017-08-11 16:21:35 -07:00
Michael Schurter	ec6e6e6c66	Only set alloc status if it's not already terminal	2017-08-11 16:21:35 -07:00
Alex Dadgar	1b061b8f47	Unmount task directories when alloc is terminal This PR unmounts directories from tasks when the alloc is terminal rather than when it is garbage collected. /cc @angrycub	2017-08-10 13:28:17 -07:00
Alex Dadgar	83ba2f1814	Template emits events explaining why it is blocked This PR does the following: * Adds a mechanism to emit events in the TaskRunner * Vendors a new version of Consul-Template that allows extraction of missing dependencies * Adds logic to our consul_template.go to determine missing events and emit them in a batched fashion. * Refactors the consul_template code to split the run method and take in a config struct rather than many parameters. Fixes https://github.com/hashicorp/nomad/issues/2578	2017-08-09 18:01:27 -07:00
Alex Dadgar	4f6f6a13c8	Emit generic task events	2017-08-07 21:26:04 -07:00
Michael Schurter	c76b3b54b9	Merge branch 'master' into fix-pending-state	2017-08-03 17:27:03 -07:00
Michael Schurter	b01dd31f26	Don't attempt to restore tasks that never sync'd	2017-07-24 15:58:46 -07:00
Michael Schurter	9a7a1d8c13	Fix race by not accessing tr.task from ar	2017-07-21 16:16:53 -07:00
Michael Schurter	2e9a1e3fa6	Remove unneeded saveTaskRunnerState method Collapse it into the one place it's called	2017-07-21 16:16:02 -07:00
Alex Dadgar	09c8ee621b	Destroy tasks that are part of terminal alloc	2017-07-20 12:02:04 -07:00
Alex Dadgar	64776b1370	Should not persist state after alloc_runner is garbage collected	2017-07-19 17:31:30 -07:00
Michael Schurter	c1b8bef813	Use broadcast send retry logic everywhere	2017-07-18 14:36:32 -07:00
Alex Dadgar	d2381c9263	Merge pull request #2853 from hashicorp/b-watcher Improve alloc health watcher	2017-07-18 14:12:28 -07:00
Alex Dadgar	bd43bd509c	Save deployment status	2017-07-18 12:37:52 -07:00
Alex Dadgar	41f67e3535	Small fixes	2017-07-18 12:19:57 -07:00
Michael Schurter	c24e73ede7	Fix deadlock caused by syncing during destroy When replacing an alloc the new alloc is blocked until the old alloc is destroyed. This could cause a deadlock: 1. Destroying the old alloc includes a final sync of its status 2. Syncing status causes a GC 3. A GC looks for terminal allocs to cleanup 4. The GC waits for an alloc to stop completely before GC'ing If the GC chooses the currently-being-destroyed-alloc to GC, the GC deadlocks. If `client.max_parallel` deadlocks happen the GC is wedged until the Nomad process is restarted. Performing the final sync asynchronously is an ugly hack but prevents the deadlock by allowing the final sync to occur after the alloc runner has shutdown and been destroyed.	2017-07-18 11:12:56 -07:00
Michael Schurter	cdb2e96d99	Add AllocRunner.allocID for ease-of-use Since the AllocRunner.alloc struct can be mutated, most of AllocRunner needs to acquire a lock to get the alloc's ID. Log lines always need to include the alloc ID, so we often skipped acquiring a lock just to grab the ID and accepted the race. Let's make the race detector a little happier by storing the ID in a single assignment field.	2017-07-17 15:46:54 -07:00
Michael Schurter	181fda825a	Fix log level	2017-07-17 15:46:54 -07:00
Michael Schurter	98f6e7f10f	Don't fail if task dirs don't exist on creation Task dir metadata is created in AllocRunner.Run which may not run before an alloc is sync'd and Nomad exits. There's no reason not to just create task dir metadata on restore if it doesn't exist.	2017-07-17 15:46:54 -07:00
Michael Schurter	51515cbe0c	Ensure allocDir is never nil and persisted safely Fixes #2834	2017-07-17 15:46:54 -07:00
Michael Schurter	e9a416b731	Merge branch 'master' into fix-pending-state	2017-07-10 10:43:23 -07:00
unknown	26b16fa3ce	#2563 fixed pending state for allocations with terminal status	2017-07-09 16:18:06 +03:00
Alex Dadgar	bf97a2455c	Vet and small improvement on watcher failure detection	2017-07-07 14:53:01 -07:00
Alex Dadgar	ade9a7c768	@jippi Changed my mind! Good suggestion	2017-07-07 12:12:48 -07:00
Alex Dadgar	067ed86a47	Client watches for allocation health using task state and Consul checks This PR adds watching of allocation health at the client. The client can watch for health based on the tasks running on time and also based on the consul checks passing.	2017-07-07 12:10:04 -07:00

189 Commits