Commit Graph

263 Commits

Author SHA1 Message Date
Michael Schurter c5f222e4a6 Update created resources before exiting cleanup 2017-01-19 16:48:23 -08:00
Michael Schurter a93d43a9cf Exit early when cleanup succeeds 2017-01-19 15:07:01 -08:00
Michael Schurter 85f68aa00c Fix incorrect lock usage 2017-01-19 11:39:18 -08:00
Michael Schurter a3a3656dbb Switch to use recoverable errors from Cleanup
TaskRunner handles retrying but Cleanup handles all of CreatedResources.
2017-01-13 16:46:08 -08:00
Michael Schurter dc68aa1a5a Return errors from cleanup and let TaskRunner retry 2017-01-12 17:21:54 -08:00
Michael Schurter 4d081490e6 Add Cleanup method to Driver interface
Cleanup can be used for cleaning up resources created by drivers to run
a task. Initially the Docker driver is the only user (to remove
downloaded images).
2017-01-11 17:23:33 -08:00
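Note on the Cleanup commits above: the messages describe a split where the driver's Cleanup handles the created resources and the TaskRunner handles retrying on recoverable errors. A minimal Go sketch of that split follows. Every name in it (`CreatedResources`, `RecoverableError`, `cleanupWithRetry`, `flakyDriver`) is hypothetical and chosen for illustration; it is not the code from these commits.

```go
package main

import (
	"errors"
	"fmt"
)

// CreatedResources is a hypothetical record of what a driver created for a
// task, keyed by resource type (for example "image").
type CreatedResources map[string][]string

// RecoverableError is a hypothetical wrapper marking an error as retryable.
type RecoverableError struct{ Err error }

func (e *RecoverableError) Error() string { return e.Err.Error() }

// Driver shows only the Cleanup hook; a real driver has more methods.
type Driver interface {
	// Cleanup removes what it can from res, deleting entries as it goes so
	// that a retry only sees what is left.
	Cleanup(res CreatedResources) error
}

// cleanupWithRetry plays the TaskRunner role: it owns the retry loop and
// stops as soon as Cleanup succeeds or returns a non-recoverable error.
func cleanupWithRetry(d Driver, res CreatedResources, maxAttempts int) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = d.Cleanup(res)
		if err == nil {
			return nil // exit early when cleanup succeeds
		}
		var rec *RecoverableError
		if !errors.As(err, &rec) {
			return err // not recoverable: retrying will not help
		}
	}
	return fmt.Errorf("cleanup still failing after %d attempts: %w", maxAttempts, err)
}

// flakyDriver fails its first Cleanup call with a recoverable error and
// succeeds on the second, removing the "image" entry.
type flakyDriver struct{ calls int }

func (f *flakyDriver) Cleanup(res CreatedResources) error {
	f.calls++
	if f.calls == 1 {
		return &RecoverableError{Err: errors.New("image still in use")}
	}
	delete(res, "image")
	return nil
}
```

Calling `cleanupWithRetry(&flakyDriver{}, res, 3)` with one "image" entry in `res` makes two Cleanup calls and leaves `res` empty.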
Alex Dadgar 4127a3ea9d Merge pull request #2173 from hashicorp/b-stats
Don't retrieve Driver Stats if unsupported
2017-01-10 13:32:03 -08:00
Alex Dadgar 2be221d664 Don't retrieve Driver Stats if unsupported
This PR makes us only try to collect stats once if the Driver doesn't
support collecting stats.

Fixes https://github.com/hashicorp/nomad/issues/1986
2017-01-09 13:47:06 -08:00
Alex Dadgar 26e2c5bb74 Merge pull request #2164 from hashicorp/b-dispatch
Create Task directory structure in the Run method
2017-01-09 11:24:46 -08:00
Alex Dadgar 2a5fd85e3b Move to Run() 2017-01-08 13:55:12 -08:00
Alex Dadgar 2affef2972 Create task directory during Prestart() 2017-01-08 13:55:12 -08:00
Alex Dadgar 4ffd9a69e5 Send Driver events to servers immediately
This PR causes driver events to be sent to the server immediately rather
than waiting for Prestart() to finish.
2017-01-08 13:54:43 -08:00
Michael Schurter 3ea09ba16a Move chroot building into TaskRunner
* Refactor AllocDir to have a TaskDir struct per task.
* Drivers expose filesystem isolation preference
* Fix lxc mounting of `secrets/`
2017-01-05 16:31:49 -08:00
Alex Dadgar 8d5f0fea69 Merge pull request #2128 from hashicorp/f-dispatch
Nomad Constructor Jobs and Dispatch
2017-01-06 05:22:49 +08:00
Michael Schurter ea92cd102a Append host env vars on every task env 2016-12-20 12:24:24 -08:00
Michael Schurter 2aa235f8f2 Rename InitializationMessage to DriverMessage 2016-12-20 11:51:09 -08:00
Alex Dadgar 159c819e08 Client writes payload to disk 2016-12-16 15:11:56 -08:00
Michael Schurter 770ed703d0 Add Driver.Prestart method
The Driver.Prestart method currently does very little but lays the
foundation for where lifecycle plugins can interleave execution _after_
task environment setup but _before_ the task starts.

Currently Prestart does two things:

* Any driver specific task environment building
* Download Docker images

This change also attaches a TaskEvent emitter to Drivers, so they can
emit events during task initialization.
2016-12-02 11:03:48 -08:00
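Note on the Driver.Prestart commit above: the message places Prestart after task environment setup and before the task starts, and gives drivers a way to emit events during initialization. The Go sketch below only illustrates that ordering. The `Driver` interface shape, `EmitFunc`, `runTask`, and `fakeDocker` are assumptions made for this example, not the signatures introduced by the commit.

```go
package main

import "fmt"

// EmitFunc is a hypothetical callback a driver uses to report task events.
type EmitFunc func(msg string)

// Driver shows only the two lifecycle steps discussed here.
type Driver interface {
	Prestart(emit EmitFunc) error
	Start() error
}

// runTask enforces the order: environment setup, then Prestart, then Start.
// A failure at any step stops the task before the later steps run.
func runTask(d Driver, setupEnv func() error, emit EmitFunc) error {
	if err := setupEnv(); err != nil {
		return fmt.Errorf("environment setup: %w", err)
	}
	if err := d.Prestart(emit); err != nil {
		return fmt.Errorf("prestart: %w", err)
	}
	return d.Start()
}

// fakeDocker stands in for a driver that downloads an image in Prestart.
// It appends each step it runs to the shared steps slice.
type fakeDocker struct{ steps *[]string }

func (f *fakeDocker) Prestart(emit EmitFunc) error {
	emit("downloading image")
	*f.steps = append(*f.steps, "prestart")
	return nil
}

func (f *fakeDocker) Start() error {
	*f.steps = append(*f.steps, "start")
	return nil
}
```

Passing a `setupEnv` function that appends "env" to the same slice shows the steps recorded in the order env, prestart, start, with one emitted event.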
Alex Dadgar 960424f086 Merge pull request #1941 from hashicorp/b-complete-transistion
Task state "dead" is terminal
2016-11-04 17:16:10 -07:00
Alex Dadgar e6465e138b More precise marking of dead 2016-11-04 17:11:07 -07:00
Alex Dadgar 0fb7742c3c Task state "dead" is terminal 2016-11-04 16:57:24 -07:00
Alex Dadgar 8b7adb20e9 Fix tests 2016-11-04 15:10:18 -07:00
Alex Dadgar 4e8d39d674 Unique task 2016-11-04 14:53:37 -07:00
Alex Dadgar 4741a4b129 Make container creation much more robust 2016-11-04 14:39:56 -07:00
Alex Dadgar cd1791ed09 Download artifacts before templates 2016-10-31 11:29:26 -07:00
Alex Dadgar 6618f7a03d Fix passing of recoverable error from docker pull 2016-10-28 17:49:46 -07:00
Alex Dadgar fde7a24865 Consul-template fixes + PreviousAlloc in api 2016-10-28 15:50:35 -07:00
Alex Dadgar 4082732d3a Interpolate and then validate services 2016-10-25 14:27:49 -07:00
Alex Dadgar da8b05ba17 Fix merge 2016-10-24 17:04:10 -07:00
Alex Dadgar 03eba049ed Merge pull request #1848 from hashicorp/f-vault-error
Thread through whether DeriveToken error is recoverable or not
2016-10-24 15:01:18 -07:00
Alex Dadgar ede3a814ba Small fixes 2016-10-22 18:20:50 -07:00
Alex Dadgar 0070178741 Thread through whether DeriveToken error is recoverable or not 2016-10-22 18:08:30 -07:00
Alex Dadgar 46a7d1a0d7 Change how we mark tasks as failed and allow consul-template to fail tasks 2016-10-20 17:27:16 -07:00
Alex Dadgar b384bff053 Feedback 2016-10-18 15:01:04 -07:00
Alex Dadgar ba0b3963ef Comments 2016-10-18 11:36:04 -07:00
Alex Dadgar 4f8bfd7b18 Tests 2016-10-18 11:24:20 -07:00
Alex Dadgar 36cfe6e89e Large refactor of task runner and Vault token rehandling 2016-10-18 11:24:20 -07:00
Alex Dadgar 53eeec9bc1 Merge pull request #1801 from hashicorp/f-signals
Consul-template signal change mode
2016-10-18 11:23:47 -07:00
Ben Barnard 83f647ed84 Replace "the the" with "the" in documentation and comments 2016-10-11 15:31:40 -04:00
Alex Dadgar bc35eaee21 Task runner sends signals 2016-10-10 15:09:00 -07:00
Alex Dadgar e2d49eb4a2 Comments 2016-10-06 15:21:59 -07:00
Alex Dadgar 68c5fe78f8 Tests 2016-10-06 15:17:34 -07:00
Alex Dadgar 8fb07bb083 Fix handling of restart in TaskEvents 2016-10-06 15:06:54 -07:00
Alex Dadgar 8eb7fa91cf Start of integration 2016-10-06 15:05:49 -07:00
Alex Dadgar 50efdb00e9 Merge pull request #1713 from hashicorp/f-alloc-runner-vault
Vault integration in client
2016-09-20 16:15:55 -07:00
Diptanu Choudhury f7a9b39e8c Ensuring that we are not emitting stats when handle is nil (#1723)
* Ensuring that we are not emitting stats when handle is nil

* Updated the changelog
2016-09-20 11:29:34 -07:00
Alex Dadgar ec152a6d12 Clean up vault client 2016-09-14 18:10:56 -07:00
Alex Dadgar 6702a29071 Vault token threaded 2016-09-14 13:30:01 -07:00
Michael Schurter 6cb6d9cdf1 Lock around saving state
Prevent interleaving state syncs as it could conceivably lead to
empty state files as per #1367
2016-09-02 16:07:06 -07:00
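Note on the "Lock around saving state" commit above: the message says interleaved state syncs could lead to empty state files. One common way to prevent that is to hold a mutex for the whole write, sketched below in Go. The `stateSaver` type, the JSON encoding, and the `demo` helper are assumptions for illustration and are not taken from the commit.

```go
package main

import (
	"encoding/json"
	"os"
	"path/filepath"
	"sync"
)

// stateSaver is a hypothetical helper that serializes writes to one file.
type stateSaver struct {
	mu   sync.Mutex
	path string
}

// Save encodes v and rewrites the state file. Holding mu for the whole
// operation means two concurrent calls cannot interleave their writes.
func (s *stateSaver) Save(v any) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	data, err := json.Marshal(v)
	if err != nil {
		return err
	}
	return os.WriteFile(s.path, data, 0o600)
}

// demo saves the same state from several goroutines at once and returns
// the final file contents.
func demo(writers int) (string, error) {
	dir, err := os.MkdirTemp("", "state")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	s := &stateSaver{path: filepath.Join(dir, "state.json")}
	errs := make([]error, writers)
	var wg sync.WaitGroup
	for i := 0; i < writers; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			errs[i] = s.Save(map[string]string{"task": "web", "state": "running"})
		}(i)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return "", err
		}
	}
	data, err := os.ReadFile(s.path)
	return string(data), err
}
```

With the lock in place, `demo(8)` returns one complete JSON document regardless of how the goroutines are scheduled.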
Vishal Nayak b6b73545ea Merge pull request #1606 from hashicorp/f-vault-client
VaultClient for Nomad client's interactions with Vault
2016-08-30 13:13:54 -04:00
Michael Schurter d31f373a5b Merge pull request #1653 from hashicorp/b-fix-artifact-retry
Don't fail other tasks when retrying artifact get
2016-08-26 09:53:39 -07:00
Michael Schurter 5ce26f82fe Don't fail other tasks when retrying artifact get
The artifact fetching may be retried and succeed, so don't set the task
as dead.

Fixes #1558
2016-08-25 13:16:41 -07:00
Ivo Verberk 9113244131 Don't duplicate TaskKilled event and check for TaskSiblingFailed. 2016-08-25 20:11:10 +02:00
vishalnayak 56e42cf03d Employ DeriveVaultToken API and flesh out DeriveToken 2016-08-24 12:29:59 -04:00
Alex Dadgar 1da8566322 Merge pull request #1580 from hashicorp/f-disk-usage-monitoring
Monitor and enforce shared allocation directory disk usage
2016-08-23 09:49:53 -07:00
Diptanu Choudhury 4ca623bcfe Blocking chained allocations until the previous allocation has terminated 2016-08-22 11:34:24 -05:00
Ivo Verberk 2a17895a83 Disk resource monitoring and enforcement 2016-08-18 07:59:03 +02:00
Diptanu Choudhury 28b3f511e0 Fixed some error messages 2016-08-10 15:17:32 -07:00
Kenjiro Nakayama 5c621b74e5 tiny: Return fmt.Errorf instead of duplicated error messages 2016-08-09 08:57:26 +09:00
Diptanu Choudhury 70d2f8ef1d Merge pull request #1534 from nak3/fix-intask_runner
tiny: print task name and error message for SaveState error
2016-08-08 13:37:25 -04:00
Kenjiro Nakayama e7863ea8ee tiny: print task name and error message for the SaveState error in task_runner 2016-08-07 13:33:58 +09:00
Kenjiro Nakayama 60b58eed84 Update GetArtifact by removing unused logger 2016-08-06 23:37:32 +09:00
Diptanu Choudhury 41b540fbc8 Allow operators to opt into publishing node and alloc metrics 2016-08-01 19:52:20 -07:00
Alex Dadgar 90748cedad Add killing event and mark task as not running when killed 2016-07-21 15:49:54 -07:00
Alex Dadgar c35b1be845 Set running when restoring 2016-06-28 13:47:59 -07:00
Diptanu Choudhury 88ac1b33a4 Not emitting per-pid stats and added the total ticks consumed by a Task 2016-06-20 17:30:25 -07:00
Alex Dadgar fe588a2469 Guard against restoring a nil task in task_runner 2016-06-16 11:55:40 -07:00
Alex Dadgar fdda90229f only support latest and remove ring buffer 2016-06-12 09:32:38 -07:00
Alex Dadgar e952540f6f Allocation resources returned in a struct 2016-06-11 21:04:10 -07:00
Diptanu Choudhury fd60cfd585 Emitting client resource usage metrics as gauges instead of k/v pairs 2016-06-11 22:17:32 +02:00
Alex Dadgar b7e3a45fef fix channel being nil on restore 2016-06-07 15:03:08 -07:00
Diptanu Choudhury c21d606ebb Getting inodes used percent back 2016-06-06 16:10:34 -07:00
Alex Dadgar ba1a92eb8c Handle errors during stats collection 2016-06-03 14:23:18 -07:00
Diptanu Choudhury 667b478f3f Merge pull request #1226 from hashicorp/f-push-stats
Push Resource Usage stats to remote sinks
2016-06-02 23:14:59 +02:00
Diptanu Choudhury 35e31c1b81 Enqueuing metrics only if they are not nil 2016-06-02 17:14:15 -04:00
Diptanu Choudhury 7efde782fa Sending metrics for tasks as well 2016-06-01 16:42:16 +02:00
Alex Dadgar 4e15611339 fix wait result being nil and some panics in the cli 2016-05-31 23:09:05 +00:00
Diptanu Choudhury f95b1d00c3 Renamed error message in alloc endpoint 2016-05-28 20:03:52 -07:00
Diptanu Choudhury c0dc6cfbf2 Changing the api of the stats endpoints 2016-05-28 19:59:20 -07:00
Diptanu Choudhury fa9b0dd7e8 Implemented the resource usage ts since a time 2016-05-28 19:59:20 -07:00
Diptanu Choudhury 77ac2dd624 Initializing the ring buffer with no cells 2016-05-28 19:59:20 -07:00
Diptanu Choudhury 0b0d0764e4 Changed signature of Allocation Stats Reporter 2016-05-28 19:59:20 -07:00
Diptanu Choudhury c46400597e Making the stats collection interval and number of data points to keep in memory configurable 2016-05-28 19:59:20 -07:00
Diptanu Choudhury d2021e2953 Changed the signature of ResourceUsageTS 2016-05-28 19:59:20 -07:00
Diptanu Choudhury 05c221186b Added disk usage to node status 2016-05-28 19:59:20 -07:00
Diptanu Choudhury 84cd943c48 Stopping stats collection of tasks that have been destroyed 2016-05-28 19:59:20 -07:00
Diptanu Choudhury b9feae89ce Making the conversion to Stats simpler 2016-05-28 19:42:34 -07:00
Diptanu Choudhury 91d2cf319e Added some documentation 2016-05-28 19:42:34 -07:00
Diptanu Choudhury f3d0aecafe Reporting time series of stats 2016-05-28 19:42:34 -07:00
Diptanu Choudhury 0fb0e0237f Added a client API to display resource usage of an allocation 2016-05-28 19:42:34 -07:00
Alex Dadgar 831909dcce pass a copy of the task to the task environment 2016-05-05 22:01:17 -07:00
Alex Dadgar 483fa975d7 createDriver expects task environment 2016-04-13 14:24:08 -07:00
Alex Dadgar dc63c24e59 interpret the artifact source 2016-04-11 18:46:16 -07:00
Alex Dadgar 23c1173269 ArtifactDownloaded in task runner state 2016-03-28 17:24:10 -07:00
Alex Dadgar f64f03f87e Test task failure killing TG and fix setting the task as received on a restore 2016-03-25 12:51:40 -07:00
Alex Dadgar dced530c7c kill tasks in alloc when one fails 2016-03-25 12:50:25 -07:00
Alex Dadgar 25dc8a0dcb Explain restart decision and display in alloc-status 2016-03-25 12:47:14 -07:00
Alex Dadgar 45dfae8f6f Operator specifiable blacklist for tasks using certain users 2016-03-24 10:55:14 -07:00
Diptanu Choudhury 76343a3748 Merge pull request #972 from hashicorp/scripts
Moving consul service to executor
2016-03-24 00:12:45 -07:00
Diptanu Choudhury f6a932194f Removing references to old consul services and adding consul config to executor context 2016-03-23 12:19:19 -07:00
Alex Dadgar 782fa46b69 Show error when artifact validation fails in task runner 2016-03-22 16:09:41 -07:00
Alex Dadgar 0f73c3f402 Validate the artifact client side as well 2016-03-19 13:28:37 -07:00
Alex Dadgar 74a68c83f1 Test task runner downloading artifacts 2016-03-15 14:34:25 -07:00
Alex Dadgar ab44bc78a2 Get tests to pass 2016-03-15 13:28:57 -07:00
Alex Dadgar 9f878a16bf Download artifacts and remove old code for drivers 2016-03-15 13:28:57 -07:00
Alex Dadgar 144ccfb561 Killing a docker container that is dead is not an error 2016-03-02 16:27:01 -08:00
Alex Dadgar f8b047e088 Add Alloc ID/Name and Task Name to environment variables 2016-03-01 16:08:21 -08:00
Alex Dadgar 7fe8a4650f Acquire lock around handle 2016-02-29 10:45:08 -08:00
Alex Dadgar 61972c9ddc Refactor task runner to include driver starting into restart policy and add recoverable errors 2016-02-28 16:56:05 -08:00
Diptanu Choudhury e3d6c4a9dd Adding version information to snapshots 2016-02-24 19:06:30 -08:00
Alex Dadgar c08e3dbee8 Make updating alloc status async 2016-02-19 21:44:23 -08:00
Alex Dadgar e2a4c4ccc5 Client stores when it receives a task 2016-02-19 14:49:43 -08:00
Alex Dadgar 18d2d9c091 Killing a driver handle is retried with an exponential backoff 2016-02-16 21:00:49 -08:00
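Note on the commit above: retrying a kill with exponential backoff means waiting longer after each failed attempt, usually up to a cap. The Go sketch below shows the general technique only. The function name, parameters, doubling factor, cap, and the `recorder` helper are all assumptions for illustration; the commit message does not state the values it uses.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errBusy is a placeholder failure used by callers of this sketch.
var errBusy = errors.New("handle busy")

// killWithBackoff calls kill until it succeeds or attempts run out. The
// wait doubles after each failure and is capped at max. The sleep function
// is injected so callers can pass time.Sleep or a recorder.
func killWithBackoff(kill func() error, attempts int, base, max time.Duration, sleep func(time.Duration)) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	wait := base
	for i := 0; i < attempts; i++ {
		if err = kill(); err == nil {
			return nil
		}
		if i == attempts-1 {
			break // no point waiting after the final attempt
		}
		sleep(wait)
		wait *= 2
		if wait > max {
			wait = max
		}
	}
	return fmt.Errorf("kill failed after %d attempts: %w", attempts, err)
}

// recorder collects the requested waits instead of sleeping.
type recorder struct{ waits []time.Duration }

func (r *recorder) sleep(d time.Duration) { r.waits = append(r.waits, d) }
```

In real use `sleep` would be `time.Sleep`; the `recorder` exists so the backoff schedule can be inspected without actually waiting.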
Alex Dadgar f6e0349d3b go vet 2016-02-12 16:08:58 -08:00
Alex Dadgar 4d7ed4f164 Strip as much copystructure as possible 2016-02-10 17:54:43 -08:00
Alex Dadgar 0c4c3fc4ee safe but slow 2016-02-10 13:44:53 -08:00
Alex Dadgar fdc7124032 Precise registration 2016-02-06 17:08:20 -08:00
Alex Dadgar c744e2f4f1 Update the consul service when the task/alloc changes 2016-02-06 17:08:20 -08:00
Alex Dadgar 41e1174f72 Client handles updates to KillTimeout and Restart Policy 2016-02-03 19:43:44 -08:00
Alex Dadgar b6f9e9c61c Move restart tracker creation into task runner 2016-02-03 16:16:48 -08:00
Alex Dadgar cf1e152f44 Clean interaction between alloc-runner and task-runner 2016-02-02 11:09:29 -08:00
Alex Dadgar a72d39bd04 Don't share task state with the alloc in the task runner 2016-02-01 17:47:53 -08:00
Alex Dadgar 3ba1c9b76b merge 2016-01-11 09:58:26 -08:00
Alex Dadgar 31c3e12957 merge 2015-12-18 12:17:13 -08:00
Diptanu Choudhury d8e51bb6b6 Moving the de-register once a task moves to DEAD state 2015-12-17 16:41:29 -08:00
Diptanu Choudhury 76486d71e2 Making the allocs hold service ids 2015-12-14 15:08:35 -08:00
Diptanu Choudhury 2c0822284b Tracking the tasks too 2015-11-24 17:26:30 -08:00
Diptanu Choudhury 135006699b Renamed consul client to service 2015-11-24 12:34:26 -08:00
Diptanu Choudhury a3d5b266a0 Registering Checks independently 2015-11-24 10:02:33 -08:00
Diptanu Choudhury b8c5268d88 Making the restart tracker aware of the exit codes 2015-11-23 10:56:38 -08:00
Diptanu Choudhury 4d2fe73dfb Not restarting if a task exited properly 2015-11-22 23:47:15 -08:00
Diptanu Choudhury 65bac7f4db Updating checks and services when allocs are refreshed 2015-11-18 17:33:29 -08:00
Diptanu Choudhury b8c2cc81f0 Deferring the de-register call to Consul when a service is not running 2015-11-18 02:37:34 -08:00
Diptanu Choudhury d6da6372cd Moving the logic to find port and host inside consul client 2015-11-18 01:18:29 -08:00
Diptanu Choudhury 404810043a Added the implementation of consul client 2015-11-18 00:50:45 -08:00
Alex Dadgar 11b43f8e1f Avoid calling destroy twice 2015-11-17 12:03:59 -08:00
Alex Dadgar ea0edd8c2f Change SetExitMessage from taking a string to an error 2015-11-16 15:14:21 -08:00
Alex Dadgar e76a613974 Use loop not recursion 2015-11-16 15:14:21 -08:00
Alex Dadgar b649039448 Fix the capacity 2015-11-16 15:14:21 -08:00
Alex Dadgar 82f51601db Track Task State in the client and capture Wait results 2015-11-16 15:14:21 -08:00
Diptanu Choudhury 3b4cb6dbc9 Saving state of the Task Runner while it's trying to update it 2015-11-12 15:53:42 -08:00
Alex Dadgar d3e2455459 Merge pull request #408 from hashicorp/f-client-restore
Client Restore State Fixes
2015-11-11 12:32:11 -08:00
Alex Dadgar 19d0c97da7 Client restores state properly 2015-11-09 15:55:31 -08:00
Diptanu Choudhury 0252b49c17 Updating snapshots of a TaskRunner when status of Task changes 2015-11-09 12:36:07 -08:00
Alex Dadgar edb43b27df Don't set the alloc status twice when not restarting 2015-11-06 15:26:01 -08:00
Diptanu Choudhury 3d5e02b3d7 Fixed some tests and refactored logic 2015-11-05 17:30:41 -08:00
Diptanu Choudhury fff38106ae Added some comments to code 2015-11-05 16:48:15 -08:00
Diptanu Choudhury a2a73b16d9 Added the client word to log lines 2015-11-05 16:39:57 -08:00
Diptanu Choudhury 44569d908f Passing restart tracker in the task runner 2015-11-05 16:38:19 -08:00
Diptanu Choudhury 86be2bf0be Cleaned up the logic to calculate restart duration 2015-11-05 15:16:29 -08:00