Commit graph

256 commits

Author SHA1 Message Date
Alex Dadgar 2be221d664 Don't retrieve Driver Stats if unsupported
This PR makes us only try to collect stats once if the Driver doesn't
support collecting stats.

Fixes https://github.com/hashicorp/nomad/issues/1986
2017-01-09 13:47:06 -08:00
Alex Dadgar 26e2c5bb74 Merge pull request #2164 from hashicorp/b-dispatch
Create Task directory structure in the Run method
2017-01-09 11:24:46 -08:00
Alex Dadgar 2a5fd85e3b Move to Run() 2017-01-08 13:55:12 -08:00
Alex Dadgar 2affef2972 Create task directory during Prestart() 2017-01-08 13:55:12 -08:00
Alex Dadgar 4ffd9a69e5 Send Driver events to servers immediately
This PR causes driver events to be sent to the server immediately rather
than waiting for Prestart() to finish.
2017-01-08 13:54:43 -08:00
Michael Schurter 3ea09ba16a Move chroot building into TaskRunner
* Refactor AllocDir to have a TaskDir struct per task.
* Drivers expose filesystem isolation preference
* Fix lxc mounting of `secrets/`
2017-01-05 16:31:49 -08:00
Alex Dadgar 8d5f0fea69 Merge pull request #2128 from hashicorp/f-dispatch
Nomad Constructor Jobs and Dispatch
2017-01-06 05:22:49 +08:00
Michael Schurter ea92cd102a Append host env vars on every task env 2016-12-20 12:24:24 -08:00
Michael Schurter 2aa235f8f2 Rename InitializationMessage to DriverMessage 2016-12-20 11:51:09 -08:00
Alex Dadgar 159c819e08 Client writes payload to disk 2016-12-16 15:11:56 -08:00
Michael Schurter 770ed703d0 Add Driver.Prestart method
The Driver.Prestart method currently does very little but lays the
foundation for where lifecycle plugins can interleave execution _after_
task environment setup but _before_ the task starts.

Currently Prestart does two things:

* Any driver specific task environment building
* Download Docker images

This change also attaches a TaskEvent emitter to Drivers, so they can
emit events during task initialization.
2016-12-02 11:03:48 -08:00
Alex Dadgar 960424f086 Merge pull request #1941 from hashicorp/b-complete-transistion
Task state "dead" is terminal
2016-11-04 17:16:10 -07:00
Alex Dadgar e6465e138b More precise marking of dead 2016-11-04 17:11:07 -07:00
Alex Dadgar 0fb7742c3c Task state "dead" is terminal 2016-11-04 16:57:24 -07:00
Alex Dadgar 8b7adb20e9 Fix tests 2016-11-04 15:10:18 -07:00
Alex Dadgar 4e8d39d674 Unique task 2016-11-04 14:53:37 -07:00
Alex Dadgar 4741a4b129 Create container much more robust 2016-11-04 14:39:56 -07:00
Alex Dadgar cd1791ed09 Download artifacts before templates 2016-10-31 11:29:26 -07:00
Alex Dadgar 6618f7a03d Fix passing of recoverable error from docker pull 2016-10-28 17:49:46 -07:00
Alex Dadgar fde7a24865 Consul-template fixes + PreviousAlloc in api 2016-10-28 15:50:35 -07:00
Alex Dadgar 4082732d3a Interpolate and then validate services 2016-10-25 14:27:49 -07:00
Alex Dadgar da8b05ba17 Fix merge 2016-10-24 17:04:10 -07:00
Alex Dadgar 03eba049ed Merge pull request #1848 from hashicorp/f-vault-error
Thread through whether DeriveToken error is recoverable or not
2016-10-24 15:01:18 -07:00
Alex Dadgar ede3a814ba Small fixes 2016-10-22 18:20:50 -07:00
Alex Dadgar 0070178741 Thread through whether DeriveToken error is recoverable or not 2016-10-22 18:08:30 -07:00
Alex Dadgar 46a7d1a0d7 Change how we mark tasks as failed and allow consul-template to fail tasks 2016-10-20 17:27:16 -07:00
Alex Dadgar b384bff053 Feedback 2016-10-18 15:01:04 -07:00
Alex Dadgar ba0b3963ef Comments 2016-10-18 11:36:04 -07:00
Alex Dadgar 4f8bfd7b18 Tests 2016-10-18 11:24:20 -07:00
Alex Dadgar 36cfe6e89e Large refactor of task runner and Vault token rehandling 2016-10-18 11:24:20 -07:00
Alex Dadgar 53eeec9bc1 Merge pull request #1801 from hashicorp/f-signals
Consul-template signal change mode
2016-10-18 11:23:47 -07:00
Ben Barnard 83f647ed84 Replace "the the" with "the" in documentation and comments 2016-10-11 15:31:40 -04:00
Alex Dadgar bc35eaee21 Task runner sends signals 2016-10-10 15:09:00 -07:00
Alex Dadgar e2d49eb4a2 Comments 2016-10-06 15:21:59 -07:00
Alex Dadgar 68c5fe78f8 Tests 2016-10-06 15:17:34 -07:00
Alex Dadgar 8fb07bb083 Fix handling of restart in TaskEvents 2016-10-06 15:06:54 -07:00
Alex Dadgar 8eb7fa91cf Start of integration 2016-10-06 15:05:49 -07:00
Alex Dadgar 50efdb00e9 Merge pull request #1713 from hashicorp/f-alloc-runner-vault
Vault integration in client
2016-09-20 16:15:55 -07:00
Diptanu Choudhury f7a9b39e8c Ensuring that we are not emitting stats when handle is nil (#1723)
* Ensuring that we are not emitting stats when handle is nil

* Updated the changelog
2016-09-20 11:29:34 -07:00
Alex Dadgar ec152a6d12 Clean up vault client 2016-09-14 18:10:56 -07:00
Alex Dadgar 6702a29071 Vault token threaded 2016-09-14 13:30:01 -07:00
Michael Schurter 6cb6d9cdf1 Lock around saving state
Prevent interleaving state syncs as it could conceivably lead to
empty state files as per #1367
2016-09-02 16:07:06 -07:00
Vishal Nayak b6b73545ea Merge pull request #1606 from hashicorp/f-vault-client
VaultClient for Nomad client's interactions with Vault
2016-08-30 13:13:54 -04:00
Michael Schurter d31f373a5b Merge pull request #1653 from hashicorp/b-fix-artifact-retry
Don't fail other tasks when retrying artifact get
2016-08-26 09:53:39 -07:00
Michael Schurter 5ce26f82fe Don't fail other tasks when retrying artifact get
The artifact fetching may be retried and succeed, so don't set the task
as dead.

Fixes #1558
2016-08-25 13:16:41 -07:00
Ivo Verberk 9113244131 Don't duplicate TaskKilled event and check for TaskSiblingFailed. 2016-08-25 20:11:10 +02:00
vishalnayak 56e42cf03d Employ DeriveVaultToken API and flesh-up DeriveToken 2016-08-24 12:29:59 -04:00
Alex Dadgar 1da8566322 Merge pull request #1580 from hashicorp/f-disk-usage-monitoring
Monitor and enforce shared allocation directory disk usage
2016-08-23 09:49:53 -07:00
Diptanu Choudhury 4ca623bcfe blocking chained allocations until previous allocation has terminated 2016-08-22 11:34:24 -05:00
Ivo Verberk 2a17895a83 Disk resource monitoring and enforcement 2016-08-18 07:59:03 +02:00
Diptanu Choudhury 28b3f511e0 Fixed some error messages 2016-08-10 15:17:32 -07:00
Kenjiro Nakayama 5c621b74e5 tiny: Return fmt.Errorf instead of duplicated error messages 2016-08-09 08:57:26 +09:00
Diptanu Choudhury 70d2f8ef1d Merge pull request #1534 from nak3/fix-intask_runner
tiny: print task name and error message for SaveState error
2016-08-08 13:37:25 -04:00
Kenjiro Nakayama e7863ea8ee tiny: print task name and error message for the SaveState error in task_runner 2016-08-07 13:33:58 +09:00
Kenjiro Nakayama 60b58eed84 Update GetArtifact by removing unused logger 2016-08-06 23:37:32 +09:00
Diptanu Choudhury 41b540fbc8 Allow operators to opt into publishing node and alloc metrics 2016-08-01 19:52:20 -07:00
Alex Dadgar 90748cedad Add killing event and mark task as not running when killed 2016-07-21 15:49:54 -07:00
Alex Dadgar c35b1be845 Set running when restoring 2016-06-28 13:47:59 -07:00
Diptanu Choudhury 88ac1b33a4 Not emitting per-pid stats and added the total ticks consumed by a Task 2016-06-20 17:30:25 -07:00
Alex Dadgar fe588a2469 Guard against restoring a nil task in task_runner 2016-06-16 11:55:40 -07:00
Alex Dadgar fdda90229f only support latest and remove ring buffer 2016-06-12 09:32:38 -07:00
Alex Dadgar e952540f6f Allocation resources returned in a struct 2016-06-11 21:04:10 -07:00
Diptanu Choudhury fd60cfd585 Emitting client resource usage metrics as gauges instead of k/v pairs 2016-06-11 22:17:32 +02:00
Alex Dadgar b7e3a45fef fix channel being nil on restore 2016-06-07 15:03:08 -07:00
Diptanu Choudhury c21d606ebb Getting inodes used percent back 2016-06-06 16:10:34 -07:00
Alex Dadgar ba1a92eb8c Handle errors during stats collection 2016-06-03 14:23:18 -07:00
Diptanu Choudhury 667b478f3f Merge pull request #1226 from hashicorp/f-push-stats
Push Resource Usage stats to remote sinks
2016-06-02 23:14:59 +02:00
Diptanu Choudhury 35e31c1b81 Enqueuing metrics only if they are not nil 2016-06-02 17:14:15 -04:00
Diptanu Choudhury 7efde782fa Sending metrics for tasks as well 2016-06-01 16:42:16 +02:00
Alex Dadgar 4e15611339 fix wait result being nil and some panics in the cli 2016-05-31 23:09:05 +00:00
Diptanu Choudhury f95b1d00c3 Renamed error message in alloc endpoint 2016-05-28 20:03:52 -07:00
Diptanu Choudhury c0dc6cfbf2 Changing the api of the stats endpoints 2016-05-28 19:59:20 -07:00
Diptanu Choudhury fa9b0dd7e8 Implemented the resource usage ts since a time 2016-05-28 19:59:20 -07:00
Diptanu Choudhury 77ac2dd624 Initializing the ring buffer with no cells 2016-05-28 19:59:20 -07:00
Diptanu Choudhury 0b0d0764e4 Changed signature of Allocation Stats Reporter 2016-05-28 19:59:20 -07:00
Diptanu Choudhury c46400597e Making the stats collection interval and number of data points to keep in memory configurable 2016-05-28 19:59:20 -07:00
Diptanu Choudhury d2021e2953 Changed the signature of ResourceUsageTS 2016-05-28 19:59:20 -07:00
Diptanu Choudhury 05c221186b Added disk usage to node status 2016-05-28 19:59:20 -07:00
Diptanu Choudhury 84cd943c48 Stopping stats collection of tasks which have been destroyed 2016-05-28 19:59:20 -07:00
Diptanu Choudhury b9feae89ce Making the conversion to Stats simpler 2016-05-28 19:42:34 -07:00
Diptanu Choudhury 91d2cf319e Added some documentation 2016-05-28 19:42:34 -07:00
Diptanu Choudhury f3d0aecafe Reporting time series of stats 2016-05-28 19:42:34 -07:00
Diptanu Choudhury 0fb0e0237f Added a client API to display resource usage of an allocation 2016-05-28 19:42:34 -07:00
Alex Dadgar 831909dcce pass a copy of the task to the task environment 2016-05-05 22:01:17 -07:00
Alex Dadgar 483fa975d7 createDriver expects task environment 2016-04-13 14:24:08 -07:00
Alex Dadgar dc63c24e59 interpret the artifact source 2016-04-11 18:46:16 -07:00
Alex Dadgar 23c1173269 ArtifactDownloaded in task runner state 2016-03-28 17:24:10 -07:00
Alex Dadgar f64f03f87e Test task failure killing TG and fix setting the task as received on a restore 2016-03-25 12:51:40 -07:00
Alex Dadgar dced530c7c kill tasks in alloc when one fails 2016-03-25 12:50:25 -07:00
Alex Dadgar 25dc8a0dcb Explain restart decision and display in alloc-status 2016-03-25 12:47:14 -07:00
Alex Dadgar 45dfae8f6f Operator specifiable blacklist for tasks using certain users 2016-03-24 10:55:14 -07:00
Diptanu Choudhury 76343a3748 Merge pull request #972 from hashicorp/scripts
Moving consul service to executor
2016-03-24 00:12:45 -07:00
Diptanu Choudhury f6a932194f Removing references to old consul services and adding consul config to executor context 2016-03-23 12:19:19 -07:00
Alex Dadgar 782fa46b69 Show error when artifact validation fails in task runner 2016-03-22 16:09:41 -07:00
Alex Dadgar 0f73c3f402 Validate the artifact client side as well 2016-03-19 13:28:37 -07:00
Alex Dadgar 74a68c83f1 Test task runner downloading artifacts 2016-03-15 14:34:25 -07:00
Alex Dadgar ab44bc78a2 Get tests to pass 2016-03-15 13:28:57 -07:00
Alex Dadgar 9f878a16bf Download artifacts and remove old code for drivers 2016-03-15 13:28:57 -07:00
Alex Dadgar 144ccfb561 Killing a docker container that is dead is not an error 2016-03-02 16:27:01 -08:00
Alex Dadgar f8b047e088 Add Alloc ID/Name and Task Name to environment variables 2016-03-01 16:08:21 -08:00
Alex Dadgar 7fe8a4650f Acquire lock around handle 2016-02-29 10:45:08 -08:00
Alex Dadgar 61972c9ddc Refactor task runner to include driver starting into restart policy and add recoverable errors 2016-02-28 16:56:05 -08:00
Diptanu Choudhury e3d6c4a9dd Adding version information to snapshots 2016-02-24 19:06:30 -08:00
Alex Dadgar c08e3dbee8 Make updating alloc status async 2016-02-19 21:44:23 -08:00
Alex Dadgar e2a4c4ccc5 Client stores when it receives a task 2016-02-19 14:49:43 -08:00
Alex Dadgar 18d2d9c091 Killing a driver handle is retried with an exponential backoff 2016-02-16 21:00:49 -08:00
Alex Dadgar f6e0349d3b go vet 2016-02-12 16:08:58 -08:00
Alex Dadgar 4d7ed4f164 Strip as much copystructure as possible 2016-02-10 17:54:43 -08:00
Alex Dadgar 0c4c3fc4ee safe but slow 2016-02-10 13:44:53 -08:00
Alex Dadgar fdc7124032 Precise registration 2016-02-06 17:08:20 -08:00
Alex Dadgar c744e2f4f1 Update the consul service when the task/alloc changes 2016-02-06 17:08:20 -08:00
Alex Dadgar 41e1174f72 Client handles updates to KillTimeout and Restart Policy 2016-02-03 19:43:44 -08:00
Alex Dadgar b6f9e9c61c Move restart tracker creation into task runner 2016-02-03 16:16:48 -08:00
Alex Dadgar cf1e152f44 Clean interaction between alloc-runner and task-runner 2016-02-02 11:09:29 -08:00
Alex Dadgar a72d39bd04 Don't share task state with the alloc in the task runner 2016-02-01 17:47:53 -08:00
Alex Dadgar 3ba1c9b76b merge 2016-01-11 09:58:26 -08:00
Alex Dadgar 31c3e12957 merge 2015-12-18 12:17:13 -08:00
Diptanu Choudhury d8e51bb6b6 Moving the de-register once a task moves to DEAD state 2015-12-17 16:41:29 -08:00
Diptanu Choudhury 76486d71e2 Making the allocs hold service ids 2015-12-14 15:08:35 -08:00
Diptanu Choudhury 2c0822284b Tracking the tasks too 2015-11-24 17:26:30 -08:00
Diptanu Choudhury 135006699b Renamed consul client to service 2015-11-24 12:34:26 -08:00
Diptanu Choudhury a3d5b266a0 Registering Checks independently 2015-11-24 10:02:33 -08:00
Diptanu Choudhury b8c5268d88 Making the restart tracker aware of the exit codes 2015-11-23 10:56:38 -08:00
Diptanu Choudhury 4d2fe73dfb Not restarting if a task exited properly 2015-11-22 23:47:15 -08:00
Diptanu Choudhury 65bac7f4db Updating checks and services when allocs are refreshed 2015-11-18 17:33:29 -08:00
Diptanu Choudhury b8c2cc81f0 Deferring calling the de-register from consul call when a service is not running 2015-11-18 02:37:34 -08:00
Diptanu Choudhury d6da6372cd Moving the logic to find port and host inside consul client 2015-11-18 01:18:29 -08:00
Diptanu Choudhury 404810043a Added the implementation of consul client 2015-11-18 00:50:45 -08:00
Alex Dadgar 11b43f8e1f Avoid calling destroy twice 2015-11-17 12:03:59 -08:00
Alex Dadgar ea0edd8c2f Change SetExitMessage from taking a string to an error 2015-11-16 15:14:21 -08:00
Alex Dadgar e76a613974 Use loop not recursion 2015-11-16 15:14:21 -08:00
Alex Dadgar b649039448 Fix the capacity 2015-11-16 15:14:21 -08:00
Alex Dadgar 82f51601db Track Task State in the client and capture Wait results 2015-11-16 15:14:21 -08:00
Diptanu Choudhury 3b4cb6dbc9 Saving state of the Task Runner while it's trying to update it 2015-11-12 15:53:42 -08:00
Alex Dadgar d3e2455459 Merge pull request #408 from hashicorp/f-client-restore
Client Restore State Fixes
2015-11-11 12:32:11 -08:00
Alex Dadgar 19d0c97da7 Client restores state properly 2015-11-09 15:55:31 -08:00
Diptanu Choudhury 0252b49c17 Updating snapshots of a TaskRunner when status of Task changes 2015-11-09 12:36:07 -08:00
Alex Dadgar edb43b27df Don't set the alloc status twice when not restarting 2015-11-06 15:26:01 -08:00
Diptanu Choudhury 3d5e02b3d7 Fixed some tests and refactored logic 2015-11-05 17:30:41 -08:00
Diptanu Choudhury fff38106ae Added some comments to code 2015-11-05 16:48:15 -08:00
Diptanu Choudhury a2a73b16d9 Added the client word to log lines 2015-11-05 16:39:57 -08:00
Diptanu Choudhury 44569d908f Passing restart tracker in the task runner 2015-11-05 16:38:19 -08:00
Diptanu Choudhury 86be2bf0be Cleaned up the logic to calculate restart duration 2015-11-05 15:16:29 -08:00
Diptanu Choudhury 3659335de1 Fixed the log statements 2015-11-05 11:13:05 -08:00
Diptanu Choudhury ea854a9220 Added the logic to restart Tasks if possible 2015-11-05 11:13:04 -08:00
Diptanu Choudhury b64ed61022 Setting the restart policy to AllocRunner and Task Runners 2015-11-05 11:13:04 -08:00
Alex Dadgar 9d3e3c0704 AllocDirBuilder that creates the alloc directory structure 2015-09-25 16:46:41 -07:00
Chris Bednarski a695e311dc Replace logging and config with DriverContext, which allows us to expand the dependency injection without changing the interface 2015-09-09 18:06:23 -07:00
Chris Bednarski 4eb8fc5188 Added config to drivers; needed for docker driver to get the socket endpoint 2015-09-08 12:43:02 -07:00
Armon Dadgar db33f76a61 client: remove TaskRunner dependence on AllocRunner 2015-08-29 19:42:35 -07:00