Commit graph

236 commits

Author SHA1 Message Date
Michael Schurter e204a287ed Refactor Consul Syncer into new ServiceClient
Fixes #2478 #2474 #1995 #2294

The new client only handles agent and task service advertisement. Server
discovery is mostly unchanged.

The Nomad client agent now handles all Consul operations instead of the
executor handling task related operations. When upgrading from an
earlier version of Nomad existing executors will be told to deregister
from Consul so that the Nomad agent can re-register the task's services
and checks.

Drivers - other than qemu - now support an Exec method for executing
abritrary commands in a task's environment. This is used to implement
script checks.

Interfaces are used extensively to avoid interacting with Consul in
tests that don't assert any Consul related behavior.
2017-04-19 12:42:47 -07:00
Alex Dadgar 6bee23047a Fix variable capture and add tests
This PR fixes token revocation and adds tests to make sure it is
working. The 0.5.6 RC1's token revocation does not work becasue the
token's value is captured at the instantiation of the deferred
stoprenewal statement rather than its exectution.
2017-03-29 13:17:50 -07:00
Michael Schurter ae3810052d Merge pull request #2482 from hashicorp/f-2289-better-artifact-err
Improve artifact download error message
2017-03-28 12:48:22 -07:00
Alex Dadgar 1e95ae7e6a Merge pull request #2495 from hashicorp/b-vault-stop-renew
Stop Vault token renew on task exit
2017-03-28 11:14:18 -07:00
Alex Dadgar d1645f47b1 Stop Vault token renew on task exit
This PR fixes an oversight in which the client would attempt to renew a
token even after the task exits.

Fixes https://github.com/hashicorp/nomad/issues/2475
2017-03-28 10:53:15 -07:00
Michael Schurter 507862ade3 Add WrapRecoverable helper 2017-03-27 15:37:15 -07:00
Alex Dadgar 4ecebe7d8c Proper reference counting through task restarts
This PR fixes an issue in which the reference count on a Docker image
would become inflated through task restarts.
2017-03-25 17:05:53 -07:00
Michael Schurter 0e6c564406 Improve artifact download error message
Fixes #2289

Unfortunately took more RecoverableError hijinx than I would have liked.
There might be a better way.
2017-03-24 15:26:05 -07:00
Alex Dadgar 7e6c08191d Fix vet 2017-03-24 12:24:47 -07:00
Alex Dadgar c3551c761e Fix panic when restarting non-running task
This PR fixes an issue that is hit when running templates with restart
mode in which the client could panic when the handle is not running.

Fixes https://github.com/hashicorp/nomad/issues/2479
2017-03-24 12:04:22 -07:00
Alex Dadgar b5e53652aa Handle git ssh artifacts
This PR adds handling for downloading git artifacts using ssh with the
format git@github.com:hashicorp/go-getter.git

Fixes https://github.com/hashicorp/nomad/issues/2430
2017-03-11 15:12:41 -08:00
Alex Dadgar 3fb285f7d3 Fix TestAllocRunner_SaveRestoreState 2017-03-02 20:45:46 -08:00
Alex Dadgar 5cd43e837a Merge branch 'master' into b-remount 2017-03-02 19:23:13 -08:00
Michael Schurter e03f64ea6a Safely ensure {dev,proc,alloc} are mounted
If they're unmounted by a reboot they'll be properly remounted.
2017-03-02 13:21:34 -08:00
Alex Dadgar af4e400b36 Update go-getter and add support for git and hg
Fixes https://github.com/hashicorp/nomad/issues/2042
2017-03-01 14:46:04 -08:00
Alex Dadgar fa853c9696 Fix two issues during client restore state
This PR fixes two issues:

1) A close of a nil stopCollection channel when restoring and prestart
fails. The failure will cause the killCh to be triggered which will
close collection before it has been initialized.

2) Fixes a deadlock in which the handleWaitCh is never triggered since
it is not initialized when there is an error in prestart and the killCh
is triggered.

Both fixes are by maintaining the loop invariant that the two channels
are valid after there is a handle.
2017-02-28 10:29:12 -08:00
Alex Dadgar cef7882827 Fix tests and docs 2017-02-22 18:28:07 -08:00
Diptanu Choudhury 98921575af Adding a task event for setup 2017-02-22 18:28:07 -08:00
Michael Schurter 51e4fe9915 Use getters & setters with nil guards 2017-02-09 17:44:58 -08:00
Michael Schurter 37e7e7a3e5 Fix upgrade path for created resources
This *might* be a fix for #2295 -- I've been unable to reproduce the
bug. However, this guard seems wise regardless. I should never be
overwriting an intialized created resources with a nil.
2017-02-09 13:54:33 -08:00
Michael Schurter aef3c2e380 Handle createdResourcs=nil
Combined with b522c472fdf this fixes #2256

Without these two commits in place upgrades to 0.5.3 panics.
2017-01-31 10:51:32 -08:00
Alex Dadgar 8196a58c4c Rename dispatch_input to dispatch_payload 2017-01-25 21:27:44 -08:00
Michael Schurter 054ee8df59 Fix index we get allocs by 2017-01-20 16:30:40 -08:00
Michael Schurter c5f222e4a6 Update created resources before exiting cleanup 2017-01-19 16:48:23 -08:00
Michael Schurter a93d43a9cf Exit early when cleanup succeeds 2017-01-19 15:07:01 -08:00
Michael Schurter 85f68aa00c Fix incorrect lock usage 2017-01-19 11:39:18 -08:00
Michael Schurter a3a3656dbb Switch to use recoverable errors from Cleanup
TaskRunner handles retrying but Cleanup handles all of CreatedResources.
2017-01-13 16:46:08 -08:00
Michael Schurter dc68aa1a5a Return errors from cleanup and let TaskRunner retry 2017-01-12 17:21:54 -08:00
Michael Schurter 4d081490e6 Add Cleanup method to Driver interface
Cleanup can be used for cleaning up resources created by drivers to run
a task. Initially the Docker driver is the only user (to remove
downloaded images).
2017-01-11 17:23:33 -08:00
Alex Dadgar 4127a3ea9d Merge pull request #2173 from hashicorp/b-stats
Don't retrieve Driver Stats if unsupported
2017-01-10 13:32:03 -08:00
Alex Dadgar 2be221d664 Don't retrieve Driver Stats if unsupported
This PR makes us only try to collect stats once if the Driver doesn't
support collecting stats.

Fixes https://github.com/hashicorp/nomad/issues/1986
2017-01-09 13:47:06 -08:00
Alex Dadgar 26e2c5bb74 Merge pull request #2164 from hashicorp/b-dispatch
Create Task directory structure in the Run method
2017-01-09 11:24:46 -08:00
Alex Dadgar 2a5fd85e3b Move to Run() 2017-01-08 13:55:12 -08:00
Alex Dadgar 2affef2972 Create task directory during Prestart() 2017-01-08 13:55:12 -08:00
Alex Dadgar 4ffd9a69e5 Send Driver events to servers immediately
This PR causes driver events to be sent to the server immediately rather
than waiting for Prestart() to finish.
2017-01-08 13:54:43 -08:00
Michael Schurter 3ea09ba16a Move chroot building into TaskRunner
* Refactor AllocDir to have a TaskDir struct per task.
* Drivers expose filesystem isolation preference
* Fix lxc mounting of `secrets/`
2017-01-05 16:31:49 -08:00
Alex Dadgar 8d5f0fea69 Merge pull request #2128 from hashicorp/f-dispatch
Nomad Constructor Jobs and Dispatch
2017-01-06 05:22:49 +08:00
Michael Schurter ea92cd102a Append host env vars on every task env 2016-12-20 12:24:24 -08:00
Michael Schurter 2aa235f8f2 Rename InitializationMessage to DriverMessage 2016-12-20 11:51:09 -08:00
Alex Dadgar 159c819e08 Client writes payload to disk 2016-12-16 15:11:56 -08:00
Michael Schurter 770ed703d0 Add Driver.Prestart method
The Driver.Prestart method currently does very little but lays the
foundation for where lifecycle plugins can interleave execution _after_
task environment setup but _before_ the task starts.

Currently Prestart does two things:

* Any driver specific task environment building
* Download Docker images

This change also attaches a TaskEvent emitter to Drivers, so they can
emit events during task initialization.
2016-12-02 11:03:48 -08:00
Alex Dadgar 960424f086 Merge pull request #1941 from hashicorp/b-complete-transistion
Task state "dead" is terminal
2016-11-04 17:16:10 -07:00
Alex Dadgar e6465e138b More precise marking of dead 2016-11-04 17:11:07 -07:00
Alex Dadgar 0fb7742c3c Task state "dead" is terminal 2016-11-04 16:57:24 -07:00
Alex Dadgar 8b7adb20e9 Fix tests 2016-11-04 15:10:18 -07:00
Alex Dadgar 4e8d39d674 Unique task 2016-11-04 14:53:37 -07:00
Alex Dadgar 4741a4b129 Create container much more robust 2016-11-04 14:39:56 -07:00
Alex Dadgar cd1791ed09 Download artifacts before templates 2016-10-31 11:29:26 -07:00
Alex Dadgar 6618f7a03d Fix passing of recoverable error from docker pull 2016-10-28 17:49:46 -07:00
Alex Dadgar fde7a24865 Consul-template fixes + PreviousAlloc in api 2016-10-28 15:50:35 -07:00
Alex Dadgar 4082732d3a Interpolate and then validate services 2016-10-25 14:27:49 -07:00
Alex Dadgar da8b05ba17 Fix merge 2016-10-24 17:04:10 -07:00
Alex Dadgar 03eba049ed Merge pull request #1848 from hashicorp/f-vault-error
Thread through whether DeriveToken error is recoverable or not
2016-10-24 15:01:18 -07:00
Alex Dadgar ede3a814ba Small fixes 2016-10-22 18:20:50 -07:00
Alex Dadgar 0070178741 Thread through whether DeriveToken error is recoverable or not 2016-10-22 18:08:30 -07:00
Alex Dadgar 46a7d1a0d7 Change how we mark tasks as failed and allow consul-template to fail tasks 2016-10-20 17:27:16 -07:00
Alex Dadgar b384bff053 Feedback 2016-10-18 15:01:04 -07:00
Alex Dadgar ba0b3963ef Comments 2016-10-18 11:36:04 -07:00
Alex Dadgar 4f8bfd7b18 Tests 2016-10-18 11:24:20 -07:00
Alex Dadgar 36cfe6e89e Large refactor of task runner and Vault token rehandling 2016-10-18 11:24:20 -07:00
Alex Dadgar 53eeec9bc1 Merge pull request #1801 from hashicorp/f-signals
Consul-template signal change mode
2016-10-18 11:23:47 -07:00
Ben Barnard 83f647ed84 Replace "the the" with "the" in documentation and comments 2016-10-11 15:31:40 -04:00
Alex Dadgar bc35eaee21 Task runner sends signals 2016-10-10 15:09:00 -07:00
Alex Dadgar e2d49eb4a2 Comments 2016-10-06 15:21:59 -07:00
Alex Dadgar 68c5fe78f8 Tests 2016-10-06 15:17:34 -07:00
Alex Dadgar 8fb07bb083 Fix handling of restart in TaskEvents 2016-10-06 15:06:54 -07:00
Alex Dadgar 8eb7fa91cf Start of integration 2016-10-06 15:05:49 -07:00
Alex Dadgar 50efdb00e9 Merge pull request #1713 from hashicorp/f-alloc-runner-vault
Vault integration in client
2016-09-20 16:15:55 -07:00
Diptanu Choudhury f7a9b39e8c Ensuring that we are not emitting stats when handle is nil (#1723)
* Ensuring that we are not emitting stats when handle is nil

* Updated the changelog
2016-09-20 11:29:34 -07:00
Alex Dadgar ec152a6d12 Clean up vault client 2016-09-14 18:10:56 -07:00
Alex Dadgar 6702a29071 Vault token threaded 2016-09-14 13:30:01 -07:00
Michael Schurter 6cb6d9cdf1 Lock around saving state
Prevent interleaving state syncs as it could conceivably lead to
empty state files as per #1367
2016-09-02 16:07:06 -07:00
Vishal Nayak b6b73545ea Merge pull request #1606 from hashicorp/f-vault-client
VaultClient for Nomad client's interactions with Vault
2016-08-30 13:13:54 -04:00
Michael Schurter d31f373a5b Merge pull request #1653 from hashicorp/b-fix-artifact-retry
Don't fail other tasks when retrying artifact get
2016-08-26 09:53:39 -07:00
Michael Schurter 5ce26f82fe Don't fail other tasks when retrying artifact get
The artifact fetching may be retried and succeed, so don't set the task
as dead.

Fixes #1558
2016-08-25 13:16:41 -07:00
Ivo Verberk 9113244131 Don't duplicate TaskKilled event and check for TaskSiblingFailed. 2016-08-25 20:11:10 +02:00
vishalnayak 56e42cf03d Employ DeriveVaultToken API and flesh-up DeriveToken 2016-08-24 12:29:59 -04:00
Alex Dadgar 1da8566322 Merge pull request #1580 from hashicorp/f-disk-usage-monitoring
Monitor and enforce shared allocation directory disk usage
2016-08-23 09:49:53 -07:00
Diptanu Choudhury 4ca623bcfe blocking chained allocations until previous allocation hasn't terminated 2016-08-22 11:34:24 -05:00
Ivo Verberk 2a17895a83 Disk resource monitoring and enforcement 2016-08-18 07:59:03 +02:00
Diptanu Choudhury 28b3f511e0 Fixed some error messages 2016-08-10 15:17:32 -07:00
Kenjiro Nakayama 5c621b74e5 tiny: Return fmt.Errorf instead of duplicated error messages 2016-08-09 08:57:26 +09:00
Diptanu Choudhury 70d2f8ef1d Merge pull request #1534 from nak3/fix-intask_runner
tiny: print task name and error message for SaveState error
2016-08-08 13:37:25 -04:00
Kenjiro Nakayama e7863ea8ee tiny: print task name and error message for the SaveState error in task_runner 2016-08-07 13:33:58 +09:00
Kenjiro Nakayama 60b58eed84 Update GetArtifact by removing unused logger 2016-08-06 23:37:32 +09:00
Diptanu Choudhury 41b540fbc8 Allow operators to opt into publishing node and alloc metrics 2016-08-01 19:52:20 -07:00
Alex Dadgar 90748cedad Add killing event and mark task as not running when killed 2016-07-21 15:49:54 -07:00
Alex Dadgar c35b1be845 Set running when restoring 2016-06-28 13:47:59 -07:00
Diptanu Choudhury 88ac1b33a4 Not emitting per-pid stats and added the total ticks consumed by a Task 2016-06-20 17:30:25 -07:00
Alex Dadgar fe588a2469 Guard against restoring a nil task in task_runner 2016-06-16 11:55:40 -07:00
Alex Dadgar fdda90229f only support latest and remove ring buffer 2016-06-12 09:32:38 -07:00
Alex Dadgar e952540f6f Allocation resources returned in a struct 2016-06-11 21:04:10 -07:00
Diptanu Choudhury fd60cfd585 Emitting client resource usage metrics as guages instead of k/v pairs 2016-06-11 22:17:32 +02:00
Alex Dadgar b7e3a45fef fix channel being nil on restore 2016-06-07 15:03:08 -07:00
Diptanu Choudhury c21d606ebb Getting inodes used percent back 2016-06-06 16:10:34 -07:00
Alex Dadgar ba1a92eb8c Handle errors during stats collection 2016-06-03 14:23:18 -07:00
Diptanu Choudhury 667b478f3f Merge pull request #1226 from hashicorp/f-push-stats
Push Resource Usage stats to remote sinks
2016-06-02 23:14:59 +02:00
Diptanu Choudhury 35e31c1b81 Enqueing metrics only if they are not nil 2016-06-02 17:14:15 -04:00
Diptanu Choudhury 7efde782fa Sending metrics for tasks as well 2016-06-01 16:42:16 +02:00
Alex Dadgar 4e15611339 fix wait result being nil and some panics in the cli 2016-05-31 23:09:05 +00:00