Commit Graph

190 Commits

Author SHA1 Message Date
Alex Dadgar 067ed86a47 Client watches for allocation health using task state and Consul checks
This PR adds watching of allocation health at the client. The client can
watch for health based on the tasks running on time and also based on
the Consul checks passing.
2017-07-07 12:10:04 -07:00
Alex Dadgar 001058227e watcher per alloc 2017-07-07 12:07:08 -07:00
Alex Dadgar ecee5e370e initial watcher 2017-07-07 12:07:08 -07:00
Michael Schurter 2d741c770b Merge pull request #2732 from hashicorp/b-persist-alloc-updates
Persist Alloc when EvalID changes
2017-07-03 14:46:43 -07:00
Michael Schurter 5ec52ec24a Destroy task group leader first
Before this commit all tasks in a task group were destroyed
concurrently. This meant logging sidecars might be stopped before the
leader task whose logs still need to be shipped.

This commit blocks on the leader shutting down before signalling
followers to shut down.
2017-07-03 13:56:56 -07:00
Michael Schurter cff8546035 Fix spelling & re-add immutable state struct 2017-06-23 13:01:39 -07:00
Michael Schurter d359d3b554 Rename immutable -> alloc
meh; naming is hard
2017-06-23 10:58:36 -07:00
Michael Schurter af2fc0f1bc Persist Alloc when EvalID changes 2017-06-22 17:33:12 -07:00
Alex Dadgar ee8dd84965 Fix nil job on allocation
The alloc_runner copied the alloc by temporarily setting alloc.Job to
nil, copying, and then restoring it. When the alloc was shared (which it
is in server/client mode and between alloc_runner/task_runner), this
created race conditions that could cause a panic.

Fixes https://github.com/hashicorp/nomad/issues/2605
2017-05-17 14:07:06 -04:00
Alex Dadgar 3cd7e06fba Fix test 2017-05-09 11:35:48 -07:00
Alex Dadgar ba70cc4f01 Merge branch 'master' into f-bolt-db 2017-05-09 11:11:55 -07:00
Alex Dadgar 843bc26e5d Respond to comments 2017-05-09 10:50:24 -07:00
Alex Dadgar 730e49a598 Helpful comment 2017-05-03 11:27:33 -07:00
Alex Dadgar 1d8444bc1e Fix tests 2017-05-03 11:15:30 -07:00
Alex Dadgar e00f9c9413 Restore state + upgrade path 2017-05-02 18:21:49 -07:00
Alex Dadgar ec101b4760 Revert "metrics"
This reverts commit 4d6a012c6fb6f1fba6c62985d091b1a20c3198e7.
2017-05-02 09:28:11 -07:00
Alex Dadgar e010fdf8c0 metrics 2017-05-01 14:51:27 -07:00
Alex Dadgar d779defe65 Use batching 2017-05-01 14:50:34 -07:00
Alex Dadgar b94f855326 boltDB database for client state 2017-05-01 14:50:34 -07:00
Alex Dadgar bddedd7aba Don't deepcopy job when retrieving copy of Alloc
This PR removes deepcopying of the job attached to the allocation in the
alloc runner. This operation is called very often, so removing reflect
from the code path, along with the potentially large number of mallocs
needed to create a job, reduces memory and CPU pressure.
2017-05-01 14:50:34 -07:00
Michael Schurter 095d2ee340 Switch java/exec to use Exec in Executor 2017-04-21 16:25:49 -07:00
Michael Schurter a305b68159 Restart tasks on upgrade with script checks and old executors 2017-04-21 16:25:49 -07:00
Michael Schurter e204a287ed Refactor Consul Syncer into new ServiceClient
Fixes #2478 #2474 #1995 #2294

The new client only handles agent and task service advertisement. Server
discovery is mostly unchanged.

The Nomad client agent now handles all Consul operations instead of the
executor handling task related operations. When upgrading from an
earlier version of Nomad existing executors will be told to deregister
from Consul so that the Nomad agent can re-register the task's services
and checks.

Drivers - other than qemu - now support an Exec method for executing
arbitrary commands in a task's environment. This is used to implement
script checks.

Interfaces are used extensively to avoid interacting with Consul in
tests that don't assert any Consul related behavior.
2017-04-19 12:42:47 -07:00
Alex Dadgar 61f4a2dac6 Sync allocation state before waiting for a destroy
This change ensures that the client syncs allocation state with the
servers before entering its wait loop for the allocation to be
destroyed.

Fixes https://github.com/hashicorp/nomad/issues/2563
2017-04-14 13:09:54 -07:00
Alex Dadgar c52000f792 FinishedAt only records when the task has actually started 2017-03-31 17:06:05 -07:00
Alex Dadgar 81b78f77e1 Track task start/finish time & improve logs errors
This PR adds tracking of when a task starts and finishes. The logs API
takes advantage of this to return better errors when asked for logs
that do not exist.
2017-03-31 16:14:11 -07:00
Alex Dadgar 8238a8601e Address comment 2017-03-09 21:05:34 -08:00
Alex Dadgar 9011a7984c Add metrics to show allocations on the client
This PR adds the following metrics to the client:
client.allocations.migrating
client.allocations.blocked
client.allocations.pending
client.allocations.running
client.allocations.terminal

Also adds some missing fields to the API version of the evaluation.
2017-03-09 12:37:41 -08:00
Alex Dadgar 5be806a3df Fix vet script and fix vet problems
This PR fixes our vet script and fixes all the missed vet changes.

It also fixes pointers being printed in `nomad stop <job>` and `nomad
node-status <node>`.
2017-02-27 16:00:19 -08:00
Alex Dadgar 238b4bcafd Add Leader support to client 2017-02-10 17:55:19 -08:00
Michael Schurter acd11f678d Add COMPAT comment 2017-01-06 11:39:17 -08:00
Michael Schurter 86fcf96f72 Put a logger in AllocDir/TaskDir 2017-01-05 16:31:56 -08:00
Michael Schurter 5a6bd19eb7 Fix upgrade path for #2132
AllocRunner's state dropped the Context struct which needs to be
converted to the new AllocDir+TaskDir structs in RestoreState.

TaskRunner added a TaskDirBuilt flag, but it's safe to just let that
default to `false` and rebuild all task dirs once on upgrade.
2017-01-05 16:31:55 -08:00
Michael Schurter 774afd8800 Fail fast on taskdir errors 2017-01-05 16:31:55 -08:00
Michael Schurter 3ea09ba16a Move chroot building into TaskRunner
* Refactor AllocDir to have a TaskDir struct per task.
* Drivers expose filesystem isolation preference
* Fix lxc mounting of `secrets/`
2017-01-05 16:31:49 -08:00
Bastiaan Bakker 2c864172eb use snap.Alloc.TaskStates only after confirming snap.Alloc is not nil 2016-11-07 22:35:00 +01:00
Alex Dadgar 0fb7742c3c Task state "dead" is terminal 2016-11-04 16:57:24 -07:00
Alex Dadgar e85d0ebace Merge pull request #1840 from hashicorp/f-kill-fail
Change how we mark tasks as failed and allow consul-template to fail tasks
2016-10-24 13:40:52 -07:00
Michael Schurter 285e80ac0f Remove disk usage enforcement
Many thanks to @iverberk for the original PR (#1609), but we ended up
not wanting to ship this implementation with 0.5.

We'll come back to it after 0.5 and hopefully find a way to leverage
filesystem accounting and quotas, so we can skip the expensive polling.
2016-10-21 13:55:51 -07:00
Alex Dadgar 46a7d1a0d7 Change how we mark tasks as failed and allow consul-template to fail tasks 2016-10-20 17:27:16 -07:00
Alex Dadgar 36cfe6e89e Large refactor of task runner and Vault token rehandling 2016-10-18 11:24:20 -07:00
Diptanu Choudhury d50c395421 Getting snapshot of allocation from remote node (#1741)
* Added the alloc dir move

* Moving allocdirs when starting allocations

* Added the migrate flag to ephemeral disk

* Stopping migration if the allocation doesn't need migration any more

* Added the GetAllocDir method

* refactored code

* Added a test for alloc runner

* Incorporated review comments
2016-10-03 09:59:57 -07:00
Alex Dadgar 50efdb00e9 Merge pull request #1713 from hashicorp/f-alloc-runner-vault
Vault integration in client
2016-09-20 16:15:55 -07:00
Alex Dadgar 83905075e5 Fix comment 2016-09-17 11:31:17 -07:00
Alex Dadgar 0f40bd41a3 Handle recovery failure 2016-09-15 12:50:44 -07:00
Alex Dadgar 688e616200 Fix token renewal 2016-09-15 11:20:51 -07:00
Alex Dadgar ec152a6d12 Clean up vault client 2016-09-14 18:10:56 -07:00
Alex Dadgar 6702a29071 Vault token threaded 2016-09-14 13:30:01 -07:00
Michael Schurter cd8606b9e3 Revert "A nil context isn't an error"
This reverts commit fe9fe4c26259c1ad3bd7e94bd711418aaf819b20.
2016-09-12 12:56:12 -07:00
Michael Schurter 8a57913a44 A nil context isn't an error 2016-09-02 16:24:53 -07:00
Michael Schurter f601361d58 Don't serialize task states twice in state files 2016-09-02 16:07:06 -07:00
Michael Schurter 6cb6d9cdf1 Lock around saving state
Prevent interleaving state syncs as it could conceivably lead to
empty state files as per #1367
2016-09-02 16:07:06 -07:00
Michael Schurter e7dd443447 Add sanity check to SaveState
Also just reuse the task states snapshot taken by `Alloc()` instead of
doing a redundant copy.
2016-09-02 16:07:06 -07:00
Alex Dadgar 2c8dd8bbd3 Revert "Introduce a Secret/ directory" 2016-09-01 17:23:15 -07:00
Alex Dadgar 5d3b47e648 Address comments and reserve 2016-08-31 18:11:02 -07:00
Alex Dadgar d59e14eed4 Interface + tests 2016-08-30 21:40:32 -07:00
Alex Dadgar 14b7126511 Secret dir, hello world 2016-08-29 15:41:52 -07:00
Ivo Verberk 2a17895a83 Disk resource monitoring and enforcement 2016-08-18 07:59:03 +02:00
Diptanu Choudhury 28b3f511e0 Fixed some error messages 2016-08-10 15:17:32 -07:00
Kenjiro Nakayama 6a810e6f1e Update after review 2016-08-09 08:57:26 +09:00
Kenjiro Nakayama 5c621b74e5 tiny: Return fmt.Errorf instead of duplicated error messages 2016-08-09 08:57:26 +09:00
Alex Dadgar 898435d372 Retrieve task runners in helper 2016-07-21 13:41:01 -07:00
Alex Dadgar 7b83503596 finer grain locking 2016-06-20 10:19:06 -07:00
Alex Dadgar 744270590b Guard against bad restore 2016-06-17 14:58:53 -07:00
Alex Dadgar fdda90229f only support latest and remove ring buffer 2016-06-12 09:32:38 -07:00
Alex Dadgar e952540f6f Allocation resources returned in a struct 2016-06-11 21:04:10 -07:00
Diptanu Choudhury a64062d6a6 Fixed the compilation on linux 2016-05-28 19:59:20 -07:00
Diptanu Choudhury 0b0d0764e4 Changed signature of Allocation Stats Reporter 2016-05-28 19:59:20 -07:00
Diptanu Choudhury 666b419dba Acquiring locks before iterating allocations and tasks 2016-05-28 19:59:20 -07:00
Diptanu Choudhury 91d2cf319e Added some documentation 2016-05-28 19:42:34 -07:00
Diptanu Choudhury f3d0aecafe Reporting time series of stats 2016-05-28 19:42:34 -07:00
Diptanu Choudhury 0fb0e0237f Added a client API to display resource usage of an allocation 2016-05-28 19:42:34 -07:00
Sean Chittenden dc28ab0cb5 Speling police 2016-05-15 09:41:34 -07:00
Alex Dadgar f64f03f87e Test task failure killing TG and fix setting the task as received on a restore 2016-03-25 12:51:40 -07:00
Alex Dadgar dced530c7c kill tasks in alloc when one fails 2016-03-25 12:50:25 -07:00
Alex Dadgar b80e61a66c Merge pull request #975 from hashicorp/f-rename-complete-alloc
Successful allocations are marked as complete instead of dead
2016-03-25 10:35:11 -07:00
Alex Dadgar 94522e7bed Successful allocations are marked as complete instead of dead 2016-03-23 18:08:19 -07:00
Diptanu Choudhury f6a932194f Removing references to old consul services and adding consul config to executor context 2016-03-23 12:19:19 -07:00
Alex Dadgar ad92e50a24 Avoid serializing Allocation.Resources 2016-03-01 14:09:25 -08:00
Alex Dadgar 61972c9ddc Refactor task runner to include driver starting into restart policy and add recoverable errors 2016-02-28 16:56:05 -08:00
Diptanu Choudhury e3d6c4a9dd Adding version information to snapshots 2016-02-24 19:06:30 -08:00
Alex Dadgar 51bacf674e address feedback 2016-02-21 21:32:32 -08:00
Alex Dadgar 281e2ca198 Batch client allocation updates to the server 2016-02-21 21:15:02 -08:00
Alex Dadgar 13e5597ca2 Reduce alloc lock contention in client 2016-02-19 19:51:55 -08:00
Alex Dadgar 99d2c173ff import 2016-02-19 16:31:04 -08:00
Alex Dadgar 2706aa2100 Better comment 2016-02-19 16:02:48 -08:00
Alex Dadgar d1011c9668 Fixes 2016-02-19 15:49:32 -08:00
Alex Dadgar e2a4c4ccc5 Client stores when it receives a task 2016-02-19 14:49:43 -08:00
Alex Dadgar 96fd272422 Increase Alloc channel buffers 2016-02-18 20:43:48 -08:00
Alex Dadgar f3d5598830 Unlock in error path 2016-02-11 08:38:16 -08:00
Alex Dadgar 4d7ed4f164 Strip as much copystructure as possible 2016-02-10 17:54:43 -08:00
Alex Dadgar 0c4c3fc4ee safe but slow 2016-02-10 13:44:53 -08:00
Alex Dadgar c744e2f4f1 Update the consul service when the task/alloc changes 2016-02-06 17:08:20 -08:00
Alex Dadgar e8067029cc Small fixes 2016-02-04 14:19:27 -08:00
Alex Dadgar 117bef6515 Fix AllocRunner not capturing destroy signal and tests 2016-02-04 13:09:53 -08:00
Alex Dadgar 41e1174f72 Client handles updates to KillTimeout and Restart Policy 2016-02-03 19:43:44 -08:00
Alex Dadgar b6f9e9c61c Move restart tracker creation into task runner 2016-02-03 16:16:48 -08:00
Alex Dadgar 6f20d3f435 Restart on-success shouldn't be user specifiable 2016-02-02 17:35:06 -08:00
Alex Dadgar cf1e152f44 Clean interaction between alloc-runner and task-runner 2016-02-02 11:09:29 -08:00
Alex Dadgar a72d39bd04 Don't share task state with the alloc in the task runner 2016-02-01 17:47:53 -08:00