Michael Schurter
2d741c770b
Merge pull request #2732 from hashicorp/b-persist-alloc-updates
...
Persist Alloc when EvalID changes
2017-07-03 14:46:43 -07:00
Michael Schurter
5ec52ec24a
Destroy task group leader first
...
Before this commit all tasks in a task group were destroyed
concurrently. This meant logging sidecars might be stopped before the
leader task whose logs still need to be shipped.
This commit blocks on the leader shutting down before signalling the
followers to shut down.
2017-07-03 13:56:56 -07:00
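The ordering described in the commit above can be sketched roughly as follows. This is a minimal illustration and not Nomad's actual code: the `task` type, `newTask`, and `destroyGroup` are hypothetical names, and a real task runner would do far more than close a channel when it stops.

```go
package main

import (
	"fmt"
	"sync"
)

// task is a hypothetical stand-in for a task runner.
type task struct {
	name   string
	leader bool
	done   chan struct{} // closed once the task has stopped
}

func newTask(name string, leader bool) *task {
	return &task{name: name, leader: leader, done: make(chan struct{})}
}

// destroyGroup stops the first leader it finds, blocks until that leader
// has stopped, and only then stops the remaining tasks concurrently.
// It returns the order in which the tasks were stopped.
func destroyGroup(tasks []*task) []string {
	var (
		mu    sync.Mutex
		order []string
	)
	stop := func(t *task) {
		mu.Lock()
		order = append(order, t.name)
		mu.Unlock()
		close(t.done)
	}

	// Find the leader, if any.
	var leader *task
	for _, t := range tasks {
		if t.leader {
			leader = t
			break
		}
	}

	// Block on the leader shutting down before touching the followers.
	if leader != nil {
		go stop(leader)
		<-leader.done
	}

	// Followers (for example logging sidecars) are stopped concurrently.
	var wg sync.WaitGroup
	for _, t := range tasks {
		if t == leader {
			continue
		}
		wg.Add(1)
		go func(t *task) {
			defer wg.Done()
			stop(t)
		}(t)
	}
	wg.Wait()
	return order
}

func main() {
	order := destroyGroup([]*task{
		newTask("log-shipper", false),
		newTask("app", true),
		newTask("metrics", false),
	})
	fmt.Println("stopped first:", order[0])
}
```

In this sketch the followers may stop in any order relative to each other; the only guarantee is that the leader has fully stopped before any follower is signalled.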
Michael Schurter
cff8546035
Fix spelling & re-add immutable state struct
2017-06-23 13:01:39 -07:00
Michael Schurter
d359d3b554
Rename immutable -> alloc
...
meh; naming is hard
2017-06-23 10:58:36 -07:00
Michael Schurter
af2fc0f1bc
Persist Alloc when EvalID changes
2017-06-22 17:33:12 -07:00
Alex Dadgar
ee8dd84965
Fix nil job on allocation
...
The alloc_runner copied the alloc by temporarily setting alloc.Job to
nil, copying, and then restoring it. Because the alloc is shared (in
server/client mode and between alloc_runner/task_runner), this
introduced race conditions that could cause a panic.
Fixes https://github.com/hashicorp/nomad/issues/2605
2017-05-17 14:07:06 -04:00
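One way to avoid the race described in the commit above is to build the copy without ever writing to the shared allocation. The sketch below is an assumption about the general technique, not the code from this commit: `Allocation`, `Job`, and `copySharingJob` are simplified, hypothetical stand-ins for the real Nomad types.

```go
package main

import "fmt"

// Job and Allocation are simplified, hypothetical stand-ins.
type Job struct{ ID string }

type Allocation struct {
	ID         string
	Job        *Job
	TaskStates map[string]string
}

// copySharingJob returns a copy of a that deep-copies the mutable
// TaskStates map but shares the Job pointer. It never writes to a,
// so other readers never observe a temporarily nil Job.
func copySharingJob(a *Allocation) *Allocation {
	c := *a // shallow copy of the struct; c.Job still points at a.Job
	c.TaskStates = make(map[string]string, len(a.TaskStates))
	for name, state := range a.TaskStates {
		c.TaskStates[name] = state
	}
	return &c
}

func main() {
	orig := &Allocation{
		ID:         "a1",
		Job:        &Job{ID: "web"},
		TaskStates: map[string]string{"app": "running"},
	}
	cp := copySharingJob(orig)
	cp.TaskStates["app"] = "dead"

	// The Job is shared, while the original task states are untouched.
	fmt.Println(orig.Job == cp.Job, orig.TaskStates["app"])
}
```

The sketch treats the shared job as read-only. Any code that mutates the allocation itself would still need its own synchronization.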
Alex Dadgar
3cd7e06fba
Fix test
2017-05-09 11:35:48 -07:00
Alex Dadgar
ba70cc4f01
Merge branch 'master' into f-bolt-db
2017-05-09 11:11:55 -07:00
Alex Dadgar
843bc26e5d
Respond to comments
2017-05-09 10:50:24 -07:00
Alex Dadgar
730e49a598
Helpful comment
2017-05-03 11:27:33 -07:00
Alex Dadgar
1d8444bc1e
Fix tests
2017-05-03 11:15:30 -07:00
Alex Dadgar
e00f9c9413
Restore state + upgrade path
2017-05-02 18:21:49 -07:00
Alex Dadgar
ec101b4760
Revert "metrics"
...
This reverts commit 4d6a012c6fb6f1fba6c62985d091b1a20c3198e7.
2017-05-02 09:28:11 -07:00
Alex Dadgar
e010fdf8c0
metrics
2017-05-01 14:51:27 -07:00
Alex Dadgar
d779defe65
Use batching
2017-05-01 14:50:34 -07:00
Alex Dadgar
b94f855326
boltDB database for client state
2017-05-01 14:50:34 -07:00
Alex Dadgar
bddedd7aba
Don't deepcopy job when retrieving copy of Alloc
...
This PR removes deep copying of the job attached to the allocation in the
alloc runner. This operation is called very often, so removing reflection
from the code path, along with the potentially large number of mallocs
needed to create a job, reduces memory and CPU pressure.
2017-05-01 14:50:34 -07:00
Michael Schurter
095d2ee340
Switch java/exec to use Exec in Executor
2017-04-21 16:25:49 -07:00
Michael Schurter
a305b68159
Restart tasks on upgrade with script checks and old executors
2017-04-21 16:25:49 -07:00
Michael Schurter
e204a287ed
Refactor Consul Syncer into new ServiceClient
...
Fixes #2478 #2474 #1995 #2294
The new client only handles agent and task service advertisement. Server
discovery is mostly unchanged.
The Nomad client agent now handles all Consul operations instead of the
executor handling task related operations. When upgrading from an
earlier version of Nomad existing executors will be told to deregister
from Consul so that the Nomad agent can re-register the task's services
and checks.
Drivers - other than qemu - now support an Exec method for executing
arbitrary commands in a task's environment. This is used to implement
script checks.
Interfaces are used extensively to avoid interacting with Consul in
tests that don't assert any Consul related behavior.
2017-04-19 12:42:47 -07:00
Alex Dadgar
61f4a2dac6
Sync allocation state before waiting for a destroy
...
This change ensures that the client syncs allocation state with the
servers before entering its wait loop for the allocation to be
destroyed.
Fixes https://github.com/hashicorp/nomad/issues/2563
2017-04-14 13:09:54 -07:00
Alex Dadgar
c52000f792
FinishedAt only records when the task has actually started
2017-03-31 17:06:05 -07:00
Alex Dadgar
81b78f77e1
Track task start/finish time & improve logs errors
...
This PR tracks when a task starts and finishes. The logs API uses these
timestamps to return better errors when asked for logs that do not
exist.
2017-03-31 16:14:11 -07:00
Alex Dadgar
8238a8601e
Address comment
2017-03-09 21:05:34 -08:00
Alex Dadgar
9011a7984c
Add metrics to show allocations on the client
...
This PR adds the following metrics to the client:
client.allocations.migrating
client.allocations.blocked
client.allocations.pending
client.allocations.running
client.allocations.terminal
Also adds some missing fields to the API version of the evaluation.
2017-03-09 12:37:41 -08:00
Alex Dadgar
5be806a3df
Fix vet script and fix vet problems
...
This PR fixes our vet script and fixes all the missed vet changes.
It also fixes pointers being printed in `nomad stop <job>` and `nomad
node-status <node>`.
2017-02-27 16:00:19 -08:00
Alex Dadgar
238b4bcafd
Add Leader support to client
2017-02-10 17:55:19 -08:00
Michael Schurter
acd11f678d
Add COMPAT comment
2017-01-06 11:39:17 -08:00
Michael Schurter
86fcf96f72
Put a logger in AllocDir/TaskDir
2017-01-05 16:31:56 -08:00
Michael Schurter
5a6bd19eb7
Fix upgrade path for #2132
...
AllocRunner's state dropped the Context struct which needs to be
converted to the new AllocDir+TaskDir structs in RestoreState.
TaskRunner added a TaskDirBuilt flag, but it's safe to just let that
default to `false` and rebuild all task dirs once on upgrade.
2017-01-05 16:31:55 -08:00
Michael Schurter
774afd8800
Fail fast on taskdir errors
2017-01-05 16:31:55 -08:00
Michael Schurter
3ea09ba16a
Move chroot building into TaskRunner
...
* Refactor AllocDir to have a TaskDir struct per task.
* Drivers expose filesystem isolation preference
* Fix lxc mounting of `secrets/`
2017-01-05 16:31:49 -08:00
Bastiaan Bakker
2c864172eb
use snap.Alloc.TaskStates only after confirming snap.Alloc is not nil
2016-11-07 22:35:00 +01:00
Alex Dadgar
0fb7742c3c
Task state "dead" is terminal
2016-11-04 16:57:24 -07:00
Alex Dadgar
e85d0ebace
Merge pull request #1840 from hashicorp/f-kill-fail
...
Change how we mark tasks as failed and allow consul-template to fail tasks
2016-10-24 13:40:52 -07:00
Michael Schurter
285e80ac0f
Remove disk usage enforcement
...
Many thanks to @iverberk for the original PR (#1609), but we ended up
not wanting to ship this implementation with 0.5.
We'll come back to it after 0.5 and hopefully find a way to leverage
filesystem accounting and quotas, so we can skip the expensive polling.
2016-10-21 13:55:51 -07:00
Alex Dadgar
46a7d1a0d7
Change how we mark tasks as failed and allow consul-template to fail tasks
2016-10-20 17:27:16 -07:00
Alex Dadgar
36cfe6e89e
Large refactor of task runner and Vault token rehandling
2016-10-18 11:24:20 -07:00
Diptanu Choudhury
d50c395421
Getting snapshot of allocation from remote node (#1741)
...
* Added the alloc dir move
* Moving allocdirs when starting allocations
* Added the migrate flag to ephemeral disk
* Stopping migration if the allocation doesn't need migration any more
* Added the GetAllocDir method
* refactored code
* Added a test for alloc runner
* Incorporated review comments
2016-10-03 09:59:57 -07:00
Alex Dadgar
50efdb00e9
Merge pull request #1713 from hashicorp/f-alloc-runner-vault
...
Vault integration in client
2016-09-20 16:15:55 -07:00
Alex Dadgar
83905075e5
Fix comment
2016-09-17 11:31:17 -07:00
Alex Dadgar
0f40bd41a3
Handle recovery failure
2016-09-15 12:50:44 -07:00
Alex Dadgar
688e616200
Fix token renewal
2016-09-15 11:20:51 -07:00
Alex Dadgar
ec152a6d12
Clean up vault client
2016-09-14 18:10:56 -07:00
Alex Dadgar
6702a29071
Vault token threaded
2016-09-14 13:30:01 -07:00
Michael Schurter
cd8606b9e3
Revert "A nil context isn't an error"
...
This reverts commit fe9fe4c26259c1ad3bd7e94bd711418aaf819b20.
2016-09-12 12:56:12 -07:00
Michael Schurter
8a57913a44
A nil context isn't an error
2016-09-02 16:24:53 -07:00
Michael Schurter
f601361d58
Don't serialize task states twice in state files
2016-09-02 16:07:06 -07:00
Michael Schurter
6cb6d9cdf1
Lock around saving state
...
Prevent interleaving state syncs as it could conceivably lead to
empty state files as per #1367
2016-09-02 16:07:06 -07:00
Michael Schurter
e7dd443447
Add sanity check to SaveState
...
Also just reuse the task states snapshot taken by `Alloc()` instead of
doing a redundant copy.
2016-09-02 16:07:06 -07:00
Alex Dadgar
2c8dd8bbd3
Revert "Introduce a Secret/ directory"
2016-09-01 17:23:15 -07:00
Alex Dadgar
5d3b47e648
Address comments and reserve
2016-08-31 18:11:02 -07:00
Alex Dadgar
d59e14eed4
Interface + tests
2016-08-30 21:40:32 -07:00
Alex Dadgar
14b7126511
Secret dir, hello world
2016-08-29 15:41:52 -07:00
Ivo Verberk
2a17895a83
Disk resource monitoring and enforcement
2016-08-18 07:59:03 +02:00
Diptanu Choudhury
28b3f511e0
Fixed some error messages
2016-08-10 15:17:32 -07:00
Kenjiro Nakayama
6a810e6f1e
Update after review
2016-08-09 08:57:26 +09:00
Kenjiro Nakayama
5c621b74e5
tiny: Return fmt.Errorf instead of duplicated error messages
2016-08-09 08:57:26 +09:00
Alex Dadgar
898435d372
Retrieve task runners in helper
2016-07-21 13:41:01 -07:00
Alex Dadgar
7b83503596
finer grain locking
2016-06-20 10:19:06 -07:00
Alex Dadgar
744270590b
Guard against bad restore
2016-06-17 14:58:53 -07:00
Alex Dadgar
fdda90229f
only support latest and remove ring buffer
2016-06-12 09:32:38 -07:00
Alex Dadgar
e952540f6f
Allocation resources returned in a struct
2016-06-11 21:04:10 -07:00
Diptanu Choudhury
a64062d6a6
Fixed the compilation on linux
2016-05-28 19:59:20 -07:00
Diptanu Choudhury
0b0d0764e4
Changed signature of Allocation Stats Reporter
2016-05-28 19:59:20 -07:00
Diptanu Choudhury
666b419dba
Acquiring locks before iterating allocations and tasks
2016-05-28 19:59:20 -07:00
Diptanu Choudhury
91d2cf319e
Added some documentation
2016-05-28 19:42:34 -07:00
Diptanu Choudhury
f3d0aecafe
Reporting time series of stats
2016-05-28 19:42:34 -07:00
Diptanu Choudhury
0fb0e0237f
Added a client API to display resource usage of an allocation
2016-05-28 19:42:34 -07:00
Sean Chittenden
dc28ab0cb5
Speling police
2016-05-15 09:41:34 -07:00
Alex Dadgar
f64f03f87e
Test task failure killing TG and fix setting the task as received on a restore
2016-03-25 12:51:40 -07:00
Alex Dadgar
dced530c7c
kill tasks in alloc when one fails
2016-03-25 12:50:25 -07:00
Alex Dadgar
b80e61a66c
Merge pull request #975 from hashicorp/f-rename-complete-alloc
...
Successful allocations are marked as complete instead of dead
2016-03-25 10:35:11 -07:00
Alex Dadgar
94522e7bed
Successful allocations are marked as complete instead of dead
2016-03-23 18:08:19 -07:00
Diptanu Choudhury
f6a932194f
Removing references to old consul services and adding consul config to executor context
2016-03-23 12:19:19 -07:00
Alex Dadgar
ad92e50a24
Avoid serializing Allocation.Resources
2016-03-01 14:09:25 -08:00
Alex Dadgar
61972c9ddc
Refactor task runner to include driver starting into restart policy and add recoverable errors
2016-02-28 16:56:05 -08:00
Diptanu Choudhury
e3d6c4a9dd
Adding version information to snapshots
2016-02-24 19:06:30 -08:00
Alex Dadgar
51bacf674e
address feedback
2016-02-21 21:32:32 -08:00
Alex Dadgar
281e2ca198
Batch client allocation updates to the server
2016-02-21 21:15:02 -08:00
Alex Dadgar
13e5597ca2
Reduce alloc lock contention in client
2016-02-19 19:51:55 -08:00
Alex Dadgar
99d2c173ff
import
2016-02-19 16:31:04 -08:00
Alex Dadgar
2706aa2100
Better comment
2016-02-19 16:02:48 -08:00
Alex Dadgar
d1011c9668
Fixes
2016-02-19 15:49:32 -08:00
Alex Dadgar
e2a4c4ccc5
Client stores when it receives a task
2016-02-19 14:49:43 -08:00
Alex Dadgar
96fd272422
Increase Alloc channel buffers
2016-02-18 20:43:48 -08:00
Alex Dadgar
f3d5598830
Unlock in error path
2016-02-11 08:38:16 -08:00
Alex Dadgar
4d7ed4f164
Strip as much copystructure as possible
2016-02-10 17:54:43 -08:00
Alex Dadgar
0c4c3fc4ee
safe but slow
2016-02-10 13:44:53 -08:00
Alex Dadgar
c744e2f4f1
Update the consul service when the task/alloc changes
2016-02-06 17:08:20 -08:00
Alex Dadgar
e8067029cc
Small fixes
2016-02-04 14:19:27 -08:00
Alex Dadgar
117bef6515
Fix AllocRunner not capturing destroy signal and tests
2016-02-04 13:09:53 -08:00
Alex Dadgar
41e1174f72
Client handles updates to KillTimeout and Restart Policy
2016-02-03 19:43:44 -08:00
Alex Dadgar
b6f9e9c61c
Move restart tracker creation into task runner
2016-02-03 16:16:48 -08:00
Alex Dadgar
6f20d3f435
Restart on-success shouldn't be user specifiable
2016-02-02 17:35:06 -08:00
Alex Dadgar
cf1e152f44
Clean interaction between alloc-runner and task-runner
2016-02-02 11:09:29 -08:00
Alex Dadgar
a72d39bd04
Don't share task state with the alloc in the task runner
2016-02-01 17:47:53 -08:00
Alex Dadgar
b5260fc14e
Fix locks and use task runners state not alloc state
2016-02-01 15:43:59 -08:00
Alex Dadgar
2d98c0eadd
Fix double pull with introduction of AllocModifyIndex
2016-02-01 15:43:59 -08:00
Alex Dadgar
a5e9e2068c
Make NewRestartTracker private
2015-12-18 12:17:54 -08:00