Commit graph

349 commits

Author SHA1 Message Date
Michael Schurter e41a654917 switch from alloc blocker to new interface
interface has 3 implementations:

1. local for blocking and moving data locally
2. remote for blocking and moving data from another node
3. noop for allocs that don't need to block
2017-08-11 16:21:35 -07:00
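A minimal Go sketch of how one interface with three implementations could be laid out, as the commit message above describes. Every name here (`allocWatcher`, `localWatcher`, `remoteWatcher`, `noopWatcher`, `newAllocWatcher`) is hypothetical and chosen for illustration; none of it is taken from the commit itself, and the method bodies are placeholders.

```go
package main

import (
	"context"
	"fmt"
)

// allocWatcher is a hypothetical interface: wait for a previous allocation
// to finish, then move its data into place.
type allocWatcher interface {
	Name() string
	Wait(ctx context.Context) error
	Migrate(ctx context.Context, destDir string) error
}

// localWatcher blocks on a previous allocation that ran on this node.
type localWatcher struct{ done <-chan struct{} }

func (w localWatcher) Name() string { return "local" }
func (w localWatcher) Wait(ctx context.Context) error {
	select {
	case <-w.done:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}
func (w localWatcher) Migrate(ctx context.Context, destDir string) error {
	return nil // placeholder: a real version would move files on local disk
}

// remoteWatcher stands in for a previous allocation on another node.
type remoteWatcher struct{ nodeAddr string }

func (w remoteWatcher) Name() string { return "remote" }
func (w remoteWatcher) Wait(ctx context.Context) error {
	return ctx.Err() // placeholder: a real version would poll the other node
}
func (w remoteWatcher) Migrate(ctx context.Context, destDir string) error {
	return ctx.Err() // placeholder: a real version would stream a snapshot
}

// noopWatcher is for allocations with nothing to wait on.
type noopWatcher struct{}

func (noopWatcher) Name() string                                     { return "noop" }
func (noopWatcher) Wait(ctx context.Context) error                    { return nil }
func (noopWatcher) Migrate(ctx context.Context, destDir string) error { return nil }

// newAllocWatcher picks an implementation from what is known about the
// previous allocation (the selection rule is an assumption of this sketch).
func newAllocWatcher(prevDone <-chan struct{}, prevNodeAddr string) allocWatcher {
	switch {
	case prevDone != nil:
		return localWatcher{done: prevDone}
	case prevNodeAddr != "":
		return remoteWatcher{nodeAddr: prevNodeAddr}
	default:
		return noopWatcher{}
	}
}

func main() {
	done := make(chan struct{})
	close(done) // previous local allocation already finished
	for _, w := range []allocWatcher{
		newAllocWatcher(done, ""),
		newAllocWatcher(nil, "10.0.0.2:4647"),
		newAllocWatcher(nil, ""),
	} {
		err := w.Wait(context.Background())
		fmt.Println(w.Name(), "wait error:", err)
	}
}
```

With this shape the caller never branches on where the previous allocation lived; it calls `Wait` and then `Migrate` on whatever the constructor returned.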
Michael Schurter ee04717a0b initial attempt at refactoring blocked/migrating 2017-08-11 16:21:35 -07:00
Alex Dadgar ecee5e370e initial watcher 2017-07-07 12:07:08 -07:00
Michael Schurter 644f0cfaa4 Consistently quote alloc ids in client logs 2017-07-06 10:24:52 -07:00
Michael Schurter 4fd9ef6a8c Tiny client race condition fix
Plus some logging improvements that may help with #2563
2017-07-05 16:15:19 -07:00
Michael Schurter 596727230b Suggest wiping out alloc dir too 2017-07-03 12:29:21 -07:00
Michael Schurter 11f68bfca2 Add more logging to restore state errors 2017-07-03 11:58:41 -07:00
Mark Mickan c196d320f8 Add tests for migrating symlinks in alloc and local directories 2017-06-04 15:56:22 +09:30
Mark Mickan 236f24c9a4 Include symlinks in snapshots when migrating disks
Fixes #2685
2017-06-04 00:36:18 +09:30
Alex Dadgar b1eea2269a Fix deadlock 2017-05-31 14:05:47 -07:00
Michael Schurter ffc2b36dc7 Merge pull request #2636 from hashicorp/f-gc-alloc-limit
Add new gc_max_allocs tuneable
2017-05-30 16:14:09 -07:00
Michael Schurter dd51aa1cb9 Merge pull request #2654 from hashicorp/f-env-consul
Add envconsul-like support and refactor environment handling
2017-05-30 14:40:14 -07:00
Alex Dadgar 28aef447e9 Fix perms to just set exec bit 2017-05-25 14:44:13 -07:00
Michael Schurter fd9bef768f Move task env into execcontext
Also inject PATH into rkt commands since we're no longer appending host
env vars for it.
2017-05-23 13:53:34 -07:00
Michael Schurter 3841692138 gc_max_allocs should include blocked & migrating 2017-05-12 16:03:22 -07:00
Michael Schurter 0453c2709c Add new gc_max_allocs tuneable
More than gc_max_allocs may be running on a node, but terminal allocs
will be garbage collected to try to keep the total number below the
limit.
2017-05-11 17:18:02 -07:00
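The rule in the commit message above can be sketched roughly as follows. This is an illustration only: `pickForGC` and its parameters are hypothetical names, and the sketch assumes the goal is to bring the total down to at most the limit while never touching non-terminal allocations.

```go
package main

import "fmt"

// pickForGC returns the terminal allocation IDs to collect so that the
// total (live + terminal) drops toward maxAllocs. Live allocations are
// never selected, so the total can still exceed the limit afterwards.
func pickForGC(live int, terminal []string, maxAllocs int) []string {
	excess := live + len(terminal) - maxAllocs
	if excess <= 0 {
		return nil // already within the limit
	}
	if excess > len(terminal) {
		excess = len(terminal) // can only collect terminal allocs
	}
	return terminal[:excess]
}

func main() {
	terminal := []string{"alloc-a", "alloc-b", "alloc-c"}
	fmt.Println(pickForGC(3, terminal, 4))  // some terminal allocs collected
	fmt.Println(pickForGC(10, terminal, 4)) // all terminal allocs, still over
	fmt.Println(pickForGC(1, terminal, 8))  // nothing to do
}
```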
Alex Dadgar 68c3a2bd98 Fix vet errors 2017-05-11 13:08:08 -07:00
Alex Dadgar 843bc26e5d Respond to comments 2017-05-09 10:50:24 -07:00
Alex Dadgar e00f9c9413 Restore state + upgrade path 2017-05-02 18:21:49 -07:00
Alex Dadgar ec101b4760 Revert "metrics"
This reverts commit 4d6a012c6fb6f1fba6c62985d091b1a20c3198e7.
2017-05-02 09:28:11 -07:00
Alex Dadgar 8e516b5dc2 Async and sync saving of client state 2017-05-01 16:16:53 -07:00
Alex Dadgar a7fd08d42a perf 2017-05-01 16:01:50 -07:00
Alex Dadgar e010fdf8c0 metrics 2017-05-01 14:51:27 -07:00
Alex Dadgar b94f855326 boltDB database for client state 2017-05-01 14:50:34 -07:00
Michael Schurter e204a287ed Refactor Consul Syncer into new ServiceClient
Fixes #2478 #2474 #1995 #2294

The new client only handles agent and task service advertisement. Server
discovery is mostly unchanged.

The Nomad client agent now handles all Consul operations instead of the
executor handling task related operations. When upgrading from an
earlier version of Nomad existing executors will be told to deregister
from Consul so that the Nomad agent can re-register the task's services
and checks.

Drivers - other than qemu - now support an Exec method for executing
arbitrary commands in a task's environment. This is used to implement
script checks.

Interfaces are used extensively to avoid interacting with Consul in
tests that don't assert any Consul related behavior.
2017-04-19 12:42:47 -07:00
Alex Dadgar 2321e8a4a0 Hash host ID so it's stable and well distributed
This PR takes the host ID and runs it through a hash so that it is well
distributed. This makes it so that machines that report similar host IDs
are easily distinguished.

Instances of similar IDs occur on EC2 where the ID is prefixed and on
motherboards created in the same batch.

Fixes https://github.com/hashicorp/nomad/issues/2534
2017-04-10 11:44:51 -07:00
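To illustrate why hashing helps with near-identical host IDs, here is a small Go sketch. It uses SHA-256 from the standard library and formats the first 16 bytes in a UUID-like layout; both the hash function and the formatting are assumptions of this example, not necessarily what the commit uses, and `hashedHostID` is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashedHostID maps a raw host ID to a stable, well-distributed identifier.
// The same input always yields the same output, while inputs that differ
// by a single character yield unrelated-looking outputs.
func hashedHostID(hostID string) string {
	sum := sha256.Sum256([]byte(hostID))
	b := sum[:16] // keep 16 bytes so the result has a UUID-like shape
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

func main() {
	// Two made-up host IDs sharing a long common prefix.
	fmt.Println(hashedHostID("ec2abcd1-0000-0000-0000-000000000001"))
	fmt.Println(hashedHostID("ec2abcd1-0000-0000-0000-000000000002"))
}
```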
Alex Dadgar 81b78f77e1 Track task start/finish time & improve logs errors
This PR adds tracking to when a task starts and finishes and the logs
API takes advantage of this and returns better errors when asking for
logs that do not exist.
2017-03-31 16:14:11 -07:00
Alex Dadgar 5e7e19de4b Merge pull request #2461 from hashicorp/b-groups
Various fixes for setting user/group of task
2017-03-28 11:13:27 -07:00
Alex Dadgar 4ecebe7d8c Proper reference counting through task restarts
This PR fixes an issue in which the reference count on a Docker image
would become inflated through task restarts.
2017-03-25 17:05:53 -07:00
Alex Dadgar a171a014b3 Various fixes for setting user/group of task
This PR fixes two issues:
* Folder permissions in -dev mode were incorrect and not suitable for
running as a particular user.
* Was not setting the group membership properly for the launched
process.

Fixes https://github.com/hashicorp/nomad/issues/2160
2017-03-20 14:21:13 -07:00
Alex Dadgar 70e4feb045 Limit parallelism during garbage collection
This PR introduces a parallelism limit during garbage collection. This
is used to avoid large resource usage spikes if garbage collecting many
allocations at once.
2017-03-10 16:27:00 -08:00
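One common way to cap parallelism in Go is a buffered channel used as a semaphore. The sketch below shows that general technique under stated assumptions; `collectAll`, its parameters, and the peak-tracking are hypothetical and only there to make the limit observable, and the commit may implement the limit differently.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// collectAll calls destroy for every allocation ID, running at most limit
// calls at once. It returns the peak number of concurrent calls observed.
func collectAll(allocIDs []string, limit int, destroy func(id string)) (peak int) {
	if limit < 1 {
		limit = 1 // avoid blocking forever on an unbuffered semaphore
	}
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var mu sync.Mutex
	active := 0

	for _, id := range allocIDs {
		sem <- struct{}{} // blocks while limit destroys are in flight
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when done

			mu.Lock()
			active++
			if active > peak {
				peak = active
			}
			mu.Unlock()

			destroy(id)

			mu.Lock()
			active--
			mu.Unlock()
		}(id)
	}
	wg.Wait()
	return peak
}

func main() {
	ids := []string{"a", "b", "c", "d", "e", "f"}
	peak := collectAll(ids, 2, func(id string) {
		time.Sleep(10 * time.Millisecond) // stand-in for expensive cleanup
	})
	fmt.Println("peak concurrent destroys:", peak)
}
```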
Alex Dadgar 9011a7984c Add metrics to show allocations on the client
This PR adds the following metrics to the client:
client.allocations.migrating
client.allocations.blocked
client.allocations.pending
client.allocations.running
client.allocations.terminal

Also adds some missing fields to the API version of the evaluation.
2017-03-09 12:37:41 -08:00
Alex Dadgar 5be806a3df Fix vet script and fix vet problems
This PR fixes our vet script and fixes all the missed vet changes.

It also fixes pointers being printed in `nomad stop <job>` and `nomad
node-status <node>`.
2017-02-27 16:00:19 -08:00
Alex Dadgar 6910678c21 Allow random UUID 2017-02-27 13:42:37 -08:00
Alex Dadgar 7203dee7ab Add allocated/unallocated metrics to client 2017-02-16 18:28:11 -08:00
Sean Chittenden c4c321c770 Unconditionally lowercase the node ID read from disk. 2017-02-06 16:20:17 -08:00
Sean Chittenden adb5be23ef Add better verification of a host's HostID. 2017-02-02 16:24:32 -08:00
Sean Chittenden bb4347e277 Slight mis-merge: secret-id in dev mode is random and needs to be returned. 2017-02-01 22:20:52 -08:00
Sean Chittenden bb422a2258 Generate a durable NodeID if possible, otherwise fall back to a random HostID. 2017-02-01 22:11:33 -08:00
Diptanu Choudhury 11d7cb1230 Making the GC related fields tunable 2017-01-31 15:51:20 -08:00
Diptanu Choudhury 84a491f85a Locking appropriately before closing the channel to indicate migration 2017-01-23 10:46:57 -08:00
Michael Schurter 054ee8df59 Fix index we get allocs by 2017-01-20 16:30:40 -08:00
Diptanu Choudhury 1999b7eebb Merge pull request #2159 from hashicorp/b-consul-config
Fixed merging consul config
2017-01-18 16:14:54 -08:00
Diptanu Choudhury e927de02d2 Moved functions to helper from structs 2017-01-18 15:55:14 -08:00
Alex Dadgar 5d2b56b387 Random wait 2017-01-11 13:24:23 -08:00
Alex Dadgar c19985244a GetAllocs uses a blocking query
This PR makes GetAllocs use a blocking query and adds a sanity check to
the client's watchAllocation code to ensure it gets the correct
allocations.

This PR fixes https://github.com/hashicorp/nomad/issues/2119 and
https://github.com/hashicorp/nomad/issues/2153.

The issue was that the client was talking to two different servers, one
to check which allocations to pull and the other to pull those
allocations. However, the latter call was not made with a blocking query,
and thus the client would not retrieve the allocations it requested.

The logging has been improved to make the problem more clear as well.
2017-01-10 13:30:35 -08:00
Michael Schurter 86fcf96f72 Put a logger in AllocDir/TaskDir 2017-01-05 16:31:56 -08:00
Diptanu Choudhury 247bda9a88 Unlocking if we return before adding a new alloc runner 2017-01-05 13:18:48 -08:00
Diptanu Choudhury 9721a1ab04 Fixed how alloc lock is held 2017-01-05 13:06:56 -08:00
Michael Schurter 13064768ac Fix race when shutting down in dev mode
Client.Shutdown holds the allocLock when destroying alloc runners in dev
mode.

Client.updateAllocStatus can be called during AllocRunner shutdown and
calls getAllocRunners which tries to acquire allocLock.RLock. This
deadlocks since Client.Shutdown already has the write lock.

Switching Client.Shutdown to use getAllocRunners and not hold a lock
during AllocRunner shutdown is the solution.
2017-01-03 17:21:50 -08:00
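The fix described in the commit message above follows a general pattern: copy the set of runners while holding the lock, release it, and only then call code that may take the lock again. The Go sketch below illustrates that pattern with simplified stand-in types; the `client` and `runner` structs and the lowercase method names are hypothetical and far smaller than the real ones.

```go
package main

import (
	"fmt"
	"sync"
)

type runner struct{ id string }

type client struct {
	allocLock sync.RWMutex
	allocs    map[string]*runner
}

func newClient(ids ...string) *client {
	c := &client{allocs: make(map[string]*runner)}
	for _, id := range ids {
		c.allocs[id] = &runner{id: id}
	}
	return c
}

// getAllocRunners returns a snapshot; the read lock is released on return.
func (c *client) getAllocRunners() []*runner {
	c.allocLock.RLock()
	defer c.allocLock.RUnlock()
	out := make([]*runner, 0, len(c.allocs))
	for _, r := range c.allocs {
		out = append(out, r)
	}
	return out
}

// updateAllocStatus takes the read lock again via getAllocRunners.
func (c *client) updateAllocStatus() int { return len(c.getAllocRunners()) }

// destroy calls back into the client, as a runner shutdown might.
func (r *runner) destroy(c *client) { c.updateAllocStatus() }

// shutdown iterates over a snapshot, so no lock is held while destroy runs.
// Holding allocLock.Lock() around this loop instead would deadlock, because
// destroy ends up waiting on RLock while the write lock is held.
func (c *client) shutdown() int {
	n := 0
	for _, r := range c.getAllocRunners() {
		r.destroy(c)
		n++
	}
	return n
}

func main() {
	c := newClient("alloc-1", "alloc-2")
	fmt.Println("runners destroyed:", c.shutdown())
}
```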