Commit graph

360 commits

Author SHA1 Message Date
Diptanu Choudhury 11d7cb1230 Making the GC related fields tunable 2017-01-31 15:51:20 -08:00
Diptanu Choudhury 84a491f85a Locking appropriately before closing the channel to indicate migration 2017-01-23 10:46:57 -08:00
Michael Schurter 054ee8df59 Fix index we get allocs by 2017-01-20 16:30:40 -08:00
Diptanu Choudhury 1999b7eebb Merge pull request #2159 from hashicorp/b-consul-config
Fixed merging consul config
2017-01-18 16:14:54 -08:00
Diptanu Choudhury e927de02d2 Moved functions to helper from structs 2017-01-18 15:55:14 -08:00
Alex Dadgar 5d2b56b387 Random wait 2017-01-11 13:24:23 -08:00
Alex Dadgar c19985244a GetAllocs uses a blocking query
This PR makes GetAllocs use a blocking query as well as adding a sanity
check to the clients watchAllocation code to ensure it gets the correct
allocations.

This PR fixes https://github.com/hashicorp/nomad/issues/2119 and
https://github.com/hashicorp/nomad/issues/2153.

The issue was that the client was talking to two different servers, one
to check which allocations to pull and the other to pull those
allocations.  However the latter call was not with a blocking query and
thus the client would not retreive the allocations it requested.

The logging has been improved to make the problem more clear as well.
2017-01-10 13:30:35 -08:00
Michael Schurter 86fcf96f72 Put a logger in AllocDir/TaskDir 2017-01-05 16:31:56 -08:00
Diptanu Choudhury 247bda9a88 Unlocking if we return before adding a new alloc runner 2017-01-05 13:18:48 -08:00
Diptanu Choudhury 9721a1ab04 Fixed how alloc lock is held 2017-01-05 13:06:56 -08:00
Michael Schurter 13064768ac Fix race when shutting down in dev mode
Client.Shutdown holds the allocLock when destroying alloc runners in dev
mode.

Client.updateAllocStatus can be called during AllocRunner shutdown and
calls getAllocRunners which tries to acquire allocLock.RLock. This
deadlocks since Client.Shutdown already has the write lock.

Switching Client.Shutdown to use getAllocRunners and not hold a lock
during AllocRunner shutdown is the solution.
2017-01-03 17:21:50 -08:00
Michael Schurter 4a9a574d9d Merge pull request #2054 from hashicorp/f-prestart
Add Driver.Prestart method
2016-12-20 16:18:56 -08:00
Diptanu Choudhury b6120e2fc8 Removing the alloc runner from GC if it is destroyed by the server 2016-12-20 11:14:22 -08:00
Diptanu Choudhury 6e6e0d364a Added comments 2016-12-20 10:49:48 -08:00
Diptanu Choudhury 36b5545d6b Making the gc allocator understand real disk usage 2016-12-16 18:34:59 -08:00
Diptanu Choudhury 7aef9bcabe Added the stats collector to GC 2016-12-14 15:11:11 -08:00
Diptanu Choudhury e855cd587b Refactored hoststats collector 2016-12-14 15:07:42 -08:00
Diptanu Choudhury 0ffd92668d GC-ing before we start a new allocation 2016-12-14 15:04:06 -08:00
Diptanu Choudhury afdaa979f7 Added a garbage collector for allocations 2016-12-14 15:01:12 -08:00
Alex Dadgar 648ad2ebc5 Merge pull request #2096 from hashicorp/b-addAlloc
Fix race and remove panic
2016-12-13 13:50:17 -08:00
Diptanu Choudhury 53fb09023c cancelling waiting for remote allocation if the alloc doesn't need migration 2016-12-13 13:06:33 -08:00
Alex Dadgar 3cbd237512 Fix race and remove panic 2016-12-13 12:34:23 -08:00
Christoffer Kylvåg 6a1f32b8ba #1680: Continue after not being able to stat a mountpoint 2016-12-13 12:28:57 +01:00
Diptanu Choudhury cbf73908ff Setting the appropriate file permissions which un-archiving compressed alloc dir 2016-12-05 17:04:43 -08:00
Diptanu Choudhury bc17cacca0 Merge pull request #2017 from hashicorp/b-sticky
Not moving alloc data when sticky is turned off
2016-12-05 14:11:45 -08:00
Diptanu Choudhury 21f49564d3 Not moving alloc data when sticky is turned off 2016-12-05 14:00:01 -08:00
Michael Schurter 770ed703d0 Add Driver.Prestart method
The Driver.Prestart method currently does very little but lays the
foundation for where lifecycle plugins can interleave execution _after_
task environment setup but _before_ the task starts.

Currently Prestart does two things:

* Any driver specific task environment building
* Download Docker images

This change also attaches a TaskEvent emitter to Drivers, so they can
emit events during task initialization.
2016-12-02 11:03:48 -08:00
Alex Dadgar 86ed1fb2e5 Disallow stale queries when deriving Vault tokens
This PR disallows stale queries when deriving a Vault token. Allowing
stale queries could result in the allocation not existing on the server
that is servicing the request.
2016-12-01 11:13:36 -08:00
Alex Dadgar ec4d6936ff add debug panic 2016-11-29 15:57:40 -08:00
Diptanu Choudhury f67217297c Ensuring allocs are not added multiple times to blocking queue 2016-11-29 11:19:37 -08:00
Alex Dadgar 88c7e04348 Check for Ephemeral Disk being nil 2016-11-15 10:03:06 -08:00
Alex Dadgar ee921ccbb2 Merge pull request #1949 from carlpett/blacklist-fingerprints-and-drivers
Support blacklisting fingerprinters
2016-11-09 10:31:17 -08:00
Calle Pettersson 4304755c12 Address comments from PR 2016-11-09 11:50:16 +01:00
Calle Pettersson 8632696e2d Add blacklisting of drivers 2016-11-08 18:30:07 +01:00
Calle Pettersson b603bb007e Add blacklisting of fingerprinters 2016-11-08 18:29:44 +01:00
Alex Dadgar 9015e79aaa Add compatibility code for secret ID while upgrading cluster in both server/client mode on single nodes 2016-11-07 16:52:08 -08:00
Diptanu Choudhury 1a8fa8c8d5 Making Nomad TLS configs region aware 2016-11-01 11:55:29 -07:00
Diptanu Choudhury 4079545a92 Making the client use tls if the node from which migration has to be made has enabled tls 2016-10-31 10:20:04 -07:00
Michael Schurter cc115fe984 Swap log line classifiers to be consistent 2016-10-28 14:59:48 -07:00
Diptanu Choudhury 3182d0454f Adding the alloc if we can't find the TG 2016-10-27 15:45:10 -07:00
Diptanu Choudhury 0682a1a113 Not blocking for remote alloc if the alloc is not sticky 2016-10-27 12:04:55 -07:00
Alex Dadgar 150b678a6b Merge pull request #1806 from hashicorp/f-docker4mac-fixes
A couple fixes to make Docker For Mac work
2016-10-27 09:29:40 -07:00
Diptanu Choudhury 50ca5e1e9d Merge pull request #1853 from hashicorp/f-rpc-http-tls
TLS support for http and RPC
2016-10-25 16:14:43 -07:00
Diptanu Choudhury 7c61e115bd Moved tlsutil into helpers 2016-10-25 16:05:37 -07:00
Diptanu Choudhury 353e7fc7f1 Moving the certs into tlsutil package 2016-10-25 16:01:53 -07:00
Diptanu Choudhury cf35aeac84 Moving the TLSConfig to structs 2016-10-25 15:57:38 -07:00
Alex Dadgar 03eba049ed Merge pull request #1848 from hashicorp/f-vault-error
Thread through whether DeriveToken error is recoverable or not
2016-10-24 15:01:18 -07:00
Alex Dadgar 692a809919 Merge pull request #1842 from hashicorp/f-version-and-id
Print the version and client node ID
2016-10-24 10:13:33 -07:00
Diptanu Choudhury 2e3118e69c Implemented TLS support for http and rpc 2016-10-23 22:22:00 -07:00
Alex Dadgar ede3a814ba Small fixes 2016-10-22 18:20:50 -07:00
Alex Dadgar 0070178741 Thread through whether DeriveToken error is recoverable or not 2016-10-22 18:08:30 -07:00
Michael Schurter 285e80ac0f Remove disk usage enforcement
Many thanks to @iverberk for the original PR (#1609), but we ended up
not wanting to ship this implementation with 0.5.

We'll come back to it after 0.5 and hopefully find a way to leverage
filesystem accounting and quotas, so we can skip the expensive polling.
2016-10-21 13:55:51 -07:00
Alex Dadgar aa0d8d0d8d Print the version and client node ID 2016-10-20 17:46:04 -07:00
Evan Phoenix e7a98d5500 Make EvalSymlink errors more verbose 2016-10-12 17:07:21 -07:00
Evan Phoenix f8a65a3b9d Resolve alloc/state directories to make Docker For Mac happy
* In -dev mode, `ioutil.TempDir` is used for the alloc and state
directories.
* `TempDir` uses `$TMPDIR`, which os OS X contains a per user
directory which is under `/var/folder`.
* `/var` is actually a symlink to `/private/var`
* Docker For Mac validates the directories that are passed to bind and on
OS X. That whitelist contains `/private`, but not `/var`. It does not
expand the path, and so any paths in `$TMPDIR` fail the whitelist check.

And thusly, by expanding the alloc/state directories the value passed
for binding does contain `/private` and Docker For Mac is happy.
2016-10-12 17:06:25 -07:00
Michael Schurter 6dea6df919 Restore lost chan inits 2016-10-03 14:56:50 -07:00
Diptanu Choudhury d50c395421 Getting snapshot of allocation from remote node (#1741)
* Added the alloc dir move

* Moving allocdirs when starting allocations

* Added the migrate flag to ephemeral disk

* Stopping migration if the allocation doesn't need migration any more

* Added the GetAllocDir method

* refactored code

* Added a test for alloc runner

* Incorporated review comments
2016-10-03 09:59:57 -07:00
Michael Schurter b117725dc9 Only log consul errors once since last succesful run 2016-09-28 17:18:45 -07:00
Michael Schurter d486de3804 Remove unused const 2016-09-27 16:04:01 -07:00
Michael Schurter 2e696c5e61 Fix lies found in comments by fact checkers 2016-09-26 16:51:53 -07:00
Michael Schurter 11cf9686a6 No need to put reaper ticker on the struct 2016-09-26 16:15:19 -07:00
Michael Schurter 2eb0062959 Drop clumsy timeout on discovery notifications
It's better to just let goroutines fallback to their longer retry
intervals then try to be clever here.
2016-09-26 16:05:21 -07:00
Michael Schurter 307e674eca Flip disco chan; clarify method names/comments 2016-09-26 15:52:40 -07:00
Michael Schurter 888ee21270 Return csv of servers from Stats, not just count 2016-09-26 15:40:26 -07:00
Michael Schurter 7dc0079dd2 doDisco -> triggerDiscoveryCh; discovered -> serversDiscoveredCh
Also fix log line formatting
2016-09-26 15:21:28 -07:00
Michael Schurter 434e4be97c noServers -> noServersErr 2016-09-26 15:12:35 -07:00
Michael Schurter b2ddb85a78 consul -> Consul 2016-09-26 15:06:57 -07:00
Michael Schurter 37cfb2769c Replace periodic handlers with event driven disco
Remove use of periodic consul handlers in the client and just use
goroutines. Consul Discovery is now triggered with a chan instead of
using a timer and deadline to trigger.

Once discovery is complete a chan is ticked so all goroutines waiting
for servers will run.

Should speed up bootstraping and recovery while decreasing spinning on
timers.
2016-09-23 17:02:48 -07:00
Michael Schurter 2ab5264595 Retry all servers on RPC call failure
rpcproxy is refactored into serverlist which prioritizes good servers
over servers in a remote DC or who have had a failure.

Registration, heartbeating, and alloc status updating will retry faster
when new servers are discovered.

Consul discovery will be retried more quickly when no servers are
available (eg on startup or an outage).
2016-09-23 11:44:48 -07:00
Alex Dadgar 50efdb00e9 Merge pull request #1713 from hashicorp/f-alloc-runner-vault
Vault integration in client
2016-09-20 16:15:55 -07:00
Alex Dadgar 64de46432a Merge pull request #1677 from hashicorp/f-vault-implicit-constraint
Vault implicit Task Group constraint + allow root tokens
2016-09-20 16:15:32 -07:00
Alex Dadgar ec152a6d12 Clean up vault client 2016-09-14 18:10:56 -07:00
Alex Dadgar 6702a29071 Vault token threaded 2016-09-14 13:30:01 -07:00
Robert Neumayer 8dc19dbd10 Log adding of servers at INFO level 2016-09-14 22:24:17 +02:00
Alex Dadgar 2c8dd8bbd3 Revert "Introduce a Secret/ directory" 2016-09-01 17:23:15 -07:00
Alex Dadgar b0adaa5301 Allow root token 2016-09-01 12:05:08 -07:00
Alex Dadgar 1ed454dd60 Merge pull request #1671 from hashicorp/f-secret-dir2
Introduce a Secret/ directory
2016-09-01 09:56:17 -07:00
Alex Dadgar 9fa23e3536 Symlink on windows 2016-08-31 21:41:44 -07:00
Alex Dadgar 5d3b47e648 Address comments and reserve 2016-08-31 18:11:02 -07:00
vishalnayak 55a6f06e15 Addressed review feedback 2016-08-30 13:08:13 -04:00
vishalnayak 3808dd0ff8 Return only fatal error to renewal error channel 2016-08-30 12:46:59 -04:00
vishalnayak a0dbfe25b3 Fix tests 2016-08-29 21:30:06 -04:00
vishalnayak 82f6209e97 tokenDeriver function pointer to derive tokens.
Remove rpc*, connPool, node and region from vaultclient.
2016-08-29 20:32:05 -04:00
Alex Dadgar 14b7126511 Secret dir, hello world 2016-08-29 15:41:52 -07:00
vishalnayak 56e42cf03d Employ DeriveVaultToken API and flesh-up DeriveToken 2016-08-24 12:29:59 -04:00
vishalnayak 6002e596c4 VaultClient for Nomad Client 2016-08-24 09:43:45 -04:00
Diptanu Choudhury 1e1eef56a1 Putting the mock driver behind a build flag 2016-08-22 15:02:28 -05:00
Diptanu Choudhury 4ca623bcfe blocking chained allocations until previous allocation hasn't terminated 2016-08-22 11:34:24 -05:00
Alex Dadgar a90dafe9ab handle the upgrade case 2016-08-18 19:01:24 -07:00
Alex Dadgar 895c31f605 Nodes generate Secret ID and used for retrieving allocations and registering 2016-08-17 16:31:47 -07:00
Alex Dadgar 84820db86f If the client detects that a heartbeat has failed because it is not registered, reregister 2016-08-15 17:24:09 -07:00
Diptanu Choudhury 28b3f511e0 Fixed some error messages 2016-08-10 15:17:32 -07:00
Kenjiro Nakayama 6a810e6f1e Update after review 2016-08-09 08:57:26 +09:00
Kenjiro Nakayama 5c621b74e5 tiny: Return fmt.Errorf instead of duplicated error messages 2016-08-09 08:57:26 +09:00
Diptanu Choudhury 41b540fbc8 Allow operators to opt into publishing node and alloc metrics 2016-08-01 19:52:20 -07:00
Cameron Davison 777bdf4a1e
fix setup consul syncer error message 2016-07-28 22:14:52 -05:00
Alex Dadgar ebac5cb283 Node.Register handles the case of transistioning to ready and creating evals 2016-07-21 15:22:02 -07:00
Diptanu Choudhury 5b39a5db40 Fixed a debug message 2016-07-09 00:12:53 -07:00
Sean Chittenden 03c571c61b
Consolidate fingerprinters into a single map. 2016-07-08 23:37:14 -07:00
Sean Chittenden 8bdb38d016
Code golf
Pointed out by: @dadgar
2016-06-21 14:26:01 -07:00