open-nomad

Author	SHA1	Message	Date
Michael Schurter	2a81160dcd	Fix GC'd alloc tracking The Client.allocs map now contains all AllocRunners again, not just un-GC'd AllocRunners. Client.allocs is only pruned when the server GCs allocs. Also stops logging "marked for GC" twice.	2017-11-01 15:16:38 -05:00
Alex Dadgar	4831380e57	Node access is done using locked Node copy Fixes https://github.com/hashicorp/nomad/issues/3454 Reliably reproduced the data race before by having a fingerprinter change the nodes attributes every millisecond and syncing at the same rate. With fix, did not ever panic.	2017-10-27 13:27:24 -07:00
Michael Schurter	15b991e039	base64 migrate token HTTP header values must be ASCII. Also constant time compare tokens and test the generate and compare helper functions.	2017-10-13 10:59:13 -07:00
Chelsea Holland Komlo	e1c4701a43	fix up build warnings	2017-10-11 17:11:57 -07:00
Chelsea Holland Komlo	b018ca4d46	fixing up code review comments	2017-10-11 17:09:20 -07:00
Chelsea Holland Komlo	410adaf726	Add functionality for authenticated volumes	2017-10-11 17:09:20 -07:00
Michael Schurter	a66c53d45a	Remove `structs` import from `api` Goes a step further and removes structs import from api's tests as well by moving GenerateUUID to its own package.	2017-09-29 10:36:08 -07:00
Alex Dadgar	4173834231	Enable more linters	2017-09-26 15:26:33 -07:00
Chelsea Holland Komlo	b26454cf99	Move setGaugeForAllocationStats to emitClientMetrics	2017-09-25 16:05:49 +00:00
Alex Dadgar	d306da846c	changelog and feedback	2017-09-14 14:08:58 -07:00
Alex Dadgar	07ed83fdd5	Non-locked accessors to common Node fields This PR removes locking around commonly accessed node attributes that do not need to be locked. The locking could cause nodes to TTL as the heartbeat code path was acquiring a lock that could be held for an excessively long time. An example of this is when Vault is inaccessible, since the fingerprint is run with a lock held but the Vault fingerprinter makes the API calls with a large timeout. Fixes https://github.com/hashicorp/nomad/issues/2689	2017-09-14 14:08:26 -07:00
Chelsea Holland Komlo	848af92183	fix panic in emitting tagged metrics	2017-09-11 15:32:37 +00:00
Chelsea Holland Komlo	0ef43c3c5f	final code review fixups	2017-09-05 18:47:44 +00:00
Chelsea Holland Komlo	a8cbd0b559	fixups from code review	2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo	f72e4aad13	labels depend on full setup of client beforehand	2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo	87a814397d	refactor to use baseLabels	2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo	b2953d905a	pass in commonly used values	2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo	c634043069	create base labels to be used in every metric	2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo	f5ea83da8d	emit metrics using labels, add option for backwards compatibility	2017-09-05 14:12:57 +00:00
Armon Dadgar	76a03f2d8e	Address @dadgar feedback	2017-09-04 13:05:53 -07:00
Armon Dadgar	688897561b	client: adding token cache for ACL resolution	2017-09-04 13:05:36 -07:00
Armon Dadgar	c2e72e8a9c	client: create ACL and Policy cache	2017-09-04 13:05:35 -07:00
Michael Schurter	7342e23669	Move migrating state into prevAllocWatcher	2017-08-14 16:02:28 -07:00
Michael Schurter	e41a654917	switch from alloc blocker to new interface interface has 3 implementations: 1. local for blocking and moving data locally 2. remote for blocking and moving data from another node 3. noop for allocs that don't need to block	2017-08-11 16:21:35 -07:00
Michael Schurter	ee04717a0b	initial attempt at refactoring blocked/migrating	2017-08-11 16:21:35 -07:00
Alex Dadgar	ecee5e370e	initial watcher	2017-07-07 12:07:08 -07:00
Michael Schurter	644f0cfaa4	Consistently quote alloc ids in client logs	2017-07-06 10:24:52 -07:00
Michael Schurter	4fd9ef6a8c	Tiny client race condition fix Plus some logging improvements that may help with #2563	2017-07-05 16:15:19 -07:00
Michael Schurter	596727230b	Suggest wiping out alloc dir too	2017-07-03 12:29:21 -07:00
Michael Schurter	11f68bfca2	Add more logging to restore state errors	2017-07-03 11:58:41 -07:00
Mark Mickan	c196d320f8	Add tests for migrating symlinks in alloc and local directories	2017-06-04 15:56:22 +09:30
Mark Mickan	236f24c9a4	Include symlinks in snapshots when migrating disks Fixes #2685	2017-06-04 00:36:18 +09:30
Alex Dadgar	b1eea2269a	Fix deadlock	2017-05-31 14:05:47 -07:00
Michael Schurter	ffc2b36dc7	Merge pull request #2636 from hashicorp/f-gc-alloc-limit Add new gc_max_allocs tuneable	2017-05-30 16:14:09 -07:00
Michael Schurter	dd51aa1cb9	Merge pull request #2654 from hashicorp/f-env-consul Add envconsul-like support and refactor environment handling	2017-05-30 14:40:14 -07:00
Alex Dadgar	28aef447e9	Fix perms to just set exec bit	2017-05-25 14:44:13 -07:00
Michael Schurter	fd9bef768f	Move task env into execcontext Also inject PATH into rkt commands since we're no longer appending host env vars for it.	2017-05-23 13:53:34 -07:00
Michael Schurter	3841692138	gc_max_allocs should include blocked & migrating	2017-05-12 16:03:22 -07:00
Michael Schurter	0453c2709c	Add new gc_max_allocs tuneable More than gc_max_allocs may be running on a node, but terminal allocs will be garbage collected to try to keep the total number below the limit.	2017-05-11 17:18:02 -07:00
Alex Dadgar	68c3a2bd98	Fix vet errors	2017-05-11 13:08:08 -07:00
Alex Dadgar	843bc26e5d	Respond to comments	2017-05-09 10:50:24 -07:00
Alex Dadgar	e00f9c9413	Restore state + upgrade path	2017-05-02 18:21:49 -07:00
Alex Dadgar	ec101b4760	Revert "metrics" This reverts commit 4d6a012c6fb6f1fba6c62985d091b1a20c3198e7.	2017-05-02 09:28:11 -07:00
Alex Dadgar	8e516b5dc2	Async and sync saving of client state	2017-05-01 16:16:53 -07:00
Alex Dadgar	a7fd08d42a	perf	2017-05-01 16:01:50 -07:00
Alex Dadgar	e010fdf8c0	metrics	2017-05-01 14:51:27 -07:00
Alex Dadgar	b94f855326	boltDB database for client state	2017-05-01 14:50:34 -07:00
Michael Schurter	e204a287ed	Refactor Consul Syncer into new ServiceClient Fixes #2478 #2474 #1995 #2294 The new client only handles agent and task service advertisement. Server discovery is mostly unchanged. The Nomad client agent now handles all Consul operations instead of the executor handling task related operations. When upgrading from an earlier version of Nomad existing executors will be told to deregister from Consul so that the Nomad agent can re-register the task's services and checks. Drivers - other than qemu - now support an Exec method for executing arbitrary commands in a task's environment. This is used to implement script checks. Interfaces are used extensively to avoid interacting with Consul in tests that don't assert any Consul related behavior.	2017-04-19 12:42:47 -07:00
Alex Dadgar	2321e8a4a0	Hash host ID so its stable and well distributed This PR takes the host ID and runs it through a hash so that it is well distributed. This makes it so that machines that report similar host IDs are easily distinguished. Instances of similar IDs occur on EC2 where the ID is prefixed and on motherboards created in the same batch. Fixes https://github.com/hashicorp/nomad/issues/2534	2017-04-10 11:44:51 -07:00
Alex Dadgar	81b78f77e1	Track task start/finish time & improve logs errors This PR adds tracking to when a task starts and finishes and the logs API takes advantage of this and returns better errors when asking for logs that do not exist.	2017-03-31 16:14:11 -07:00
Alex Dadgar	5e7e19de4b	Merge pull request #2461 from hashicorp/b-groups Various fixes for setting user/group of task	2017-03-28 11:13:27 -07:00
Alex Dadgar	4ecebe7d8c	Proper reference counting through task restarts This PR fixes an issue in which the reference count on a Docker image would become inflated through task restarts.	2017-03-25 17:05:53 -07:00
Alex Dadgar	a171a014b3	Various fixes for setting user/group of task This PR fixes two issues: * Folder permissions in -dev mode were incorrect and not suitable for running as a particular user. * Was not setting the group membership properly for the launched process. Fixes https://github.com/hashicorp/nomad/issues/2160	2017-03-20 14:21:13 -07:00
Alex Dadgar	70e4feb045	Limit parallelism during garbage collection This PR introduces a parallelism limit during garbage collection. This is used to avoid large resource usage spikes if garbage collecting many allocations at once.	2017-03-10 16:27:00 -08:00
Alex Dadgar	9011a7984c	Add metrics to show allocations on the client This PR adds the following metrics to the client: client.allocations.migrating client.allocations.blocked client.allocations.pending client.allocations.running client.allocations.terminal Also adds some missing fields to the API version of the evaluation.	2017-03-09 12:37:41 -08:00
Alex Dadgar	5be806a3df	Fix vet script and fix vet problems This PR fixes our vet script and fixes all the missed vet changes. It also fixes pointers being printed in `nomad stop <job>` and `nomad node-status <node>`.	2017-02-27 16:00:19 -08:00
Alex Dadgar	6910678c21	Allow random UUID	2017-02-27 13:42:37 -08:00
Alex Dadgar	7203dee7ab	Add allocated/unallocated metrics to client	2017-02-16 18:28:11 -08:00
Sean Chittenden	c4c321c770	Unconditionally lowercase the node ID read from disk.	2017-02-06 16:20:17 -08:00
Sean Chittenden	adb5be23ef	Add better verification of a host's HostID.	2017-02-02 16:24:32 -08:00
Sean Chittenden	bb4347e277	Slight mis-merge: secret-id in dev mode is random and needs to be returned.	2017-02-01 22:20:52 -08:00
Sean Chittenden	bb422a2258	Generate a durable NodeID if possible, otherwise fall back to a random HostID.	2017-02-01 22:11:33 -08:00
Diptanu Choudhury	11d7cb1230	Making the GC related fields tunable	2017-01-31 15:51:20 -08:00
Diptanu Choudhury	84a491f85a	Locking appropriately before closing the channel to indicate migration	2017-01-23 10:46:57 -08:00
Michael Schurter	054ee8df59	Fix index we get allocs by	2017-01-20 16:30:40 -08:00
Diptanu Choudhury	1999b7eebb	Merge pull request #2159 from hashicorp/b-consul-config Fixed merging consul config	2017-01-18 16:14:54 -08:00
Diptanu Choudhury	e927de02d2	Moved functions to helper from structs	2017-01-18 15:55:14 -08:00
Alex Dadgar	5d2b56b387	Random wait	2017-01-11 13:24:23 -08:00
Alex Dadgar	c19985244a	GetAllocs uses a blocking query This PR makes GetAllocs use a blocking query as well as adding a sanity check to the clients watchAllocation code to ensure it gets the correct allocations. This PR fixes https://github.com/hashicorp/nomad/issues/2119 and https://github.com/hashicorp/nomad/issues/2153. The issue was that the client was talking to two different servers, one to check which allocations to pull and the other to pull those allocations. However the latter call was not with a blocking query and thus the client would not retrieve the allocations it requested. The logging has been improved to make the problem more clear as well.	2017-01-10 13:30:35 -08:00
Michael Schurter	86fcf96f72	Put a logger in AllocDir/TaskDir	2017-01-05 16:31:56 -08:00
Diptanu Choudhury	247bda9a88	Unlocking if we return before adding a new alloc runner	2017-01-05 13:18:48 -08:00
Diptanu Choudhury	9721a1ab04	Fixed how alloc lock is held	2017-01-05 13:06:56 -08:00
Michael Schurter	13064768ac	Fix race when shutting down in dev mode Client.Shutdown holds the allocLock when destroying alloc runners in dev mode. Client.updateAllocStatus can be called during AllocRunner shutdown and calls getAllocRunners which tries to acquire allocLock.RLock. This deadlocks since Client.Shutdown already has the write lock. Switching Client.Shutdown to use getAllocRunners and not hold a lock during AllocRunner shutdown is the solution.	2017-01-03 17:21:50 -08:00
Michael Schurter	4a9a574d9d	Merge pull request #2054 from hashicorp/f-prestart Add Driver.Prestart method	2016-12-20 16:18:56 -08:00
Diptanu Choudhury	b6120e2fc8	Removing the alloc runner from GC if it is destroyed by the server	2016-12-20 11:14:22 -08:00
Diptanu Choudhury	6e6e0d364a	Added comments	2016-12-20 10:49:48 -08:00
Diptanu Choudhury	36b5545d6b	Making the gc allocator understand real disk usage	2016-12-16 18:34:59 -08:00
Diptanu Choudhury	7aef9bcabe	Added the stats collector to GC	2016-12-14 15:11:11 -08:00
Diptanu Choudhury	e855cd587b	Refactored hoststats collector	2016-12-14 15:07:42 -08:00
Diptanu Choudhury	0ffd92668d	GC-ing before we start a new allocation	2016-12-14 15:04:06 -08:00
Diptanu Choudhury	afdaa979f7	Added a garbage collector for allocations	2016-12-14 15:01:12 -08:00
Alex Dadgar	648ad2ebc5	Merge pull request #2096 from hashicorp/b-addAlloc Fix race and remove panic	2016-12-13 13:50:17 -08:00
Diptanu Choudhury	53fb09023c	cancelling waiting for remote allocation if the alloc doesn't need migration	2016-12-13 13:06:33 -08:00
Alex Dadgar	3cbd237512	Fix race and remove panic	2016-12-13 12:34:23 -08:00
Christoffer Kylvåg	6a1f32b8ba	#1680 : Continue after not being able to stat a mountpoint	2016-12-13 12:28:57 +01:00
Diptanu Choudhury	cbf73908ff	Setting the appropriate file permissions when un-archiving compressed alloc dir	2016-12-05 17:04:43 -08:00
Diptanu Choudhury	bc17cacca0	Merge pull request #2017 from hashicorp/b-sticky Not moving alloc data when sticky is turned off	2016-12-05 14:11:45 -08:00
Diptanu Choudhury	21f49564d3	Not moving alloc data when sticky is turned off	2016-12-05 14:00:01 -08:00
Michael Schurter	770ed703d0	Add Driver.Prestart method The Driver.Prestart method currently does very little but lays the foundation for where lifecycle plugins can interleave execution _after_ task environment setup but _before_ the task starts. Currently Prestart does two things: * Any driver specific task environment building * Download Docker images This change also attaches a TaskEvent emitter to Drivers, so they can emit events during task initialization.	2016-12-02 11:03:48 -08:00
Alex Dadgar	86ed1fb2e5	Disallow stale queries when deriving Vault tokens This PR disallows stale queries when deriving a Vault token. Allowing stale queries could result in the allocation not existing on the server that is servicing the request.	2016-12-01 11:13:36 -08:00
Alex Dadgar	ec4d6936ff	add debug panic	2016-11-29 15:57:40 -08:00
Diptanu Choudhury	f67217297c	Ensuring allocs are not added multiple times to blocking queue	2016-11-29 11:19:37 -08:00
Alex Dadgar	88c7e04348	Check for Ephemeral Disk being nil	2016-11-15 10:03:06 -08:00
Alex Dadgar	ee921ccbb2	Merge pull request #1949 from carlpett/blacklist-fingerprints-and-drivers Support blacklisting fingerprinters	2016-11-09 10:31:17 -08:00
Calle Pettersson	4304755c12	Address comments from PR	2016-11-09 11:50:16 +01:00
Calle Pettersson	8632696e2d	Add blacklisting of drivers	2016-11-08 18:30:07 +01:00
Calle Pettersson	b603bb007e	Add blacklisting of fingerprinters	2016-11-08 18:29:44 +01:00
Alex Dadgar	9015e79aaa	Add compatibility code for secret ID while upgrading cluster in both server/client mode on single nodes	2016-11-07 16:52:08 -08:00
Diptanu Choudhury	1a8fa8c8d5	Making Nomad TLS configs region aware	2016-11-01 11:55:29 -07:00
Diptanu Choudhury	4079545a92	Making the client use tls if the node from which migration has to be made has enabled tls	2016-10-31 10:20:04 -07:00
Michael Schurter	cc115fe984	Swap log line classifiers to be consistent	2016-10-28 14:59:48 -07:00
Diptanu Choudhury	3182d0454f	Adding the alloc if we can't find the TG	2016-10-27 15:45:10 -07:00
Diptanu Choudhury	0682a1a113	Not blocking for remote alloc if the alloc is not sticky	2016-10-27 12:04:55 -07:00
Alex Dadgar	150b678a6b	Merge pull request #1806 from hashicorp/f-docker4mac-fixes A couple fixes to make Docker For Mac work	2016-10-27 09:29:40 -07:00
Diptanu Choudhury	50ca5e1e9d	Merge pull request #1853 from hashicorp/f-rpc-http-tls TLS support for http and RPC	2016-10-25 16:14:43 -07:00
Diptanu Choudhury	7c61e115bd	Moved tlsutil into helpers	2016-10-25 16:05:37 -07:00
Diptanu Choudhury	353e7fc7f1	Moving the certs into tlsutil package	2016-10-25 16:01:53 -07:00
Diptanu Choudhury	cf35aeac84	Moving the TLSConfig to structs	2016-10-25 15:57:38 -07:00
Alex Dadgar	03eba049ed	Merge pull request #1848 from hashicorp/f-vault-error Thread through whether DeriveToken error is recoverable or not	2016-10-24 15:01:18 -07:00
Alex Dadgar	692a809919	Merge pull request #1842 from hashicorp/f-version-and-id Print the version and client node ID	2016-10-24 10:13:33 -07:00
Diptanu Choudhury	2e3118e69c	Implemented TLS support for http and rpc	2016-10-23 22:22:00 -07:00
Alex Dadgar	ede3a814ba	Small fixes	2016-10-22 18:20:50 -07:00
Alex Dadgar	0070178741	Thread through whether DeriveToken error is recoverable or not	2016-10-22 18:08:30 -07:00
Michael Schurter	285e80ac0f	Remove disk usage enforcement Many thanks to @iverberk for the original PR (#1609), but we ended up not wanting to ship this implementation with 0.5. We'll come back to it after 0.5 and hopefully find a way to leverage filesystem accounting and quotas, so we can skip the expensive polling.	2016-10-21 13:55:51 -07:00
Alex Dadgar	aa0d8d0d8d	Print the version and client node ID	2016-10-20 17:46:04 -07:00
Evan Phoenix	e7a98d5500	Make EvalSymlink errors more verbose	2016-10-12 17:07:21 -07:00
Evan Phoenix	f8a65a3b9d	Resolve alloc/state directories to make Docker For Mac happy * In -dev mode, `ioutil.TempDir` is used for the alloc and state directories. * `TempDir` uses `$TMPDIR`, which on OS X contains a per user directory which is under `/var/folder`. * `/var` is actually a symlink to `/private/var` * Docker For Mac validates the directories that are passed to bind and on OS X. That whitelist contains `/private`, but not `/var`. It does not expand the path, and so any paths in `$TMPDIR` fail the whitelist check. And thusly, by expanding the alloc/state directories the value passed for binding does contain `/private` and Docker For Mac is happy.	2016-10-12 17:06:25 -07:00
Michael Schurter	6dea6df919	Restore lost chan inits	2016-10-03 14:56:50 -07:00
Diptanu Choudhury	d50c395421	Getting snapshot of allocation from remote node (#1741 ) * Added the alloc dir move * Moving allocdirs when starting allocations * Added the migrate flag to ephemeral disk * Stopping migration if the allocation doesn't need migration any more * Added the GetAllocDir method * refactored code * Added a test for alloc runner * Incorporated review comments	2016-10-03 09:59:57 -07:00
Michael Schurter	b117725dc9	Only log consul errors once since last successful run	2016-09-28 17:18:45 -07:00
Michael Schurter	d486de3804	Remove unused const	2016-09-27 16:04:01 -07:00
Michael Schurter	2e696c5e61	Fix lies found in comments by fact checkers	2016-09-26 16:51:53 -07:00
Michael Schurter	11cf9686a6	No need to put reaper ticker on the struct	2016-09-26 16:15:19 -07:00
Michael Schurter	2eb0062959	Drop clumsy timeout on discovery notifications It's better to just let goroutines fallback to their longer retry intervals than try to be clever here.	2016-09-26 16:05:21 -07:00
Michael Schurter	307e674eca	Flip disco chan; clarify method names/comments	2016-09-26 15:52:40 -07:00
Michael Schurter	888ee21270	Return csv of servers from Stats, not just count	2016-09-26 15:40:26 -07:00
Michael Schurter	7dc0079dd2	doDisco -> triggerDiscoveryCh; discovered -> serversDiscoveredCh Also fix log line formatting	2016-09-26 15:21:28 -07:00
Michael Schurter	434e4be97c	noServers -> noServersErr	2016-09-26 15:12:35 -07:00
Michael Schurter	b2ddb85a78	consul -> Consul	2016-09-26 15:06:57 -07:00
Michael Schurter	37cfb2769c	Replace periodic handlers with event driven disco Remove use of periodic consul handlers in the client and just use goroutines. Consul Discovery is now triggered with a chan instead of using a timer and deadline to trigger. Once discovery is complete a chan is ticked so all goroutines waiting for servers will run. Should speed up bootstrapping and recovery while decreasing spinning on timers.	2016-09-23 17:02:48 -07:00
Michael Schurter	2ab5264595	Retry all servers on RPC call failure rpcproxy is refactored into serverlist which prioritizes good servers over servers in a remote DC or who have had a failure. Registration, heartbeating, and alloc status updating will retry faster when new servers are discovered. Consul discovery will be retried more quickly when no servers are available (eg on startup or an outage).	2016-09-23 11:44:48 -07:00
Alex Dadgar	50efdb00e9	Merge pull request #1713 from hashicorp/f-alloc-runner-vault Vault integration in client	2016-09-20 16:15:55 -07:00
Alex Dadgar	64de46432a	Merge pull request #1677 from hashicorp/f-vault-implicit-constraint Vault implicit Task Group constraint + allow root tokens	2016-09-20 16:15:32 -07:00
Alex Dadgar	ec152a6d12	Clean up vault client	2016-09-14 18:10:56 -07:00
Alex Dadgar	6702a29071	Vault token threaded	2016-09-14 13:30:01 -07:00
Robert Neumayer	8dc19dbd10	Log adding of servers at INFO level	2016-09-14 22:24:17 +02:00
Alex Dadgar	2c8dd8bbd3	Revert "Introduce a Secret/ directory"	2016-09-01 17:23:15 -07:00
Alex Dadgar	b0adaa5301	Allow root token	2016-09-01 12:05:08 -07:00
Alex Dadgar	1ed454dd60	Merge pull request #1671 from hashicorp/f-secret-dir2 Introduce a Secret/ directory	2016-09-01 09:56:17 -07:00
Alex Dadgar	9fa23e3536	Symlink on windows	2016-08-31 21:41:44 -07:00
Alex Dadgar	5d3b47e648	Address comments and reserve	2016-08-31 18:11:02 -07:00
vishalnayak	55a6f06e15	Addressed review feedback	2016-08-30 13:08:13 -04:00
vishalnayak	3808dd0ff8	Return only fatal error to renewal error channel	2016-08-30 12:46:59 -04:00
vishalnayak	a0dbfe25b3	Fix tests	2016-08-29 21:30:06 -04:00
vishalnayak	82f6209e97	tokenDeriver function pointer to derive tokens. Remove rpc*, connPool, node and region from vaultclient.	2016-08-29 20:32:05 -04:00
Alex Dadgar	14b7126511	Secret dir, hello world	2016-08-29 15:41:52 -07:00
vishalnayak	56e42cf03d	Employ DeriveVaultToken API and flesh-up DeriveToken	2016-08-24 12:29:59 -04:00
vishalnayak	6002e596c4	VaultClient for Nomad Client	2016-08-24 09:43:45 -04:00
Diptanu Choudhury	1e1eef56a1	Putting the mock driver behind a build flag	2016-08-22 15:02:28 -05:00
Diptanu Choudhury	4ca623bcfe	blocking chained allocations until previous allocation has terminated	2016-08-22 11:34:24 -05:00

472 commits