open-nomad

Author	SHA1	Message	Date
Preetha Appan	9618f52746	Remove error wrapping and make vault connection server side errors clearer.	2018-03-13 17:09:03 -05:00
Alex Dadgar	4844317cc2	Merge pull request #3890 from hashicorp/b-heartbeat Heartbeat improvements and handling failures during establishing leadership	2018-03-12 14:41:59 -07:00
Josh Soref	173ce63fe9	spelling: transition	2018-03-11 19:06:05 +00:00
Josh Soref	782c704de6	spelling: thresholds	2018-03-11 19:03:47 +00:00
Josh Soref	8149694f3a	spelling: server	2018-03-11 18:55:30 +00:00
Josh Soref	258d76ec13	spelling: registry	2018-03-11 18:41:13 +00:00
Josh Soref	3c1ce6d16d	spelling: otherwise	2018-03-11 18:34:27 +00:00
Josh Soref	1ef6d6319e	spelling: labels	2018-03-11 18:21:44 +00:00
Josh Soref	52b83328fc	spelling: heartbeating	2018-03-11 18:12:19 +00:00
Josh Soref	c9b86bbc2f	spelling: controls	2018-03-11 17:50:39 +00:00
Josh Soref	e78cf9c81a	spelling: already	2018-03-11 17:39:04 +00:00
Josh Soref	b8b46d3f74	spelling: allocation	2018-03-11 17:37:22 +00:00
Chelsea Holland Komlo	122d1c4e4a	simplify retry logic	2018-03-01 09:48:26 -05:00
Chelsea Holland Komlo	355805db56	reset timer after updating node copy	2018-02-27 17:18:10 -05:00
Chelsea Holland Komlo	a72aaaf47f	add network resources equal method, use time ticker remove impossible test case	2018-02-27 12:42:53 -05:00
Chelsea Holland Komlo	e736e31820	use time ticker, update how network resources are compared	2018-02-26 18:47:11 -05:00
Chelsea Holland Komlo	5059065b52	improved testing; node networks comparison	2018-02-26 15:55:38 -05:00
Chelsea Holland Komlo	1f31b39fe8	code review fixups	2018-02-26 12:36:30 -05:00
Chelsea Holland Komlo	ed8c8afbcd	edge trigger node update test update config copy trigger	2018-02-26 12:36:04 -05:00
Alex Dadgar	49a47483d1	Registering back to initializing Fix a bug in which if the node attributes/meta changed, we would re-register the node in status initializing. This would incorrectly trigger the client to log that it missed its heartbeat. It would change the status of the Node to initializing until the next heartbeat occurred.	2018-02-16 17:49:31 -08:00
Alex Dadgar	eff4455c68	Fix original client server list behavior	2018-02-15 16:04:53 -08:00
Alex Dadgar	f9cf642436	Client tls	2018-02-15 15:22:57 -08:00
Alex Dadgar	e685211892	Code review feedback	2018-02-15 13:59:02 -08:00
Alex Dadgar	2c0ad26374	New RPC Modes and basic setup for streaming RPC handlers	2018-02-15 13:59:01 -08:00
Alex Dadgar	9bc75f0ad4	Fix manager tests and make testagent recover from port conflicts	2018-02-15 13:59:01 -08:00
Alex Dadgar	3f1f8604bb	initial round of comment review	2018-02-15 13:59:01 -08:00
Alex Dadgar	c8c1284bc3	SetServer command actually returns an error if given an invalid server	2018-02-15 13:59:01 -08:00
Alex Dadgar	3f786b904b	use server manager	2018-02-15 13:59:01 -08:00
Alex Dadgar	6dd1c9f49d	Refactor	2018-02-15 13:59:00 -08:00
Alex Dadgar	1472b943d6	Stats Endpoint	2018-02-15 13:59:00 -08:00
Chelsea Holland Komlo	4a26959825	code review feedback	2018-02-07 18:10:55 -05:00
Chelsea Holland Komlo	d626d24488	remove dependency on client for fingerprint manager	2018-02-07 18:10:45 -05:00
Chelsea Holland Komlo	e012e5ab8a	add fingerprint manager	2018-02-07 18:10:33 -05:00
Chelsea Holland Komlo	b21233fe23	update log message	2018-02-01 19:46:57 -05:00
Chelsea Holland Komlo	6f9c0ab361	req/resp should be within config locks; rename for detected fingerprints changelog	2018-02-01 19:00:39 -05:00
Chelsea Holland Komlo	b8e8064835	code review fixup	2018-01-31 18:34:03 -05:00
Chelsea Holland Komlo	7b53474a6e	add applicable boolean to fingerprint response public fields and remove getter functions	2018-01-31 13:21:45 -05:00
Chelsea Holland Komlo	9482c322b7	locks for fingerprint reads/writes	2018-01-30 11:32:45 -05:00
Chelsea Holland Komlo	7c19de797c	create safe getters and setters for fingerprint response	2018-01-26 11:22:05 -05:00
Chelsea Holland Komlo	896d6f8058	fixups from code review	2018-01-26 07:04:32 -05:00
Chelsea Holland Komlo	9a8344333b	refactor Fingerprint to request/response construct	2018-01-24 11:54:02 -05:00
Chelsea Holland Komlo	649f86f094	refactor creating a new tls configuration	2018-01-16 08:02:39 -05:00
Chelsea Holland Komlo	6c9f9c8ac3	adding additional test assertions; differentiate reloading agent and http server	2018-01-16 07:34:39 -05:00
Chelsea Holland Komlo	214d128eb9	reload raft transport layer fix up linting	2018-01-08 14:52:28 -05:00
Chelsea Holland Komlo	0708d34135	call reload on agent, client, and server separately	2018-01-08 09:56:31 -05:00
Chelsea Holland Komlo	9741097406	reloading tls config should be atomic for clients/servers	2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo	ae7fc4695e	fixups from code review Revert "close raft long-lived connections" This reverts commit 3ffda28206fcb3d63ad117fd1d27ae6f832b6625. reload raft connections on changing tls	2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo	acd3d1b162	fix up downgrading client to plaintext add locks around changing server configuration	2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo	c0ad9a4627	add ability to upgrade/downgrade nomad agents tls configurations via sighup	2018-01-08 09:21:06 -05:00
Alex Dadgar	91ffbbb517	Review feedback	2017-12-07 16:10:57 -08:00
Alex Dadgar	02baa6c52b	Handle race between fingerprinters and registration	2017-12-07 13:09:37 -08:00
Alex Dadgar	4409fdacc0	Drop trace logging	2017-12-06 18:02:24 -08:00
Alex Dadgar	cd9a7f14b8	Add logging around heartbeats	2017-12-06 17:57:50 -08:00
Chelsea Komlo	2dfda33703	Nomad agent reload TLS configuration on SIGHUP (#3479 ) * Allow server TLS configuration to be reloaded via SIGHUP * dynamic tls reloading for nomad agents * code cleanup and refactoring * ensure keyloader is initialized, add comments * allow downgrading from TLS * initialize keyloader if necessary * integration test for tls reload * fix up test to assert success on reloaded TLS configuration * failure in loading a new TLS config should remain at current Reload only the config if agent is already using TLS * reload agent configuration before specific server/client lock keyloader before loading/caching a new certificate * introduce a get-or-set method for keyloader * fixups from code review * fix up linting errors * fixups from code review * add lock for config updates; improve copy of tls config * GetCertificate only reloads certificates dynamically for the server * config updates/copies should be on agent * improve http integration test * simplify agent reloading storing a local copy of config * reuse the same keyloader when reloading * Test that server and client get reloaded but keep keyloader * Keyloader exposes GetClientCertificate as well for outgoing connections * Fix spelling * correct changelog style	2017-11-14 17:53:23 -08:00
Michael Schurter	1769db98b7	Fix regression by returning error on unknown alloc	2017-11-01 15:16:38 -05:00
Michael Schurter	73e9b57908	Trigger GCs after alloc changes GC much more aggressively by triggering GCs when allocations become terminal as well as after new allocations are added.	2017-11-01 15:16:38 -05:00
Michael Schurter	2a81160dcd	Fix GC'd alloc tracking The Client.allocs map now contains all AllocRunners again, not just un-GC'd AllocRunners. Client.allocs is only pruned when the server GCs allocs. Also stops logging "marked for GC" twice.	2017-11-01 15:16:38 -05:00
Alex Dadgar	4831380e57	Node access is done using locked Node copy Fixes https://github.com/hashicorp/nomad/issues/3454 Reliably reproduced the data race before by having a fingerprinter change the nodes attributes every millisecond and syncing at the same rate. With fix, did not ever panic.	2017-10-27 13:27:24 -07:00
Michael Schurter	15b991e039	base64 migrate token HTTP header values must be ASCII. Also constant time compare tokens and test the generate and compare helper functions.	2017-10-13 10:59:13 -07:00
Chelsea Holland Komlo	e1c4701a43	fix up build warnings	2017-10-11 17:11:57 -07:00
Chelsea Holland Komlo	b018ca4d46	fixing up code review comments	2017-10-11 17:09:20 -07:00
Chelsea Holland Komlo	410adaf726	Add functionality for authenticated volumes	2017-10-11 17:09:20 -07:00
Michael Schurter	a66c53d45a	Remove `structs` import from `api` Goes a step further and removes structs import from api's tests as well by moving GenerateUUID to its own package.	2017-09-29 10:36:08 -07:00
Alex Dadgar	4173834231	Enable more linters	2017-09-26 15:26:33 -07:00
Chelsea Holland Komlo	b26454cf99	Move setGaugeForAllocationStats to emitClientMetrics	2017-09-25 16:05:49 +00:00
Alex Dadgar	d306da846c	changelog and feedback	2017-09-14 14:08:58 -07:00
Alex Dadgar	07ed83fdd5	Non-locked accessors to common Node fields This PR removes locking around commonly accessed node attributes that do not need to be locked. The locking could cause nodes to TTL as the heartbeat code path was acquiring a lock that could be held for an excessively long time. An example of this is when Vault is inaccessible, since the fingerprint is run with a lock held but the Vault fingerprinter makes the API calls with a large timeout. Fixes https://github.com/hashicorp/nomad/issues/2689	2017-09-14 14:08:26 -07:00
Chelsea Holland Komlo	848af92183	fix panic in emitting tagged metrics	2017-09-11 15:32:37 +00:00
Chelsea Holland Komlo	0ef43c3c5f	final code review fixups	2017-09-05 18:47:44 +00:00
Chelsea Holland Komlo	a8cbd0b559	fixups from code review	2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo	f72e4aad13	labels depend on full setup of client beforehand	2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo	87a814397d	refactor to use baseLabels	2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo	b2953d905a	pass in commonly used values	2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo	c634043069	create base labels to be used in every metric	2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo	f5ea83da8d	emit metrics using labels, add option for backwards compatibility	2017-09-05 14:12:57 +00:00
Armon Dadgar	76a03f2d8e	Address @dadgar feedback	2017-09-04 13:05:53 -07:00
Armon Dadgar	688897561b	client: adding token cache for ACL resolution	2017-09-04 13:05:36 -07:00
Armon Dadgar	c2e72e8a9c	client: create ACL and Policy cache	2017-09-04 13:05:35 -07:00
Michael Schurter	7342e23669	Move migrating state into prevAllocWatcher	2017-08-14 16:02:28 -07:00
Michael Schurter	e41a654917	switch from alloc blocker to new interface interface has 3 implementations: 1. local for blocking and moving data locally 2. remote for blocking and moving data from another node 3. noop for allocs that don't need to block	2017-08-11 16:21:35 -07:00
Michael Schurter	ee04717a0b	initial attempt at refactoring blocked/migrating	2017-08-11 16:21:35 -07:00
Alex Dadgar	ecee5e370e	initial watcher	2017-07-07 12:07:08 -07:00
Michael Schurter	644f0cfaa4	Consistently quote alloc ids in client logs	2017-07-06 10:24:52 -07:00
Michael Schurter	4fd9ef6a8c	Tiny client race condition fix Plus some logging improvements that may help with #2563	2017-07-05 16:15:19 -07:00
Michael Schurter	596727230b	Suggest wiping out alloc dir too	2017-07-03 12:29:21 -07:00
Michael Schurter	11f68bfca2	Add more logging to restore state errors	2017-07-03 11:58:41 -07:00
Mark Mickan	c196d320f8	Add tests for migrating symlinks in alloc and local directories	2017-06-04 15:56:22 +09:30
Mark Mickan	236f24c9a4	Include symlinks in snapshots when migrating disks Fixes #2685	2017-06-04 00:36:18 +09:30
Alex Dadgar	b1eea2269a	Fix deadlock	2017-05-31 14:05:47 -07:00
Michael Schurter	ffc2b36dc7	Merge pull request #2636 from hashicorp/f-gc-alloc-limit Add new gc_max_allocs tuneable	2017-05-30 16:14:09 -07:00
Michael Schurter	dd51aa1cb9	Merge pull request #2654 from hashicorp/f-env-consul Add envconsul-like support and refactor environment handling	2017-05-30 14:40:14 -07:00
Alex Dadgar	28aef447e9	Fix perms to just set exec bit	2017-05-25 14:44:13 -07:00
Michael Schurter	fd9bef768f	Move task env into execcontext Also inject PATH into rkt commands since we're no longer appending host env vars for it.	2017-05-23 13:53:34 -07:00
Michael Schurter	3841692138	gc_max_allocs should include blocked & migrating	2017-05-12 16:03:22 -07:00
Michael Schurter	0453c2709c	Add new gc_max_allocs tuneable More than gc_max_allocs may be running on a node, but terminal allocs will be garbage collected to try to keep the total number below the limit.	2017-05-11 17:18:02 -07:00
Alex Dadgar	68c3a2bd98	Fix vet errors	2017-05-11 13:08:08 -07:00
Alex Dadgar	843bc26e5d	Respond to comments	2017-05-09 10:50:24 -07:00
Alex Dadgar	e00f9c9413	Restore state + upgrade path	2017-05-02 18:21:49 -07:00
Alex Dadgar	ec101b4760	Revert "metrics" This reverts commit 4d6a012c6fb6f1fba6c62985d091b1a20c3198e7.	2017-05-02 09:28:11 -07:00
Alex Dadgar	8e516b5dc2	Async and sync saving of client state	2017-05-01 16:16:53 -07:00
Alex Dadgar	a7fd08d42a	perf	2017-05-01 16:01:50 -07:00
Alex Dadgar	e010fdf8c0	metrics	2017-05-01 14:51:27 -07:00
Alex Dadgar	b94f855326	boltDB database for client state	2017-05-01 14:50:34 -07:00
Michael Schurter	e204a287ed	Refactor Consul Syncer into new ServiceClient Fixes #2478 #2474 #1995 #2294 The new client only handles agent and task service advertisement. Server discovery is mostly unchanged. The Nomad client agent now handles all Consul operations instead of the executor handling task related operations. When upgrading from an earlier version of Nomad existing executors will be told to deregister from Consul so that the Nomad agent can re-register the task's services and checks. Drivers - other than qemu - now support an Exec method for executing arbitrary commands in a task's environment. This is used to implement script checks. Interfaces are used extensively to avoid interacting with Consul in tests that don't assert any Consul related behavior.	2017-04-19 12:42:47 -07:00
Alex Dadgar	2321e8a4a0	Hash host ID so its stable and well distributed This PR takes the host ID and runs it through a hash so that it is well distributed. This makes it so that machines that report similar host IDs are easily distinguished. Instances of similar IDs occur on EC2 where the ID is prefixed and on motherboards created in the same batch. Fixes https://github.com/hashicorp/nomad/issues/2534	2017-04-10 11:44:51 -07:00
Alex Dadgar	81b78f77e1	Track task start/finish time & improve logs errors This PR adds tracking to when a task starts and finishes and the logs API takes advantage of this and returns better errors when asking for logs that do not exist.	2017-03-31 16:14:11 -07:00
Alex Dadgar	5e7e19de4b	Merge pull request #2461 from hashicorp/b-groups Various fixes for setting user/group of task	2017-03-28 11:13:27 -07:00
Alex Dadgar	4ecebe7d8c	Proper reference counting through task restarts This PR fixes an issue in which the reference count on a Docker image would become inflated through task restarts.	2017-03-25 17:05:53 -07:00
Alex Dadgar	a171a014b3	Various fixes for setting user/group of task This PR fixes two issues: * Folder permissions in -dev mode were incorrect and not suitable for running as a particular user. * Was not setting the group membership properly for the launched process. Fixes https://github.com/hashicorp/nomad/issues/2160	2017-03-20 14:21:13 -07:00
Alex Dadgar	70e4feb045	Limit parallelism during garbage collection This PR introduces a parallelism limit during garbage collection. This is used to avoid large resource usage spikes if garbage collecting many allocations at once.	2017-03-10 16:27:00 -08:00
Alex Dadgar	9011a7984c	Add metrics to show allocations on the client This PR adds the following metrics to the client: client.allocations.migrating client.allocations.blocked client.allocations.pending client.allocations.running client.allocations.terminal Also adds some missing fields to the API version of the evaluation.	2017-03-09 12:37:41 -08:00
Alex Dadgar	5be806a3df	Fix vet script and fix vet problems This PR fixes our vet script and fixes all the missed vet changes. It also fixes pointers being printed in `nomad stop <job>` and `nomad node-status <node>`.	2017-02-27 16:00:19 -08:00
Alex Dadgar	6910678c21	Allow random UUID	2017-02-27 13:42:37 -08:00
Alex Dadgar	7203dee7ab	Add allocated/unallocated metrics to client	2017-02-16 18:28:11 -08:00
Sean Chittenden	c4c321c770	Unconditionally lowercase the node ID read from disk.	2017-02-06 16:20:17 -08:00
Sean Chittenden	adb5be23ef	Add better verification of a host's HostID.	2017-02-02 16:24:32 -08:00
Sean Chittenden	bb4347e277	Slight mis-merge: secret-id in dev mode is random and needs to be returned.	2017-02-01 22:20:52 -08:00
Sean Chittenden	bb422a2258	Generate a durable NodeID if possible, otherwise fall back to a random HostID.	2017-02-01 22:11:33 -08:00
Diptanu Choudhury	11d7cb1230	Making the GC related fields tunable	2017-01-31 15:51:20 -08:00
Diptanu Choudhury	84a491f85a	Locking appropriately before closing the channel to indicate migration	2017-01-23 10:46:57 -08:00
Michael Schurter	054ee8df59	Fix index we get allocs by	2017-01-20 16:30:40 -08:00
Diptanu Choudhury	1999b7eebb	Merge pull request #2159 from hashicorp/b-consul-config Fixed merging consul config	2017-01-18 16:14:54 -08:00
Diptanu Choudhury	e927de02d2	Moved functions to helper from structs	2017-01-18 15:55:14 -08:00
Alex Dadgar	5d2b56b387	Random wait	2017-01-11 13:24:23 -08:00
Alex Dadgar	c19985244a	GetAllocs uses a blocking query This PR makes GetAllocs use a blocking query as well as adding a sanity check to the clients watchAllocation code to ensure it gets the correct allocations. This PR fixes https://github.com/hashicorp/nomad/issues/2119 and https://github.com/hashicorp/nomad/issues/2153. The issue was that the client was talking to two different servers, one to check which allocations to pull and the other to pull those allocations. However the latter call was not with a blocking query and thus the client would not retrieve the allocations it requested. The logging has been improved to make the problem more clear as well.	2017-01-10 13:30:35 -08:00
Michael Schurter	86fcf96f72	Put a logger in AllocDir/TaskDir	2017-01-05 16:31:56 -08:00
Diptanu Choudhury	247bda9a88	Unlocking if we return before adding a new alloc runner	2017-01-05 13:18:48 -08:00
Diptanu Choudhury	9721a1ab04	Fixed how alloc lock is held	2017-01-05 13:06:56 -08:00
Michael Schurter	13064768ac	Fix race when shutting down in dev mode Client.Shutdown holds the allocLock when destroying alloc runners in dev mode. Client.updateAllocStatus can be called during AllocRunner shutdown and calls getAllocRunners which tries to acquire allocLock.RLock. This deadlocks since Client.Shutdown already has the write lock. Switching Client.Shutdown to use getAllocRunners and not hold a lock during AllocRunner shutdown is the solution.	2017-01-03 17:21:50 -08:00
Michael Schurter	4a9a574d9d	Merge pull request #2054 from hashicorp/f-prestart Add Driver.Prestart method	2016-12-20 16:18:56 -08:00
Diptanu Choudhury	b6120e2fc8	Removing the alloc runner from GC if it is destroyed by the server	2016-12-20 11:14:22 -08:00
Diptanu Choudhury	6e6e0d364a	Added comments	2016-12-20 10:49:48 -08:00
Diptanu Choudhury	36b5545d6b	Making the gc allocator understand real disk usage	2016-12-16 18:34:59 -08:00
Diptanu Choudhury	7aef9bcabe	Added the stats collector to GC	2016-12-14 15:11:11 -08:00
Diptanu Choudhury	e855cd587b	Refactored hoststats collector	2016-12-14 15:07:42 -08:00
Diptanu Choudhury	0ffd92668d	GC-ing before we start a new allocation	2016-12-14 15:04:06 -08:00
Diptanu Choudhury	afdaa979f7	Added a garbage collector for allocations	2016-12-14 15:01:12 -08:00
Alex Dadgar	648ad2ebc5	Merge pull request #2096 from hashicorp/b-addAlloc Fix race and remove panic	2016-12-13 13:50:17 -08:00
Diptanu Choudhury	53fb09023c	cancelling waiting for remote allocation if the alloc doesn't need migration	2016-12-13 13:06:33 -08:00
Alex Dadgar	3cbd237512	Fix race and remove panic	2016-12-13 12:34:23 -08:00
Christoffer Kylvåg	6a1f32b8ba	#1680 : Continue after not being able to stat a mountpoint	2016-12-13 12:28:57 +01:00
Diptanu Choudhury	cbf73908ff	Setting the appropriate file permissions which un-archiving compressed alloc dir	2016-12-05 17:04:43 -08:00
Diptanu Choudhury	bc17cacca0	Merge pull request #2017 from hashicorp/b-sticky Not moving alloc data when sticky is turned off	2016-12-05 14:11:45 -08:00
Diptanu Choudhury	21f49564d3	Not moving alloc data when sticky is turned off	2016-12-05 14:00:01 -08:00
Michael Schurter	770ed703d0	Add Driver.Prestart method The Driver.Prestart method currently does very little but lays the foundation for where lifecycle plugins can interleave execution _after_ task environment setup but _before_ the task starts. Currently Prestart does two things: * Any driver specific task environment building * Download Docker images This change also attaches a TaskEvent emitter to Drivers, so they can emit events during task initialization.	2016-12-02 11:03:48 -08:00
Alex Dadgar	86ed1fb2e5	Disallow stale queries when deriving Vault tokens This PR disallows stale queries when deriving a Vault token. Allowing stale queries could result in the allocation not existing on the server that is servicing the request.	2016-12-01 11:13:36 -08:00
Alex Dadgar	ec4d6936ff	add debug panic	2016-11-29 15:57:40 -08:00
Diptanu Choudhury	f67217297c	Ensuring allocs are not added multiple times to blocking queue	2016-11-29 11:19:37 -08:00
Alex Dadgar	88c7e04348	Check for Ephemeral Disk being nil	2016-11-15 10:03:06 -08:00
Alex Dadgar	ee921ccbb2	Merge pull request #1949 from carlpett/blacklist-fingerprints-and-drivers Support blacklisting fingerprinters	2016-11-09 10:31:17 -08:00
Calle Pettersson	4304755c12	Address comments from PR	2016-11-09 11:50:16 +01:00
Calle Pettersson	8632696e2d	Add blacklisting of drivers	2016-11-08 18:30:07 +01:00
Calle Pettersson	b603bb007e	Add blacklisting of fingerprinters	2016-11-08 18:29:44 +01:00
Alex Dadgar	9015e79aaa	Add compatibility code for secret ID while upgrading cluster in both server/client mode on single nodes	2016-11-07 16:52:08 -08:00
Diptanu Choudhury	1a8fa8c8d5	Making Nomad TLS configs region aware	2016-11-01 11:55:29 -07:00
Diptanu Choudhury	4079545a92	Making the client use tls if the node from which migration has to be made has enabled tls	2016-10-31 10:20:04 -07:00
Michael Schurter	cc115fe984	Swap log line classifiers to be consistent	2016-10-28 14:59:48 -07:00
Diptanu Choudhury	3182d0454f	Adding the alloc if we can't find the TG	2016-10-27 15:45:10 -07:00
Diptanu Choudhury	0682a1a113	Not blocking for remote alloc if the alloc is not sticky	2016-10-27 12:04:55 -07:00
Alex Dadgar	150b678a6b	Merge pull request #1806 from hashicorp/f-docker4mac-fixes A couple fixes to make Docker For Mac work	2016-10-27 09:29:40 -07:00
Diptanu Choudhury	50ca5e1e9d	Merge pull request #1853 from hashicorp/f-rpc-http-tls TLS support for http and RPC	2016-10-25 16:14:43 -07:00
Diptanu Choudhury	7c61e115bd	Moved tlsutil into helpers	2016-10-25 16:05:37 -07:00
Diptanu Choudhury	353e7fc7f1	Moving the certs into tlsutil package	2016-10-25 16:01:53 -07:00
Diptanu Choudhury	cf35aeac84	Moving the TLSConfig to structs	2016-10-25 15:57:38 -07:00
Alex Dadgar	03eba049ed	Merge pull request #1848 from hashicorp/f-vault-error Thread through whether DeriveToken error is recoverable or not	2016-10-24 15:01:18 -07:00
Alex Dadgar	692a809919	Merge pull request #1842 from hashicorp/f-version-and-id Print the version and client node ID	2016-10-24 10:13:33 -07:00
Diptanu Choudhury	2e3118e69c	Implemented TLS support for http and rpc	2016-10-23 22:22:00 -07:00
Alex Dadgar	ede3a814ba	Small fixes	2016-10-22 18:20:50 -07:00
Alex Dadgar	0070178741	Thread through whether DeriveToken error is recoverable or not	2016-10-22 18:08:30 -07:00
Michael Schurter	285e80ac0f	Remove disk usage enforcement Many thanks to @iverberk for the original PR (#1609), but we ended up not wanting to ship this implementation with 0.5. We'll come back to it after 0.5 and hopefully find a way to leverage filesystem accounting and quotas, so we can skip the expensive polling.	2016-10-21 13:55:51 -07:00
Alex Dadgar	aa0d8d0d8d	Print the version and client node ID	2016-10-20 17:46:04 -07:00
Evan Phoenix	e7a98d5500	Make EvalSymlink errors more verbose	2016-10-12 17:07:21 -07:00
Evan Phoenix	f8a65a3b9d	Resolve alloc/state directories to make Docker For Mac happy * In -dev mode, `ioutil.TempDir` is used for the alloc and state directories. * `TempDir` uses `$TMPDIR`, which on OS X contains a per user directory which is under `/var/folder`. * `/var` is actually a symlink to `/private/var` * Docker For Mac validates the directories that are passed to bind and on OS X. That whitelist contains `/private`, but not `/var`. It does not expand the path, and so any paths in `$TMPDIR` fail the whitelist check. And thusly, by expanding the alloc/state directories the value passed for binding does contain `/private` and Docker For Mac is happy.	2016-10-12 17:06:25 -07:00
Michael Schurter	6dea6df919	Restore lost chan inits	2016-10-03 14:56:50 -07:00
Diptanu Choudhury	d50c395421	Getting snapshot of allocation from remote node (#1741 ) * Added the alloc dir move * Moving allocdirs when starting allocations * Added the migrate flag to ephemeral disk * Stopping migration if the allocation doesn't need migration any more * Added the GetAllocDir method * refactored code * Added a test for alloc runner * Incorporated review comments	2016-10-03 09:59:57 -07:00
Michael Schurter	b117725dc9	Only log consul errors once since last successful run	2016-09-28 17:18:45 -07:00
Michael Schurter	d486de3804	Remove unused const	2016-09-27 16:04:01 -07:00
Michael Schurter	2e696c5e61	Fix lies found in comments by fact checkers	2016-09-26 16:51:53 -07:00
Michael Schurter	11cf9686a6	No need to put reaper ticker on the struct	2016-09-26 16:15:19 -07:00
Michael Schurter	2eb0062959	Drop clumsy timeout on discovery notifications It's better to just let goroutines fallback to their longer retry intervals then try to be clever here.	2016-09-26 16:05:21 -07:00
Michael Schurter	307e674eca	Flip disco chan; clarify method names/comments	2016-09-26 15:52:40 -07:00
Michael Schurter	888ee21270	Return csv of servers from Stats, not just count	2016-09-26 15:40:26 -07:00
Michael Schurter	7dc0079dd2	doDisco -> triggerDiscoveryCh; discovered -> serversDiscoveredCh Also fix log line formatting	2016-09-26 15:21:28 -07:00
Michael Schurter	434e4be97c	noServers -> noServersErr	2016-09-26 15:12:35 -07:00
Michael Schurter	b2ddb85a78	consul -> Consul	2016-09-26 15:06:57 -07:00
Michael Schurter	37cfb2769c	Replace periodic handlers with event driven disco Remove use of periodic consul handlers in the client and just use goroutines. Consul Discovery is now triggered with a chan instead of using a timer and deadline to trigger. Once discovery is complete a chan is ticked so all goroutines waiting for servers will run. Should speed up bootstrapping and recovery while decreasing spinning on timers.	2016-09-23 17:02:48 -07:00
Michael Schurter	2ab5264595	Retry all servers on RPC call failure rpcproxy is refactored into serverlist which prioritizes good servers over servers in a remote DC or who have had a failure. Registration, heartbeating, and alloc status updating will retry faster when new servers are discovered. Consul discovery will be retried more quickly when no servers are available (eg on startup or an outage).	2016-09-23 11:44:48 -07:00
Alex Dadgar	50efdb00e9	Merge pull request #1713 from hashicorp/f-alloc-runner-vault Vault integration in client	2016-09-20 16:15:55 -07:00
Alex Dadgar	64de46432a	Merge pull request #1677 from hashicorp/f-vault-implicit-constraint Vault implicit Task Group constraint + allow root tokens	2016-09-20 16:15:32 -07:00
Alex Dadgar	ec152a6d12	Clean up vault client	2016-09-14 18:10:56 -07:00
Alex Dadgar	6702a29071	Vault token threaded	2016-09-14 13:30:01 -07:00
Robert Neumayer	8dc19dbd10	Log adding of servers at INFO level	2016-09-14 22:24:17 +02:00
Alex Dadgar	2c8dd8bbd3	Revert "Introduce a Secret/ directory"	2016-09-01 17:23:15 -07:00
Alex Dadgar	b0adaa5301	Allow root token	2016-09-01 12:05:08 -07:00
Alex Dadgar	1ed454dd60	Merge pull request #1671 from hashicorp/f-secret-dir2 Introduce a Secret/ directory	2016-09-01 09:56:17 -07:00
Alex Dadgar	9fa23e3536	Symlink on windows	2016-08-31 21:41:44 -07:00
Alex Dadgar	5d3b47e648	Address comments and reserve	2016-08-31 18:11:02 -07:00
vishalnayak	55a6f06e15	Addressed review feedback	2016-08-30 13:08:13 -04:00
vishalnayak	3808dd0ff8	Return only fatal error to renewal error channel	2016-08-30 12:46:59 -04:00
vishalnayak	a0dbfe25b3	Fix tests	2016-08-29 21:30:06 -04:00
vishalnayak	82f6209e97	tokenDeriver function pointer to derive tokens. Remove rpc*, connPool, node and region from vaultclient.	2016-08-29 20:32:05 -04:00
Alex Dadgar	14b7126511	Secret dir, hello world	2016-08-29 15:41:52 -07:00
vishalnayak	56e42cf03d	Employ DeriveVaultToken API and flesh-up DeriveToken	2016-08-24 12:29:59 -04:00
vishalnayak	6002e596c4	VaultClient for Nomad Client	2016-08-24 09:43:45 -04:00
Diptanu Choudhury	1e1eef56a1	Putting the mock driver behind a build flag	2016-08-22 15:02:28 -05:00
Diptanu Choudhury	4ca623bcfe	blocking chained allocations until previous allocation hasn't terminated	2016-08-22 11:34:24 -05:00
Alex Dadgar	a90dafe9ab	handle the upgrade case	2016-08-18 19:01:24 -07:00
Alex Dadgar	895c31f605	Nodes generate Secret ID and used for retrieving allocations and registering	2016-08-17 16:31:47 -07:00
Alex Dadgar	84820db86f	If the client detects that a heartbeat has failed because it is not registered, reregister	2016-08-15 17:24:09 -07:00
Diptanu Choudhury	28b3f511e0	Fixed some error messages	2016-08-10 15:17:32 -07:00
Kenjiro Nakayama	6a810e6f1e	Update after review	2016-08-09 08:57:26 +09:00
Kenjiro Nakayama	5c621b74e5	tiny: Return fmt.Errorf instead of duplicated error messages	2016-08-09 08:57:26 +09:00
Diptanu Choudhury	41b540fbc8	Allow operators to opt into publishing node and alloc metrics	2016-08-01 19:52:20 -07:00
Cameron Davison	777bdf4a1e	fix setup consul syncer error message	2016-07-28 22:14:52 -05:00
Alex Dadgar	ebac5cb283	Node.Register handles the case of transitioning to ready and creating evals	2016-07-21 15:22:02 -07:00
Diptanu Choudhury	5b39a5db40	Fixed a debug message	2016-07-09 00:12:53 -07:00
Sean Chittenden	03c571c61b	Consolidate fingerprinters into a single `map`.	2016-07-08 23:37:14 -07:00
Sean Chittenden	8bdb38d016	Code golf Pointed out by: @dadgar	2016-06-21 14:26:01 -07:00
Sean Chittenden	df4fe2e502	Fix the shuffling of remote datacenters. Pointed out by: @ryanuber	2016-06-21 13:37:22 -07:00
Sean Chittenden	9a60999100	Pass a logger arg to `NewClient` and `NewServer`	2016-06-16 23:29:23 -07:00
Sean Chittenden	fd18eb7fdb	Only register the Client services reaper when `consul.auto_advertise` is enabled	2016-06-16 18:24:58 -07:00
Sean Chittenden	952b6ce7b5	Only auto-join clients if `client_auto_join` is true	2016-06-16 14:47:21 -07:00
Sean Chittenden	af55b74114	Merge pull request #1276 from hashicorp/f-consul-server-autojoin Teach Nomad servers how to fall back to Consul.	2016-06-16 14:40:45 -07:00
Sean Chittenden	008d75184b	Use the `%+q` verb in log messages (vs `%q`).	2016-06-16 11:03:51 -07:00
Alex Dadgar	7375d828e1	remove trace	2016-06-15 15:47:59 -07:00
Sean Chittenden	5e0ced2ae7	Shuffle all datacenters vs only the nearest N datacenters. Per discussion, we want to be aggressive about fanning out vs possibly fixating on only local DCs. With RPC forwarding in place, a random walk may be less optimal from a network latency perspective, but it is guaranteed to eventually result in a converged state because all DCs are candidates during the bootstrapping process.	2016-06-15 12:40:51 -07:00
Sean Chittenden	2123460cf0	Bump various Consul search limits Client: Search limit increased from 4 random DCs to 8 random DCs, plus nearest. Server: Search factor increased from 3 to 5 times the bootstrap_expect. This should allow for faster convergence in large environments (e.g. sub-5min for 10K Consul DCs).	2016-06-15 12:40:51 -07:00
Alex Dadgar	cf99fc3173	Use Status.Peers instead of Status.Ping	2016-06-15 12:00:20 -07:00
Alex Dadgar	4b04e503f3	address comments	2016-06-13 17:32:18 -07:00
Alex Dadgar	8bbf4a55e5	Fix IDs and domain scoping	2016-06-13 16:30:58 -07:00
Diptanu Choudhury	d019d8ef8e	implemented reconciliation of unwanted services	2016-06-13 14:52:26 +02:00
Alex Dadgar	a82c2bb058	Do not reconcile in client and cleanup executor a bit	2016-06-12 18:22:07 -07:00
Alex Dadgar	8e231fa382	Rename ConsulService back to Service	2016-06-12 16:36:49 -07:00
Alex Dadgar	fdda90229f	only support latest and remove ring buffer	2016-06-12 09:32:38 -07:00
Alex Dadgar	e952540f6f	Allocation resources returned in a struct	2016-06-11 21:04:10 -07:00
Sean Chittenden	2f036231e5	Merge pull request #1201 from hashicorp/f-dyn-server-list Dynamic Server Lists/Client Bootstrapping via consul.	2016-06-11 18:58:25 -04:00
Sean Chittenden	92e2cfb0ad	Walk the DCs from nearest to most remote.	2016-06-11 18:52:21 -04:00
Sean Chittenden	2968545201	Walk the DCs from nearest to most remote, no limit on the search.	2016-06-11 18:23:06 -04:00
Sean Chittenden	917766a3df	Prefer `%+q` over `%q` in log messages.	2016-06-11 18:17:20 -04:00
Diptanu Choudhury	fd60cfd585	Emitting client resource usage metrics as gauges instead of k/v pairs	2016-06-11 22:17:32 +02:00
Sean Chittenden	bbd8dfa798	golint(1) compliance pass (e.g. Rpc* -> RPC)	2016-06-10 23:38:28 -04:00
Sean Chittenden	bc771d35df	Query for the Nomad service across multiple Consul datacenters.	2016-06-10 23:05:14 -04:00
Sean Chittenden	26b1e826d7	golint(1) police	2016-06-10 15:54:39 -04:00
Sean Chittenden	f139d0c68b	Properly guard consulPullHeartbeatDeadline behind heartbeatLock	2016-06-10 15:54:39 -04:00
Sean Chittenden	ed29946f5e	Populate the RPC Proxy's server list if heartbeat did not include a leader. It's possible that a Nomad Client is heartbeating with a Nomad server that has become isolated from the quorum of Nomad Servers. When 3x the heartbeatTTL has been exceeded, append the Consul server list to the primary server list. When the next RPCProxy rebalance occurs, there is a chance one of the servers discovered from Consul will be in the majority. When client reattaches to a Nomad Server in the majority, it will include a heartbeat and will reset the TTLs AND will clear the primary server list to include only values from the heartbeat.	2016-06-10 15:54:39 -04:00
Sean Chittenden	9a223936bb	Generate and sync Consul ServiceIDs consistently	2016-06-10 15:54:39 -04:00
Sean Chittenden	7956eb0c80	Rename structs.Task's `Service` attribute to `ConsulService`	2016-06-10 15:54:39 -04:00
Sean Chittenden	8c813630e6	Move package client/consul/sync to command/agent/consul. This has been done to allow the Server and Client to reuse the same Syncer because the Agent may be running Client, Server, or both simultaneously and we only want one Syncer object alive in the agent.	2016-06-10 15:54:39 -04:00
Sean Chittenden	fda03c5c9e	Change the signature of the PeriodicCallback to return an error I KNEW I should have done this when I wrote it, but didn't want to go back and audit the handlers to include the appropriate return handling, but now that the code is taking shape, make this change.	2016-06-10 15:54:39 -04:00
Sean Chittenden	555f4fe135	Change client/consul.NewSyncer() to accept a shutdown channel In addition to the API changing, consul.Syncer can now be signaled to shutdown via the Shutdown() method, which will call the Run()'ing sync task to exit gracefully.	2016-06-10 15:54:39 -04:00

628 commits