open-nomad

Commit Graph

Author	SHA1	Message	Date
Michael Schurter	23c66e37c5	Handle errors during snapshotting If an alloc dir is being GC'd (removed) during snapshotting the walk func will be passed an error. Previously we didn't check for an error so a panic would occur when we'd try to use a nil `fileInfo`.	2017-11-29 17:26:11 -08:00
Chelsea Holland Komlo	2208964948	Support StopTimeout for Docker tasks Update github.com/fsouza/go-dockerclient	2017-11-29 14:33:05 -05:00
Preetha Appan	6ad65c51e6	Missed assert in one place	2017-11-20 13:04:38 -06:00
Preetha Appan	747bd59daa	Better error validation, and added test case for invalid sysctl inputs	2017-11-20 12:07:18 -06:00
Preetha Appan	c68973747b	Address some review comments	2017-11-20 11:15:09 -06:00
Preetha Appan	39ef9ee76d	Fix gofmt warnings	2017-11-18 09:23:09 -06:00
Preetha Appan	e53dd15f58	Fix test compilation after rebase	2017-11-17 17:46:04 -06:00
Samuel BERTHE	0fca2e19c8	review(docker driver): sysctls -> sysctl + ulimits -> ulimit	2017-11-17 16:30:45 -06:00
Samuel BERTHE	6c93922cb7	Oops	2017-11-17 16:14:14 -06:00
Samuel BERTHE	c8363bc44b	💄	2017-11-17 16:03:22 -06:00
Samuel BERTHE	281ab90484	test(docker driver): testing sysctls and ulimits	2017-11-17 16:03:22 -06:00
Samuel BERTHE	b9a10ff7fa	feat(docker driver): adds sysctls and ulimits configs	2017-11-17 16:03:22 -06:00
Alex Dadgar	69d3bf7392	Merge pull request #3559 from hashicorp/b-metrics Don't emit metrics for non-running tasks	2017-11-17 10:33:23 -08:00
Michael Schurter	3845c8d200	Merge pull request #3562 from hashicorp/b-3561-rkt-rm Remove rkt pods when exiting	2017-11-16 17:30:21 -08:00
Michael Schurter	737fb45640	Merge pull request #3551 from hashicorp/b-3419-docker-409-bug Fix Docker name conflict bug by updating dockerclient	2017-11-16 16:38:54 -08:00
Michael Schurter	437fce9954	Improve rktRemove error message	2017-11-16 15:45:14 -08:00
Michael Schurter	3ceec0caab	Remove rkt pods when exiting Fixes #3561	2017-11-16 14:33:44 -08:00
Charlie Voiselle	7a231897a5	Merge pull request #3556 from angrycub/f-fingerprint-log-level Dropped loglevel for AWS fingerprinter env read misses to DEBUG	2017-11-16 16:27:25 -05:00
Charlie Voiselle	969ddf9c2a	Lowered to DEBUG from AD feedback	2017-11-16 14:13:03 -05:00
Alex Dadgar	05b1588cea	Only publish metric when the task is running and dev mode publishes metrics	2017-11-15 13:21:06 -08:00
Alex Dadgar	07963f0b6d	Merge pull request #3546 from hashicorp/f-heuristic Better interface selection heuristic	2017-11-15 12:51:21 -08:00
Alex Dadgar	97ec3974a9	Use interface attached to default route	2017-11-15 11:32:32 -08:00
Michael Schurter	f86f0bd9ea	Handle leader task being dead in RestoreState Fixes the panic mentioned in https://github.com/hashicorp/nomad/issues/3420#issuecomment-341666932 While a leader task dying serially stops all follower tasks, the synchronizing of state is asynchronous. Nomad can shut down before all follower tasks have updated their state to dead, thus saving the state necessary to hit this panic: have a non-terminal alloc with a dead leader. The actual fix is a simple nil check to not assume non-terminal allocs' leaders have a TaskRunner.	2017-11-15 10:36:13 -08:00
Charlie Voiselle	1197637251	Dropped loglevel for AWS fingerprinter env reads Certain environments use WARN for serious logging; however, it's very possible to have machines without some of the fingerprinted keys (public-ipv4 and public-hostname specifically). Setting log level to INFO seems more consistent with this possibility.	2017-11-15 18:20:59 +00:00
Chelsea Komlo	2dfda33703	Nomad agent reload TLS configuration on SIGHUP (#3479) * Allow server TLS configuration to be reloaded via SIGHUP * dynamic tls reloading for nomad agents * code cleanup and refactoring * ensure keyloader is initialized, add comments * allow downgrading from TLS * initialize keyloader if necessary * integration test for tls reload * fix up test to assert success on reloaded TLS configuration * failure in loading a new TLS config should remain at current Reload only the config if agent is already using TLS * reload agent configuration before specific server/client lock keyloader before loading/caching a new certificate * introduce a get-or-set method for keyloader * fixups from code review * fix up linting errors * fixups from code review * add lock for config updates; improve copy of tls config * GetCertificate only reloads certificates dynamically for the server * config updates/copies should be on agent * improve http integration test * simplify agent reloading storing a local copy of config * reuse the same keyloader when reloading * Test that server and client get reloaded but keep keyloader * Keyloader exposes GetClientCertificate as well for outgoing connections * Fix spelling * correct changelog style	2017-11-14 17:53:23 -08:00
Michael Schurter	3023336b39	Add a test demonstrating the bug Fails on Docker 17.09, passes on Docker 17.06 and earlier	2017-11-14 15:25:52 -08:00
Alex Dadgar	ee31e15f51	Better interface selection heuristic This PR introduces a better interface selection heuristic such that we select interfaces with globally routable unicast addresses over link local addresses. Fixes https://github.com/hashicorp/nomad/issues/3487	2017-11-13 15:13:43 -08:00
Preetha Appan	926c9ed997	Make device mounting unit test verify configuration via docker inspect	2017-11-13 09:56:54 -06:00
Preetha Appan	dc2d5fb5a4	Unit test (linux only) that tests mounting a device in the docker driver	2017-11-13 09:56:54 -06:00
Preetha Appan	4834710e45	Add default value for cgroup permissions for device if not set	2017-11-13 09:56:54 -06:00
Preetha Appan	9cdee6991c	Remove unnecessary check since validate method already checks this	2017-11-13 09:56:54 -06:00
Preetha Appan	110c1fd4f0	Add support for passing device into docker driver	2017-11-13 09:56:54 -06:00
Alex Dadgar	d1358ec1b6	always load all templates	2017-11-10 12:35:51 -08:00
Alex Dadgar	a3ea0c17a0	Handle multiple environment templates Fixes https://github.com/hashicorp/nomad/issues/3498	2017-11-10 11:08:19 -08:00
Alex Dadgar	b3edc12dd9	Merge pull request #3411 from cheeseprocedure/f-qemu-graceful-shutdown Qemu driver: graceful shutdown feature	2017-11-03 16:41:34 -07:00
Michael Schurter	690b8f4cfb	Remove noisy log line Didn't mean to commit this	2017-11-03 16:00:30 -07:00
Matt Mercer	11e2870875	Qemu driver: clean up logging; fail unsupported features on Windows	2017-11-03 15:40:20 -07:00
Alex Dadgar	6034916ad1	fix spelling mistake	2017-11-03 15:04:59 -07:00
Alex Dadgar	a23033932a	Merge pull request #3459 from multani/docker-oom-notification docker: log that a container has been killed by the OOM killer	2017-11-03 13:24:03 -07:00
Matt Mercer	cef9ba9770	Qemu driver: tweaks in response to PR feedback Remove attribute for long qemu monitor path; misc cleanup; update tests	2017-11-03 11:28:56 -07:00
Preetha Appan	0eaef09675	Remove event GenericSource, and address other code review comments. Also added deprecation info in comments.	2017-11-03 10:10:06 -05:00
Preetha Appan	5f09c968b3	Move logic for determining event display message to task_runner, added two new fields DisplayMessage and Details.	2017-11-03 09:13:01 -05:00
Alex Dadgar	b4af10edde	Alloc Runner doesn't panic on restoration.	2017-11-02 16:14:13 -07:00
Alex Dadgar	abd28cbd7d	Merge pull request #3493 from hashicorp/f-remove-atlas Remove Atlas and Scada from codebase	2017-11-02 16:00:44 -07:00
Michael Schurter	eedbe8efbb	Merge pull request #3490 from hashicorp/f-gc-logging Make unable-to-gc log level adaptive	2017-11-02 14:32:40 -07:00
Diptanu Choudhury	cb68889652	Added the node_id as a tag	2017-11-02 13:29:10 -07:00
Alex Dadgar	701f462d33	remove atlas	2017-11-02 11:27:21 -07:00
Michael Schurter	fc33c945be	Make unable-to-gc log level adaptive WARNing when someone has over 50 non-terminal allocs was just too confusing. Tested manually with `gc_max_allocs = 10` and bumping a job from `count = 19` to `count = 21`: ``` 2017/11/02 17:54:21.076132 [INFO] client.gc: garbage collection due to number of allocations (19) is over the limit (10) skipped because no terminal allocations ... 2017/11/02 17:54:48.634529 [WARN] client.gc: garbage collection due to number of allocations (21) is over the limit (10) skipped because no terminal allocations ```	2017-11-02 10:57:42 -07:00
Diptanu Choudhury	8a9d0d40b1	Added support for tagged metrics	2017-11-02 10:07:57 -07:00
Diptanu Choudhury	5f522c6de3	Incrementing the start counter when we are actually starting a container	2017-11-02 09:51:20 -07:00
Diptanu Choudhury	44535e5d10	Recording counter for dead allocs properly	2017-11-02 09:51:20 -07:00
Diptanu Choudhury	0b34e811b7	Added metrics to track task/alloc start/restarts/dead events	2017-11-02 09:51:20 -07:00
Matt Mercer	00f90323c2	Qemu driver: defer cleanup sooner	2017-11-01 17:37:43 -07:00
Matt Mercer	43256af5f3	Qemu driver: clean up test logging; retry integration test for longer	2017-11-01 17:21:56 -07:00
Matt Mercer	b1145705d3	Use strings.Replace() instead of custom function	2017-11-01 15:31:35 -07:00
Matt Mercer	d51d174fa0	Qemu driver: basic testing of graceful shutdown feature	2017-11-01 15:31:30 -07:00
Matt Mercer	c26013ea0b	Qemu driver: include PIDs in log output	2017-11-01 15:31:24 -07:00
Matt Mercer	38d9a391aa	Qemu driver: ensure proper cleanup of resources	2017-11-01 15:31:20 -07:00
Matt Mercer	46f7e2fa4c	Qemu driver: minor logging fixes	2017-11-01 15:31:14 -07:00
Matt Mercer	4afb9dfa2d	Standardize driver.qemu logging prefix	2017-11-01 15:30:44 -07:00
Matt Mercer	5127e75569	Qemu driver: add graceful shutdown feature	2017-11-01 15:30:36 -07:00
Michael Schurter	1769db98b7	Fix regression by returning error on unknown alloc	2017-11-01 15:16:38 -05:00
Michael Schurter	9f26b9a403	Fix race in test	2017-11-01 15:16:38 -05:00
Michael Schurter	73e9b57908	Trigger GCs after alloc changes GC much more aggressively by triggering GCs when allocations become terminal as well as after new allocations are added.	2017-11-01 15:16:38 -05:00
Michael Schurter	2a81160dcd	Fix GC'd alloc tracking The Client.allocs map now contains all AllocRunners again, not just un-GC'd AllocRunners. Client.allocs is only pruned when the server GCs allocs. Also stops logging "marked for GC" twice.	2017-11-01 15:16:38 -05:00
Alex Dadgar	c710550551	fix test	2017-10-30 12:35:31 -07:00
Alex Dadgar	4831380e57	Node access is done using locked Node copy Fixes https://github.com/hashicorp/nomad/issues/3454 Reliably reproduced the data race before by having a fingerprinter change the nodes attributes every millisecond and syncing at the same rate. With fix, did not ever panic.	2017-10-27 13:27:24 -07:00
Jonathan Ballet	5429d1c656	docker: changed OOM killed error message	2017-10-27 20:30:52 +02:00
Jonathan Ballet	12615bde9c	docker: log that a container has been killed by the OOM killer Fix: #2203 (at least for Docker tasks)	2017-10-27 18:05:27 +02:00
Alex Dadgar	f117eb28c7	go style vars	2017-10-25 10:49:34 -07:00
Alex Dadgar	3f8495dd0e	fix two flaky tests	2017-10-23 18:15:52 -07:00
Alex Dadgar	cb0d0ef009	move to consul freeport implementation	2017-10-23 16:51:40 -07:00
Alex Dadgar	dbc014b360	Standardize retrieving a free port into a helper package	2017-10-23 16:48:20 -07:00
Alex Dadgar	4a69e1ad15	don't double parallel	2017-10-23 16:48:06 -07:00
Alex Dadgar	96ca2bbe4c	respond to comments	2017-10-23 15:50:27 -07:00
Alex Dadgar	99c81b5848	Skip if no docker	2017-10-19 16:55:10 -07:00
Alex Dadgar	593536664e	fix flaky java tests	2017-10-19 16:49:57 -07:00
Alex Dadgar	4bc452b479	Undo darwin user setting	2017-10-19 16:49:57 -07:00
Alex Dadgar	c7c6964313	Run as user on mac	2017-10-19 16:49:57 -07:00
Alex Dadgar	55a1dffa2f	sudo docker works	2017-10-19 16:49:57 -07:00
Alex Dadgar	805e7b3b62	docker tests	2017-10-19 16:49:57 -07:00
Michael Schurter	797f49702e	Add logging around moby/moby#32648 bug	2017-10-18 10:44:03 -07:00
Michael Schurter	22ac450b2f	Properly fail rkt fingerprinting on old versions	2017-10-16 13:58:58 -07:00
Michael Schurter	d7732c1a58	Squelch repeated rkt version warnings	2017-10-16 12:09:47 -07:00
Michael Schurter	b5fd075d74	Test fixes from #3383	2017-10-13 15:45:35 -07:00
Michael Schurter	b63eee17e9	Merge pull request #3383 from hashicorp/b-migrate-token base64 migrate token	2017-10-13 13:46:54 -07:00
Michael Schurter	dfd2967cdb	Merge pull request #3376 from hashicorp/f-node-acls Allow Node.SecretID for Node.GetNode and Allocs.GetAlloc	2017-10-13 11:51:48 -07:00
Michael Schurter	15b991e039	base64 migrate token HTTP header values must be ASCII. Also constant time compare tokens and test the generate and compare helper functions.	2017-10-13 10:59:13 -07:00
Alex Dadgar	85178d6048	rkt remove allocid	2017-10-13 10:07:50 -07:00
Adam Stankiewicz	cefbc72b49	Remove AllocID from ExecutorContext	2017-10-13 17:07:49 +02:00
Michael Schurter	4a70d4356a	Alloc watcher must send Node.SecretID as AuthToken An auth token is required if ACLs are enabled	2017-10-12 16:38:02 -07:00
Michael Schurter	84d8a51be1	SecretID -> AuthToken	2017-10-12 15:16:33 -07:00
Michael Schurter	59ff94cd71	Don't panic on unexpected Consul response Fixes #3326	2017-10-11 18:25:54 -07:00
Chelsea Holland Komlo	e1c4701a43	fix up build warnings	2017-10-11 17:11:57 -07:00
Chelsea Holland Komlo	b018ca4d46	fixing up code review comments	2017-10-11 17:09:20 -07:00
Chelsea Holland Komlo	a77e462465	add tests for functionality	2017-10-11 17:09:20 -07:00
Chelsea Holland Komlo	410adaf726	Add functionality for authenticated volumes	2017-10-11 17:09:20 -07:00
Alex Dadgar	6d3d0a9391	Nomad UI Command	2017-10-09 23:01:55 -07:00
Michael Schurter	f788974f8a	Merge pull request #3288 from simar7/qemu-improvements qemu: Add bound checks for memory assignment	2017-10-02 14:47:05 -07:00
Simarpreet Singh	d801584c46	qemu: Fix lower memory bound to 128M Signed-off-by: Simarpreet Singh <simar@linux.com>	2017-10-02 14:29:44 -07:00
Simarpreet Singh	10d7d6dab0	gofmt: format qemu.go and qemu_test.go Signed-off-by: Simarpreet Singh <simar@linux.com>	2017-10-02 13:16:48 -07:00
Michael Schurter	a66c53d45a	Remove `structs` import from `api` Goes a step further and removes structs import from api's tests as well by moving GenerateUUID to its own package.	2017-09-29 10:36:08 -07:00
Michael Schurter	77f1fe40e7	Properly autodetect Docker IP in Windows Our Docker network plugin autodetection code was erroneously treating Windows' default network `nat` as a plugin and defaulting to it instead of the host. Fixes #3218	2017-09-27 16:49:23 -07:00
Michael Schurter	a8a87af7ed	Only build rkt driver on linux Build stub for non-linux targets	2017-09-27 14:21:45 -07:00
Simarpreet Singh	3d99e71de8	qemu: Add bound checks for memory assignment Signed-off-by: Simarpreet Singh <simar@linux.com>	2017-09-26 21:07:48 -07:00
Michael Schurter	d7229ce6c5	Merge pull request #3256 from dalegaard/master Enable rkt driver to use address_mode = 'driver'	2017-09-26 18:04:37 -05:00
Alex Dadgar	4173834231	Enable more linters	2017-09-26 15:26:33 -07:00
Lasse Dalegaard	9f584d1114	Ignore rkt network failure if container died early If the container dies before the network can be read, we now ignore the error coming out of the network information polling loop. Nomad will restart the task regardless, so we might be masking the actual error. The polling loop for the rkt network information, inside the `Start` method, was getting a bit unwieldy. It's been refactored out so it's now a separate function.	2017-09-27 00:15:27 +02:00
Lasse Dalegaard	b43ec57c02	Make rkt port mapping test not exit immediately The rkt port mapping test currently starts redis with --version, which obviously makes redis exit again almost immediately. This means that the container exits before the network status can be queried, and so the test fails.	2017-09-26 23:10:24 +02:00
Lasse Dalegaard	17d155d316	Improve rkt driver network status poll loop The network status poll loop will now report any networks it ignored, as well as a no-networks situation.	2017-09-26 21:49:45 +02:00
Lasse Dalegaard	bafd32fda0	Refactor rkt network status loop The network status poll loop for the rkt driver's `Start` method was a bit messy, and could not display the last encountered error. Here we clean it up.	2017-09-26 21:27:12 +02:00
Lasse Dalegaard	5e9e2b07bd	Small logging fix in rkt/driver	2017-09-26 19:36:13 +02:00
Lasse Dalegaard	3d25fd3b00	Bump minimum rkt version to 1.27.0. The changes introduced in #3256 require at least rkt 1.27.0 because of a bug in the JSON output of `rkt status` in previous versions. Here we upgrade all references to rkt's minimum version, and also make travis and vagrant use this version when running tests. Finally we add a CHANGELOG notice.	2017-09-26 19:15:43 +02:00
Lasse Dalegaard	f55f2b8f24	Turn rkt network status failure into Start failure If the rkt driver cannot get the network status, for a task with a configured port mapping, it will now fail the Start() call and kill the task instead of simply logging. This matches the Docker behavior. If no port map is specified, the warnings will be logged but the task will be allowed to start.	2017-09-26 10:20:57 +02:00
Lasse Dalegaard	55a2e60e1a	Test for rkt driver setting DriverNetwork To test that the rkt driver correctly sets a DriverNetwork, at least when a port mapping is requested, we amend the TestRktDriver_PortsMapping test with a small check.	2017-09-26 09:10:50 +02:00
Lasse Dalegaard	2d307d5beb	Discard errors from rkt status and cat-manifest Since we don't actually show these errors anywhere, just discard them right away.	2017-09-26 09:05:47 +02:00
Chelsea Holland Komlo	b26454cf99	Move setGaugeForAllocationStats to emitClientMetrics	2017-09-25 16:05:49 +00:00
Lasse Dalegaard	cbcbe0da2e	Expose rkt DriverNetwork Currently the rkt driver does not expose a DriverNetwork instance after starting the container, which means that address_mode = 'driver' does not work. To get the container network information, we can call `rkt status` on the UUID of the container and grab the container IP from there. For the port map, we need to grab the pod manifest as it will tell us which ports the container exposes. We then cross-reference the configured port name with the container port names, and use that to create a correct port mapping. To avoid doing a (bad) reimplementation of the appc schema(which rkt uses for its manifest) and rkt apis, we pull those in as vendored dependencies. The versions used are the same ones that rkt use in their glide dependency configuration for version 1.28.0.	2017-09-21 00:34:22 +02:00
Lasse Dalegaard	7ac599d509	Use rkt prepare + run-prepared instead of run. The rkt driver currently executes run and asks that the pod UUID is written to a file that is then polled for changes for up to five seconds. Many container fetches will take longer than this, so this method will often not be able to track the pod UUID reliably. To avoid this problem, rkt allows pods to be first prepared, which will return their UUID, and then run as a second invocation. Here we convert the rkt driver's Start method to use this method instead. This way, the UUID will always be tracked correctly.	2017-09-21 00:17:31 +02:00
Michael Schurter	f92ffe5af5	Merge pull request #3105 from hashicorp/f-876-restart-unhealthy Restart unhealthy tasks	2017-09-17 19:38:32 -07:00
epipho	a16c97394f	Fix incorrect docker stats	2017-09-16 00:43:03 -04:00
Michael Schurter	67a4a169a9	Name const after what it represents	2017-09-15 14:57:18 -07:00
Michael Schurter	79a7bf3d7c	Cleanup and test restart failure code	2017-09-15 14:54:37 -07:00
Michael Schurter	06ca379da0	Add comments	2017-09-15 14:34:36 -07:00
Michael Schurter	4dbaa52aba	Fold SetFailure into SetRestartTriggered	2017-09-14 16:48:39 -07:00
Michael Schurter	ed77c0944b	DRY up restart handling a bit. All 3 error/failure cases share restart logic, but 2 of them have special cased conditions.	2017-09-14 16:48:39 -07:00
Michael Schurter	73fb71ca10	RestartDelay isn't needed as checks are re-added on restarts @dadgar made the excellent observation in #3105 that TaskRunner removes and re-registers checks on restarts. This means checkWatcher doesn't need to do any internal restart tracking. Individual checks can just remove themselves and be re-added when the task restarts.	2017-09-14 16:48:39 -07:00
Michael Schurter	06dd86adbd	Remove unused lastStart field	2017-09-14 16:47:41 -07:00
Michael Schurter	0447f79288	Removed partially implemented allocLock	2017-09-14 16:47:41 -07:00
Michael Schurter	ade29ecbed	Improve check watcher logging and add tests Also expose a mock Consul Agent to allow testing ServiceClient and checkWatcher from TaskRunner without actually talking to a real Consul.	2017-09-14 16:47:41 -07:00
Michael Schurter	a137676358	Add comments and move delay calc to TaskRunner	2017-09-14 16:46:54 -07:00
Michael Schurter	8a87475498	Use existing restart policy infrastructure	2017-09-14 16:46:54 -07:00
Michael Schurter	22690c5f4c	Add check watcher for restarting unhealthy tasks	2017-09-14 16:46:54 -07:00
Alex Dadgar	d306da846c	changelog and feedback	2017-09-14 14:08:58 -07:00
Alex Dadgar	07ed83fdd5	Non-locked accessors to common Node fields This PR removes locking around commonly accessed node attributes that do not need to be locked. The locking could cause nodes to TTL as the heartbeat code path was acquiring a lock that could be held for an excessively long time. An example of this is when Vault is inaccessible, since the fingerprint is run with a lock held but the Vault fingerprinter makes the API calls with a large timeout. Fixes https://github.com/hashicorp/nomad/issues/2689	2017-09-14 14:08:26 -07:00
Chelsea Komlo	536d38454b	Merge pull request #3191 from hashicorp/b-tagged-metrics-panic Fix panic in emitting tagged allocation metrics	2017-09-11 14:28:50 -04:00
Armon Dadgar	d4aed839d2	Merge pull request #3185 from hashicorp/f-acl-reset Add ability to reset ACL bootstrap process	2017-09-11 10:47:17 -07:00
Armon Dadgar	3d5ecaafff	Address @dadgar feedback	2017-09-11 10:30:59 -07:00
Alex Dadgar	b3958faa14	Merge pull request #3187 from hashicorp/b-windows-docker Fix MemorySwappiness on Windows Docker	2017-09-11 09:56:49 -07:00
Alex Dadgar	1cd8f7523f	Merge pull request #3184 from hashicorp/b-docker-logging Fix docker user specified syslogging	2017-09-11 09:31:33 -07:00
Chelsea Holland Komlo	848af92183	fix panic in emitting tagged metrics	2017-09-11 15:32:37 +00:00
Alex Dadgar	d3a9463358	Fix MemorySwappiness on Windows Docker Fixes https://github.com/hashicorp/nomad/issues/3181	2017-09-10 17:46:45 -07:00
Alex Dadgar	3ec7946b3e	Fix invalid CPU stats on Windows This PR fixes an issue introduced in Nomad 0.6.0 due to https://github.com/shirou/gopsutil/issues/420. The issue arose from the fact that the Windows stats from gopsutil report CPUs in percentages where we expected ticks.	2017-09-10 15:30:48 -07:00
Alex Dadgar	637ae9580a	Fix docker user specified syslogging	2017-09-10 14:57:48 -07:00
James Nugent	448145872f	client: Guard against "NaN" values from floats This commit protects against finding `0.NaN` tokens in JSON streams because of infinity representation on serialization.	2017-09-08 16:21:07 -05:00
Alex Dadgar	31f9e099d9	Merge pull request #3148 from clinta/purge-stopped Always purge stopped containers	2017-09-05 17:18:05 -07:00
Alex Dadgar	6fdaf38389	Fix repo name passed to docker credential helpers This PR fixes the server url passed to docker credential helpers and fixes stderr capture. Fixes https://github.com/hashicorp/nomad/issues/2957	2017-09-05 16:43:21 -07:00
Alex Dadgar	21564c7c04	Parse Docker mounts correctly (#3163) * Parse Docker mounts correctly This PR fixes the parsing of Docker mounts and adds testing to ensure no regressions. Fixes https://github.com/hashicorp/nomad/issues/3156 * Review feedback	2017-09-05 14:02:57 -07:00
Chelsea Holland Komlo	0ef43c3c5f	final code review fixups	2017-09-05 18:47:44 +00:00
Chelsea Holland Komlo	dea1fa089b	fix up travis test failure via race condition	2017-09-05 15:04:59 +00:00
Chelsea Holland Komlo	a8cbd0b559	fixups from code review	2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo	f72e4aad13	labels depend on full setup of client beforehand	2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo	87a814397d	refactor to use baseLabels	2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo	b2953d905a	pass in commonly used values	2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo	c634043069	create base labels to be used in every metric	2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo	f5ea83da8d	emit metrics using labels, add option for backwards compatibility	2017-09-05 14:12:57 +00:00
Chelsea Holland Komlo	0175f80775	add metrics options to client config	2017-09-05 14:12:57 +00:00
Armon Dadgar	b8bf35f087	ACL RPCs allow stale reads for scalability	2017-09-04 13:07:44 -07:00
Armon Dadgar	f31cd6a618	client: fixing policy resolution after ACL endpoint enforcement	2017-09-04 13:05:53 -07:00
Armon Dadgar	ddcc5f89bc	Add ErrPermissionDenied, rename TokenNotFound	2017-09-04 13:05:53 -07:00
Armon Dadgar	76a03f2d8e	Address @dadgar feedback	2017-09-04 13:05:53 -07:00
Armon Dadgar	e3f32ca6f1	client: adding token resolution logic	2017-09-04 13:05:36 -07:00
Armon Dadgar	688897561b	client: adding token cache for ACL resolution	2017-09-04 13:05:36 -07:00
Armon Dadgar	c2e72e8a9c	client: create ACL and Policy cache	2017-09-04 13:05:35 -07:00
Armon Dadgar	792f176a44	agent: thread ACL config to client	2017-09-04 13:04:45 -07:00
Clint Armstrong	b5c2636313	Always purge stopped containers	2017-08-31 14:28:48 -04:00
Clint Armstrong	7e35ab6abb	fix logging re-init	2017-08-30 12:36:31 -04:00
Michael Schurter	78823d559b	Squelch logspam when unable to get disk usage stats To reproduce logspam: ``` $ docker plugin install --grant-all-permissions vieux/sshfs $ nomad agent -dev ... 2017/08/25 17:09:03.282868 [WARN] client: error fetching host disk usage stats for /var/lib/docker/plugins/a8b4a69b07e5180f828d19e1e9e102ccc0e26f9c9939eaef85357260c30b20a7/rootfs/mnt/volumes: permission denied ... repeats every collection period ... ```	2017-08-28 12:04:32 -07:00
Alex Dadgar	876732833f	Merge pull request #3073 from clinta/docker-500 Allow retry of 500 API errors to be handled by restart policies	2017-08-24 16:57:36 -07:00
Alex Dadgar	fd7d614ae4	Handle interfaces that only have link-local addrs This PR changes the fingerprint handling of network interfaces that only contain link local addresses. The new behavior is to prefer globally routable addresses and if none are detected, to fall back to link local addresses if the operator hasn't disallowed it. This gives us pre 0.6 behavior for interfaces with only link local addresses but 0.6+ behavior for IPv6 interfaces that will always have a link-local address. Fixes https://github.com/hashicorp/nomad/issues/3005 /cc diptanuc	2017-08-23 15:32:22 -07:00
Alex Dadgar	211a793530	resolve feedback	2017-08-23 14:17:00 -07:00
Alex Dadgar	653733e093	Clean up docker mounts	2017-08-22 14:12:44 -07:00
Clint Armstrong	ae230395ba	Allow retry of 500 API errors to be handled by restart policies	2017-08-22 14:04:46 -04:00
Michael Schurter	51a27cc83d	Merge pull request #3031 from hashicorp/f-2924-consul-headers Add Header and Method support for HTTP checks	2017-08-18 13:35:08 -07:00
Michael Schurter	7ebd429a86	Merge mistake made go fmt fail	2017-08-18 13:19:44 -07:00
Michael Schurter	5c015da3cb	Merge pull request #3021 from clinta/docker-mount2 Expose docker mount options	2017-08-17 16:57:09 -07:00
Michael Schurter	ff3944a981	Update and test service/check interpolation	2017-08-17 16:49:14 -07:00
Michael Schurter	b4813747d0	Merge pull request #3043 from hashicorp/f-2441-shutdown-delay Add optional shutdown delay to tasks	2017-08-17 14:37:48 -07:00
Michael Schurter	c709251ed6	Lower ShutdownDelay for non-Travis testing	2017-08-17 14:23:42 -07:00
Michael Schurter	b33b2fb4c0	Lower shutdown delay in test	2017-08-17 13:57:22 -07:00
Michael Schurter	0726ca75e3	Make shutdown delay log DEBUG, not INFO	2017-08-17 11:28:33 -07:00
Clint Armstrong	f0460156ae	restrict mount to volume type	2017-08-17 09:52:13 -04:00
Michael Schurter	d529b422b2	Add optional shutdown delay to tasks Fixes #2441 Defaults to 0 (no delay) for backward compat and because this feature should be opt-in.	2017-08-16 17:59:46 -07:00
Alex Dadgar	d6187cd3e8	Fix tests	2017-08-16 16:26:52 -07:00
Alex Dadgar	1a86aecf55	Add version package This PR adds a version package and consolidates version strings into a Version struct.	2017-08-16 15:44:21 -07:00
Alex Dadgar	3d69961c3a	Must be root for TestAllocDir_CreateDir	2017-08-16 10:46:14 -07:00
Alex Dadgar	7dd86b5dfe	Merge pull request #3025 from hashicorp/f-health-events Emit task events explaining alloc health	2017-08-15 12:23:46 -07:00
Alex Dadgar	bb165b97ef	comments	2017-08-15 12:23:29 -07:00
Michael Schurter	1126268a81	Fix formatting	2017-08-15 10:37:02 -07:00
Michael Schurter	74d5c272c6	Cleanup comments and return val	2017-08-14 16:59:03 -07:00
Michael Schurter	46b7fd45d7	spelling	2017-08-14 16:55:59 -07:00
Michael Schurter	de8ea243b6	Return move errors from local Migrate like remote Since alloc runner just logs these errors and continues there's no reason not to return it.	2017-08-14 16:48:56 -07:00
Michael Schurter	7342e23669	Move migrating state into prevAllocWatcher	2017-08-14 16:02:28 -07:00
Alex Dadgar	fdc0115427	test	2017-08-12 14:42:53 -07:00
Alex Dadgar	56801349eb	Refactor health watcher and emit events	2017-08-12 14:23:36 -07:00
Michael Schurter	4601419d63	Soft fail on migration errors	2017-08-11 16:50:30 -07:00
Michael Schurter	3dbd764969	Exit if alloc listener closes Add test for that case, add comments, remove debug logging	2017-08-11 16:22:02 -07:00
Michael Schurter	b7915bdac7	Update tests for new blocking/migrating code	2017-08-11 16:21:57 -07:00
Michael Schurter	ad6cec9e82	Set failed status instead of panic'ing Fixup some TODOs and formatting left from new prevAllocWatcher code.	2017-08-11 16:21:35 -07:00
Michael Schurter	e41a654917	switch from alloc blocker to new interface interface has 3 implementations: 1. local for blocking and moving data locally 2. remote for blocking and moving data from another node 3. noop for allocs that don't need to block	2017-08-11 16:21:35 -07:00
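
Commit 15b991e039 above notes that HTTP header values must be ASCII, so the migrate token is base64 encoded, and that tokens are compared in constant time. A minimal Go sketch of that idea follows. The function names `encodeToken` and `compareToken` are hypothetical and are not taken from the Nomad source; how Nomad actually generates or derives the token is not shown in the log and is not modeled here.

```go
package main

import (
	"crypto/subtle"
	"encoding/base64"
	"fmt"
)

// encodeToken (hypothetical name) turns arbitrary token bytes into an
// ASCII-only string that is safe to place in an HTTP header value.
func encodeToken(raw []byte) string {
	return base64.URLEncoding.EncodeToString(raw)
}

// compareToken (hypothetical name) decodes the presented header value and
// compares it with the expected bytes in constant time, so the comparison
// does not leak how many leading bytes matched.
func compareToken(presented string, expected []byte) bool {
	got, err := base64.URLEncoding.DecodeString(presented)
	if err != nil {
		// Not valid base64: reject without comparing.
		return false
	}
	return subtle.ConstantTimeCompare(got, expected) == 1
}

func main() {
	// Example token containing non-ASCII bytes.
	secret := []byte{0x00, 0xff, 0x10, 0x80}

	header := encodeToken(secret)
	fmt.Println("header value:", header)
	fmt.Println("valid token accepted:", compareToken(header, secret))
	fmt.Println("garbage rejected:", !compareToken("not base64!", secret))
}
```

Note that `subtle.ConstantTimeCompare` returns 0 immediately when the two slices differ in length, so only the contents of equal-length tokens are protected against timing differences.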


2806 Commits