open-nomad

Commit Graph

Author	SHA1	Message	Date
Chelsea Holland Komlo	0708d34135	call reload on agent, client, and server separately	2018-01-08 09:56:31 -05:00
Chelsea Holland Komlo	9741097406	reloading tls config should be atomic for clients/servers	2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo	ae7fc4695e	fixups from code review Revert "close raft long-lived connections" This reverts commit 3ffda28206fcb3d63ad117fd1d27ae6f832b6625. reload raft connections on changing tls	2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo	acd3d1b162	fix up downgrading client to plaintext add locks around changing server configuration	2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo	c0ad9a4627	add ability to upgrade/downgrade nomad agents tls configurations via sighup	2018-01-08 09:21:06 -05:00
Michael Schurter	ef76c65da1	Lookup euid outside of loop	2017-12-13 11:50:12 -08:00
Michael Schurter	5032bf4f5a	Skip tests that require root when not root Also skip Chown on allocdir migration on Windows and when non-root. Windows doesn't support it, and it will always fail as a non-root user.	2017-12-12 16:58:27 -08:00
Alex Dadgar	f0b0697b57	Keyify struct	2017-12-11 17:23:14 -08:00
Michael Schurter	c4d4ead199	Fix test broken by mock updates	2017-12-08 16:45:25 -08:00
Michael Schurter	4b20441eef	Validate port label for host address mode Also skip getting an address for script checks which don't use them. Fixed a weird invalid reserved port in a TaskRunner test helper as well as a problem with our mock Alloc/Job. Hopefully the latter doesn't cause other tests to fail, but we were referencing an invalid PortLabel and just not catching it before.	2017-12-08 12:03:43 -08:00
Michael Schurter	30dd570061	Fix interpolation bug with service/check updates Previously if only an interpolated variable used in a service or check was changed we interpolated the old and new services and checks with the new variable, so nothing appeared to have changed.	2017-12-08 12:03:00 -08:00
Michael Schurter	4347026f83	Test Consul from TaskRunner thoroughly Rely less on the mockConsulServiceClient because the real consul.ServiceClient needs all the testing it can get!	2017-12-08 12:03:00 -08:00
Alex Dadgar	a0d6b6a121	Merge pull request #3630 from hashicorp/b-periodic Handle race between fingerprinters and registration	2017-12-07 16:11:13 -08:00
Alex Dadgar	91ffbbb517	Review feedback	2017-12-07 16:10:57 -08:00
Chelsea Komlo	c8e0cb3044	Merge pull request #3591 from hashicorp/b-1755-stop Allow controlling the stop signal for drivers	2017-12-07 17:06:43 -05:00
Alex Dadgar	02baa6c52b	Handle race between fingerprinters and registration	2017-12-07 13:09:37 -08:00
Chelsea Holland Komlo	61fa8ad4ba	code review fixes	2017-12-07 13:46:25 -05:00
Chelsea Holland Komlo	77ab41124b	set default kill signal on executor shutdown	2017-12-07 11:40:15 -05:00
Chelsea Holland Komlo	6cae8fe6e6	extend configurable kill signal to java driver	2017-12-07 11:40:10 -05:00
Alex Dadgar	4409fdacc0	Drop trace logging	2017-12-06 18:02:24 -08:00
Alex Dadgar	cd9a7f14b8	Add logging around heartbeats	2017-12-06 17:57:50 -08:00
Chelsea Holland Komlo	350319239c	change location of default kill signal	2017-12-06 17:48:25 -05:00
Chelsea Holland Komlo	7dfb64f941	extract signal helper into utils	2017-12-06 14:36:44 -05:00
Chelsea Holland Komlo	b08611cfac	move kill_signal to task level, extend to docker	2017-12-06 14:36:39 -05:00
Chelsea Holland Komlo	80de7d5ebd	allow controlling the stop signal in exec/raw_exec	2017-12-06 11:28:45 -05:00
Chelsea Komlo	9ae849e09c	Merge pull request #3612 from hashicorp/docker-rkt-user Set user for rkt tasks	2017-12-05 17:45:08 -05:00
Michael Schurter	b66aa5b7f6	Merge pull request #3563 from hashicorp/b-snapshot-atomic Atomic Snapshotting / Sticky Volume Migration	2017-12-05 09:16:33 -08:00
Chelsea Holland Komlo	4463dc607e	fix up test	2017-12-05 10:12:40 -05:00
Chelsea Holland Komlo	7284f2385a	remove unused user option	2017-12-04 18:01:31 -05:00
Michael Schurter	6ccc4219d3	Merge pull request #3615 from hashicorp/b-rkt-host-ports rkt: Don't require port_map with host networking	2017-12-04 14:49:42 -08:00
Chelsea Holland Komlo	7c74968452	add ability to specify user for rkt	2017-12-04 14:21:48 -05:00
Michael Schurter	2bf1d6d85e	rkt: Don't require port_map with host networking Also don't try to return a DriverNetwork with host networking. None will ever exist as that's the point of host networking: rkt won't create a network namespace.	2017-12-01 17:23:25 -08:00
Chelsea Holland Komlo	4ee2122536	get KillTimeout in seconds, not nanoseconds	2017-12-01 10:43:00 -05:00
Michael Schurter	5e975bbd0f	Add comment and normalize err check ordering as per PR comments	2017-11-29 17:26:11 -08:00
Michael Schurter	d996c3a231	Check for error file when receiving snapshots	2017-11-29 17:26:11 -08:00
Michael Schurter	ca946679f6	Destroy partially migrated alloc dirs Test that snapshot errors don't return a valid tar currently fails.	2017-11-29 17:26:11 -08:00
Michael Schurter	23c66e37c5	Handle errors during snapshotting If an alloc dir is being GC'd (removed) during snapshotting the walk func will be passed an error. Previously we didn't check for an error so a panic would occur when we'd try to use a nil `fileInfo`.	2017-11-29 17:26:11 -08:00
Chelsea Holland Komlo	2208964948	Support StopTimeout for Docker tasks Update github.com/fsouza/go-dockerclient	2017-11-29 14:33:05 -05:00
Preetha Appan	6ad65c51e6	Missed assert in one place	2017-11-20 13:04:38 -06:00
Preetha Appan	747bd59daa	Better error validation, and added test case for invalid sysctl inputs	2017-11-20 12:07:18 -06:00
Preetha Appan	c68973747b	Address some review comments	2017-11-20 11:15:09 -06:00
Preetha Appan	39ef9ee76d	Fix gofmt warnings	2017-11-18 09:23:09 -06:00
Preetha Appan	e53dd15f58	Fix test compilation after rebase	2017-11-17 17:46:04 -06:00
Samuel BERTHE	0fca2e19c8	review(docker driver): sysctls -> sysctl + ulimits -> ulimit	2017-11-17 16:30:45 -06:00
Samuel BERTHE	6c93922cb7	Oops	2017-11-17 16:14:14 -06:00
Samuel BERTHE	c8363bc44b	💄	2017-11-17 16:03:22 -06:00
Samuel BERTHE	281ab90484	test(docker driver): testing sysctls and ulimits	2017-11-17 16:03:22 -06:00
Samuel BERTHE	b9a10ff7fa	feat(docker driver): adds sysctls and ulimits configs	2017-11-17 16:03:22 -06:00
Alex Dadgar	69d3bf7392	Merge pull request #3559 from hashicorp/b-metrics Don't emit metrics for non-running tasks	2017-11-17 10:33:23 -08:00
Michael Schurter	3845c8d200	Merge pull request #3562 from hashicorp/b-3561-rkt-rm Remove rkt pods when exiting	2017-11-16 17:30:21 -08:00
Michael Schurter	737fb45640	Merge pull request #3551 from hashicorp/b-3419-docker-409-bug Fix Docker name conflict bug by updating dockerclient	2017-11-16 16:38:54 -08:00
Michael Schurter	437fce9954	Improve rktRemove error message	2017-11-16 15:45:14 -08:00
Michael Schurter	3ceec0caab	Remove rkt pods when exiting Fixes #3561	2017-11-16 14:33:44 -08:00
Charlie Voiselle	7a231897a5	Merge pull request #3556 from angrycub/f-fingerprint-log-level Dropped loglevel for AWS fingerprinter env read misses to DEBUG	2017-11-16 16:27:25 -05:00
Charlie Voiselle	969ddf9c2a	Lowered to DEBUG from AD feedback	2017-11-16 14:13:03 -05:00
Alex Dadgar	05b1588cea	Only publish metric when the task is running and dev mode publishes metrics	2017-11-15 13:21:06 -08:00
Alex Dadgar	07963f0b6d	Merge pull request #3546 from hashicorp/f-heuristic Better interface selection heuristic	2017-11-15 12:51:21 -08:00
Alex Dadgar	97ec3974a9	Use interface attached to default route	2017-11-15 11:32:32 -08:00
Michael Schurter	f86f0bd9ea	Handle leader task being dead in RestoreState Fixes the panic mentioned in https://github.com/hashicorp/nomad/issues/3420#issuecomment-341666932 While a leader task dying serially stops all follower tasks, the synchronizing of state is asynchronous. Nomad can shut down before all follower tasks have updated their state to dead thus saving the state necessary to hit this panic: have a non-terminal alloc with a dead leader. The actual fix is a simple nil check to not assume non-terminal allocs leader's have a TaskRunner.	2017-11-15 10:36:13 -08:00
Charlie Voiselle	1197637251	Dropped loglevel for AWS fingerprinter env reads Certain environments use WARN for serious logging; however, it's very possible to have machines without some of the fingerprinted keys (public-ipv4 and public-hostname specifically). Setting log level to INFO seems more consistent with this possibility.	2017-11-15 18:20:59 +00:00
Chelsea Komlo	2dfda33703	Nomad agent reload TLS configuration on SIGHUP (#3479) * Allow server TLS configuration to be reloaded via SIGHUP * dynamic tls reloading for nomad agents * code cleanup and refactoring * ensure keyloader is initialized, add comments * allow downgrading from TLS * initialize keyloader if necessary * integration test for tls reload * fix up test to assert success on reloaded TLS configuration * failure in loading a new TLS config should remain at current Reload only the config if agent is already using TLS * reload agent configuration before specific server/client lock keyloader before loading/caching a new certificate * introduce a get-or-set method for keyloader * fixups from code review * fix up linting errors * fixups from code review * add lock for config updates; improve copy of tls config * GetCertificate only reloads certificates dynamically for the server * config updates/copies should be on agent * improve http integration test * simplify agent reloading storing a local copy of config * reuse the same keyloader when reloading * Test that server and client get reloaded but keep keyloader * Keyloader exposes GetClientCertificate as well for outgoing connections * Fix spelling * correct changelog style	2017-11-14 17:53:23 -08:00
Michael Schurter	3023336b39	Add a test demonstrating the bug Fails on Docker 17.09, passes on Docker 17.06 and earlier	2017-11-14 15:25:52 -08:00
Alex Dadgar	ee31e15f51	Better interface selection heuristic This PR introduces a better interface selection heuristic such that we select interfaces with globally routable unicast addresses over link local addresses. Fixes https://github.com/hashicorp/nomad/issues/3487	2017-11-13 15:13:43 -08:00
Preetha Appan	926c9ed997	Make device mounting unit test verify configuration via docker inspect	2017-11-13 09:56:54 -06:00
Preetha Appan	dc2d5fb5a4	Unit test (linux only) that tests mounting a device in the docker driver	2017-11-13 09:56:54 -06:00
Preetha Appan	4834710e45	Add default value for cgroup permissions for device if not set	2017-11-13 09:56:54 -06:00
Preetha Appan	9cdee6991c	Remove unnecessary check since validate method already checks this	2017-11-13 09:56:54 -06:00
Preetha Appan	110c1fd4f0	Add support for passing device into docker driver	2017-11-13 09:56:54 -06:00
Alex Dadgar	d1358ec1b6	always load all templates	2017-11-10 12:35:51 -08:00
Alex Dadgar	a3ea0c17a0	Handle multiple environment templates Fixes https://github.com/hashicorp/nomad/issues/3498	2017-11-10 11:08:19 -08:00
Alex Dadgar	b3edc12dd9	Merge pull request #3411 from cheeseprocedure/f-qemu-graceful-shutdown Qemu driver: graceful shutdown feature	2017-11-03 16:41:34 -07:00
Michael Schurter	690b8f4cfb	Remove noisy log line Didn't mean to commit this	2017-11-03 16:00:30 -07:00
Matt Mercer	11e2870875	Qemu driver: clean up logging; fail unsupported features on Windows	2017-11-03 15:40:20 -07:00
Alex Dadgar	6034916ad1	fix spelling mistake	2017-11-03 15:04:59 -07:00
Alex Dadgar	a23033932a	Merge pull request #3459 from multani/docker-oom-notification docker: log that a container has been killed by the OOM killer	2017-11-03 13:24:03 -07:00
Matt Mercer	cef9ba9770	Qemu driver: tweaks in response to PR feedback Remove attribute for long qemu monitor path; misc cleanup; update tests	2017-11-03 11:28:56 -07:00
Preetha Appan	0eaef09675	Remove event GenericSource, and address other code review comments. Also added deprecation info in comments.	2017-11-03 10:10:06 -05:00
Preetha Appan	5f09c968b3	Move logic for determining event display message to task_runner, added two new fields DisplayMessage and Details.	2017-11-03 09:13:01 -05:00
Alex Dadgar	b4af10edde	Alloc Runner doesn't panic on restoration.	2017-11-02 16:14:13 -07:00
Alex Dadgar	abd28cbd7d	Merge pull request #3493 from hashicorp/f-remove-atlas Remove Atlas and Scada from codebase	2017-11-02 16:00:44 -07:00
Michael Schurter	eedbe8efbb	Merge pull request #3490 from hashicorp/f-gc-logging Make unable-to-gc log level adaptive	2017-11-02 14:32:40 -07:00
Diptanu Choudhury	cb68889652	Added the node_id as a tag	2017-11-02 13:29:10 -07:00
Alex Dadgar	701f462d33	remove atlas	2017-11-02 11:27:21 -07:00
Michael Schurter	fc33c945be	Make unable-to-gc log level adaptive WARNing when someone has over 50 non-terminal allocs was just too confusing. Tested manually with `gc_max_allocs = 10` and bumping a job from `count = 19` to `count = 21`: ``` 2017/11/02 17:54:21.076132 [INFO] client.gc: garbage collection due to number of allocations (19) is over the limit (10) skipped because no terminal allocations ... 2017/11/02 17:54:48.634529 [WARN] client.gc: garbage collection due to number of allocations (21) is over the limit (10) skipped because no terminal allocations ```	2017-11-02 10:57:42 -07:00
Diptanu Choudhury	8a9d0d40b1	Added support for tagged metrics	2017-11-02 10:07:57 -07:00
Diptanu Choudhury	5f522c6de3	Incrementing the start counter when we are actually starting a container	2017-11-02 09:51:20 -07:00
Diptanu Choudhury	44535e5d10	Recording counter for dead allocs properly	2017-11-02 09:51:20 -07:00
Diptanu Choudhury	0b34e811b7	Added metrics to track task/alloc start/restarts/dead events	2017-11-02 09:51:20 -07:00
Matt Mercer	00f90323c2	Qemu driver: defer cleanup sooner	2017-11-01 17:37:43 -07:00
Matt Mercer	43256af5f3	Qemu driver: clean up test logging; retry integration test for longer	2017-11-01 17:21:56 -07:00
Matt Mercer	b1145705d3	Use strings.Replace() instead of custom function	2017-11-01 15:31:35 -07:00
Matt Mercer	d51d174fa0	Qemu driver: basic testing of graceful shutdown feature	2017-11-01 15:31:30 -07:00
Matt Mercer	c26013ea0b	Qemu driver: include PIDs in log output	2017-11-01 15:31:24 -07:00
Matt Mercer	38d9a391aa	Qemu driver: ensure proper cleanup of resources	2017-11-01 15:31:20 -07:00
Matt Mercer	46f7e2fa4c	Qemu driver: minor logging fixes	2017-11-01 15:31:14 -07:00
Matt Mercer	4afb9dfa2d	Standardize driver.qemu logging prefix	2017-11-01 15:30:44 -07:00
Matt Mercer	5127e75569	Qemu driver: add graceful shutdown feature	2017-11-01 15:30:36 -07:00
Michael Schurter	1769db98b7	Fix regression by returning error on unknown alloc	2017-11-01 15:16:38 -05:00
Michael Schurter	9f26b9a403	Fix race in test	2017-11-01 15:16:38 -05:00
Michael Schurter	73e9b57908	Trigger GCs after alloc changes GC much more aggressively by triggering GCs when allocations become terminal as well as after new allocations are added.	2017-11-01 15:16:38 -05:00
Michael Schurter	2a81160dcd	Fix GC'd alloc tracking The Client.allocs map now contains all AllocRunners again, not just un-GC'd AllocRunners. Client.allocs is only pruned when the server GCs allocs. Also stops logging "marked for GC" twice.	2017-11-01 15:16:38 -05:00
Alex Dadgar	c710550551	fix test	2017-10-30 12:35:31 -07:00
Alex Dadgar	4831380e57	Node access is done using locked Node copy Fixes https://github.com/hashicorp/nomad/issues/3454 Reliably reproduced the data race before by having a fingerprinter change the nodes attributes every millisecond and syncing at the same rate. With fix, did not ever panic.	2017-10-27 13:27:24 -07:00
Jonathan Ballet	5429d1c656	docker: changed OOM killed error message	2017-10-27 20:30:52 +02:00
Jonathan Ballet	12615bde9c	docker: log that a container has been killed by the OOM killer Fix: #2203 (at least for Docker tasks)	2017-10-27 18:05:27 +02:00
Alex Dadgar	f117eb28c7	go style vars	2017-10-25 10:49:34 -07:00
Alex Dadgar	3f8495dd0e	fix two flaky tests	2017-10-23 18:15:52 -07:00
Alex Dadgar	cb0d0ef009	move to consul freeport implementation	2017-10-23 16:51:40 -07:00
Alex Dadgar	dbc014b360	Standardize retrieving a free port into a helper package	2017-10-23 16:48:20 -07:00
Alex Dadgar	4a69e1ad15	don't double parallel	2017-10-23 16:48:06 -07:00
Alex Dadgar	96ca2bbe4c	respond to comments	2017-10-23 15:50:27 -07:00
Alex Dadgar	99c81b5848	Skip if no docker	2017-10-19 16:55:10 -07:00
Alex Dadgar	593536664e	fix flaky java tests	2017-10-19 16:49:57 -07:00
Alex Dadgar	4bc452b479	Undo darwin user setting	2017-10-19 16:49:57 -07:00
Alex Dadgar	c7c6964313	Run as user on mac	2017-10-19 16:49:57 -07:00
Alex Dadgar	55a1dffa2f	sudo docker works	2017-10-19 16:49:57 -07:00
Alex Dadgar	805e7b3b62	docker tests	2017-10-19 16:49:57 -07:00
Michael Schurter	797f49702e	Add logging around moby/moby#32648 bug	2017-10-18 10:44:03 -07:00
Michael Schurter	22ac450b2f	Properly fail rkt fingerprinting on old versions	2017-10-16 13:58:58 -07:00
Michael Schurter	d7732c1a58	Squelch repeated rkt version warnings	2017-10-16 12:09:47 -07:00
Michael Schurter	b5fd075d74	Test fixes from #3383	2017-10-13 15:45:35 -07:00
Michael Schurter	b63eee17e9	Merge pull request #3383 from hashicorp/b-migrate-token base64 migrate token	2017-10-13 13:46:54 -07:00
Michael Schurter	dfd2967cdb	Merge pull request #3376 from hashicorp/f-node-acls Allow Node.SecretID for Node.GetNode and Allocs.GetAlloc	2017-10-13 11:51:48 -07:00
Michael Schurter	15b991e039	base64 migrate token HTTP header values must be ASCII. Also constant time compare tokens and test the generate and compare helper functions.	2017-10-13 10:59:13 -07:00
Alex Dadgar	85178d6048	rkt remove allocid	2017-10-13 10:07:50 -07:00
Adam Stankiewicz	cefbc72b49	Remove AllocID from ExecutorContext	2017-10-13 17:07:49 +02:00
Michael Schurter	4a70d4356a	Alloc watcher must send Node.SecretID as AuthToken An auth token is required if ACLs are enabled	2017-10-12 16:38:02 -07:00
Michael Schurter	84d8a51be1	SecretID -> AuthToken	2017-10-12 15:16:33 -07:00
Michael Schurter	59ff94cd71	Don't panic on unexpected Consul response Fixes #3326	2017-10-11 18:25:54 -07:00
Chelsea Holland Komlo	e1c4701a43	fix up build warnings	2017-10-11 17:11:57 -07:00
Chelsea Holland Komlo	b018ca4d46	fixing up code review comments	2017-10-11 17:09:20 -07:00
Chelsea Holland Komlo	a77e462465	add tests for functionality	2017-10-11 17:09:20 -07:00
Chelsea Holland Komlo	410adaf726	Add functionality for authenticated volumes	2017-10-11 17:09:20 -07:00
Alex Dadgar	6d3d0a9391	Nomad UI Command	2017-10-09 23:01:55 -07:00
Michael Schurter	f788974f8a	Merge pull request #3288 from simar7/qemu-improvements qemu: Add bound checks for memory assignment	2017-10-02 14:47:05 -07:00
Simarpreet Singh	d801584c46	qemu: Fix lower memory bound to 128M Signed-off-by: Simarpreet Singh <simar@linux.com>	2017-10-02 14:29:44 -07:00
Simarpreet Singh	10d7d6dab0	gofmt: format qemu.go and qemu_test.go Signed-off-by: Simarpreet Singh <simar@linux.com>	2017-10-02 13:16:48 -07:00
Michael Schurter	a66c53d45a	Remove `structs` import from `api` Goes a step further and removes structs import from api's tests as well by moving GenerateUUID to its own package.	2017-09-29 10:36:08 -07:00
Michael Schurter	77f1fe40e7	Properly autodetect Docker IP in Windows Our Docker network plugin autodetection code was erroneously treating Windows' default network `nat` as a plugin and defaulting to it instead of the host. Fixes #3218	2017-09-27 16:49:23 -07:00
Michael Schurter	a8a87af7ed	Only build rkt driver on linux Build stub for non-linux targets	2017-09-27 14:21:45 -07:00
Simarpreet Singh	3d99e71de8	qemu: Add bound checks for memory assignment Signed-off-by: Simarpreet Singh <simar@linux.com>	2017-09-26 21:07:48 -07:00
Michael Schurter	d7229ce6c5	Merge pull request #3256 from dalegaard/master Enable rkt driver to use address_mode = 'driver'	2017-09-26 18:04:37 -05:00
Alex Dadgar	4173834231	Enable more linters	2017-09-26 15:26:33 -07:00
Lasse Dalegaard	9f584d1114	Ignore rkt network failure if container died early If the container dies before the network can be read, we now ignore the error coming out of the network information polling loop. Nomad will restart the task regardless, so we might be masking the actual error. The polling loop for the rkt network information, inside the `Start` method, was getting a bit unwieldy. It's been refactored out so it's not a separate function.	2017-09-27 00:15:27 +02:00
Lasse Dalegaard	b43ec57c02	Make rkt port mapping test not exit immediately The rkt port mapping test currently starts redis with --version, which obviously makes redis exit again almost immediately. This means that the container exits before the network status can be queried, and so the test fails.	2017-09-26 23:10:24 +02:00
Lasse Dalegaard	17d155d316	Improve rkt driver network status poll loop The network status poll loop will now report any networks it ignored, as well as a no-networks situation.	2017-09-26 21:49:45 +02:00
Lasse Dalegaard	bafd32fda0	Refactor rkt network status loop The network status poll loop for the rkt driver's `Start` method was a bit messy, and could not display the last encountered error. Here we clean it up.	2017-09-26 21:27:12 +02:00
Lasse Dalegaard	5e9e2b07bd	Small logging fix in rkt/driver	2017-09-26 19:36:13 +02:00
Lasse Dalegaard	3d25fd3b00	Bump minimum rkt version to 1.27.0. The changes introduced in #3256 require at least rkt 1.27.0 because of a bug in the JSON output of `rkt status` in previous versions. Here we upgrade all references to rkt's minimum version, and also make travis and vagrant use this version when running tests. Finally we add a CHANGELOG notice.	2017-09-26 19:15:43 +02:00
Lasse Dalegaard	f55f2b8f24	Turn rkt network status failure into Start failure If the rkt driver cannot get the network status, for a task with a configured port mapping, it will now fail the Start() call and kill the task instead of simply logging. This matches the Docker behavior. If no port map is specified, the warnings will be logged but the task will be allowed to start.	2017-09-26 10:20:57 +02:00
Lasse Dalegaard	55a2e60e1a	Test for rkt driver setting DriverNetwork To test that the rkt driver correctly sets a DriverNetwork, at least when a port mapping is requested, we amend the TestRktDriver_PortsMapping test with a small check.	2017-09-26 09:10:50 +02:00
Lasse Dalegaard	2d307d5beb	Discard errors from rkt status and cat-manifest Since we don't actually show these errors anywhere, just discard them right away.	2017-09-26 09:05:47 +02:00
Chelsea Holland Komlo	b26454cf99	Move setGaugeForAllocationStats to emitClientMetrics	2017-09-25 16:05:49 +00:00
Lasse Dalegaard	cbcbe0da2e	Expose rkt DriverNetwork Currently the rkt driver does not expose a DriverNetwork instance after starting the container, which means that address_mode = 'driver' does not work. To get the container network information, we can call `rkt status` on the UUID of the container and grab the container IP from there. For the port map, we need to grab the pod manifest as it will tell us which ports the container exposes. We then cross-reference the configured port name with the container port names, and use that to create a correct port mapping. To avoid doing a (bad) reimplementation of the appc schema(which rkt uses for its manifest) and rkt apis, we pull those in as vendored dependencies. The versions used are the same ones that rkt use in their glide dependency configuration for version 1.28.0.	2017-09-21 00:34:22 +02:00
Lasse Dalegaard	7ac599d509	Use rkt prepare + run-prepared instead of run. The rkt driver currently executes run and asks that the pod UUID is written to a file that is then polled for changes for up to five seconds. Many container fetches will take longer than this, so this method will often not be able to track the pod UUID reliably. To avoid this problem, rkt allows pods to be first prepared, which will return their UUID, and then run as a second invocation. Here we convert the rkt driver's Start method to use this method instead. This way, the UUID will always be tracked correctly.	2017-09-21 00:17:31 +02:00
Michael Schurter	f92ffe5af5	Merge pull request #3105 from hashicorp/f-876-restart-unhealthy Restart unhealthy tasks	2017-09-17 19:38:32 -07:00
epipho	a16c97394f	Fix incorrect docker stats	2017-09-16 00:43:03 -04:00
Michael Schurter	67a4a169a9	Name const after what it represents	2017-09-15 14:57:18 -07:00
Michael Schurter	79a7bf3d7c	Cleanup and test restart failure code	2017-09-15 14:54:37 -07:00
Michael Schurter	06ca379da0	Add comments	2017-09-15 14:34:36 -07:00
Michael Schurter	4dbaa52aba	Fold SetFailure into SetRestartTriggered	2017-09-14 16:48:39 -07:00
Michael Schurter	ed77c0944b	DRY up restart handling a bit. All 3 error/failure cases share restart logic, but 2 of them have special cased conditions.	2017-09-14 16:48:39 -07:00
Michael Schurter	73fb71ca10	RestartDelay isn't needed as checks are re-added on restarts @dadgar made the excellent observation in #3105 that TaskRunner removes and re-registers checks on restarts. This means checkWatcher doesn't need to do any internal restart tracking. Individual checks can just remove themselves and be re-added when the task restarts.	2017-09-14 16:48:39 -07:00
Michael Schurter	06dd86adbd	Remove unused lastStart field	2017-09-14 16:47:41 -07:00
Michael Schurter	0447f79288	Removed partially implemented allocLock	2017-09-14 16:47:41 -07:00
Michael Schurter	ade29ecbed	Improve check watcher logging and add tests Also expose a mock Consul Agent to allow testing ServiceClient and checkWatcher from TaskRunner without actually talking to a real Consul.	2017-09-14 16:47:41 -07:00
Michael Schurter	a137676358	Add comments and move delay calc to TaskRunner	2017-09-14 16:46:54 -07:00
Michael Schurter	8a87475498	Use existing restart policy infrastructure	2017-09-14 16:46:54 -07:00
Michael Schurter	22690c5f4c	Add check watcher for restarting unhealthy tasks	2017-09-14 16:46:54 -07:00
Alex Dadgar	d306da846c	changelog and feedback	2017-09-14 14:08:58 -07:00
Alex Dadgar	07ed83fdd5	Non-locked accessors to common Node fields This PR removes locking around commonly accessed node attributes that do not need to be locked. The locking could cause nodes to TTL as the heartbeat code path was acquiring a lock that could be held for an excessively long time. An example of this is when Vault is inaccessible, since the fingerprint is run with a lock held but the Vault fingerprinter makes the API calls with a large timeout. Fixes https://github.com/hashicorp/nomad/issues/2689	2017-09-14 14:08:26 -07:00
Chelsea Komlo	536d38454b	Merge pull request #3191 from hashicorp/b-tagged-metrics-panic Fix panic in emitting tagged allocation metrics	2017-09-11 14:28:50 -04:00
Armon Dadgar	d4aed839d2	Merge pull request #3185 from hashicorp/f-acl-reset Add ability to reset ACL bootstrap process	2017-09-11 10:47:17 -07:00
Armon Dadgar	3d5ecaafff	Address @dadgar feedback	2017-09-11 10:30:59 -07:00
Alex Dadgar	b3958faa14	Merge pull request #3187 from hashicorp/b-windows-docker Fix MemorySwappiness on Windows Docker	2017-09-11 09:56:49 -07:00
Alex Dadgar	1cd8f7523f	Merge pull request #3184 from hashicorp/b-docker-logging Fix docker user specified syslogging	2017-09-11 09:31:33 -07:00
Chelsea Holland Komlo	848af92183	fix panic in emitting tagged metrics	2017-09-11 15:32:37 +00:00
Alex Dadgar	d3a9463358	Fix MemorySwappiness on Windows Docker Fixes https://github.com/hashicorp/nomad/issues/3181	2017-09-10 17:46:45 -07:00
Alex Dadgar	3ec7946b3e	Fix invalid CPU stats on Windows This PR fixes an issue introduced in Nomad 0.6.0 due to https://github.com/shirou/gopsutil/issues/420. The issue arose from the fact that the Windows stats from gopsutil report CPUs in percentages where we expected ticks.	2017-09-10 15:30:48 -07:00
Alex Dadgar	637ae9580a	Fix docker user specified syslogging	2017-09-10 14:57:48 -07:00
James Nugent	448145872f	client: Guard against "NaN" values from floats This commit protects against finding `0.NaN` tokens in JSON streams because of infinity representation on serialization.	2017-09-08 16:21:07 -05:00
Alex Dadgar	31f9e099d9	Merge pull request #3148 from clinta/purge-stopped Always purge stopped containers	2017-09-05 17:18:05 -07:00
Alex Dadgar	6fdaf38389	Fix repo name passed to docker credential helpers This PR fixes the server url passed to docker credential helpers and fixes stderr capture. Fixes https://github.com/hashicorp/nomad/issues/2957	2017-09-05 16:43:21 -07:00
Alex Dadgar	21564c7c04	Parse Docker mounts correctly (#3163) * Parse Docker mounts correctly This PR fixes the parsing of Docker mounts and adds testing to ensure no regressions. Fixes https://github.com/hashicorp/nomad/issues/3156 * Review feedback	2017-09-05 14:02:57 -07:00
Chelsea Holland Komlo	0ef43c3c5f	final code review fixups	2017-09-05 18:47:44 +00:00
Chelsea Holland Komlo	dea1fa089b	fix up travis test failure via race condition	2017-09-05 15:04:59 +00:00
Chelsea Holland Komlo	a8cbd0b559	fixups from code review	2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo	f72e4aad13	labels depend on full setup of client beforehand	2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo	87a814397d	refactor to use baseLabels	2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo	b2953d905a	pass in commonly used values	2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo	c634043069	create base labels to be used in every metric	2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo	f5ea83da8d	emit metrics using labels, add option for backwards compatibility	2017-09-05 14:12:57 +00:00
Chelsea Holland Komlo	0175f80775	add metrics options to client config	2017-09-05 14:12:57 +00:00
Armon Dadgar	b8bf35f087	ACL RPCs allow stale reads for scalability	2017-09-04 13:07:44 -07:00
Armon Dadgar	f31cd6a618	client: fixing policy resolution after ACL endpoint enforcement	2017-09-04 13:05:53 -07:00
Armon Dadgar	ddcc5f89bc	Add ErrPermissionDenied, rename TokenNotFound	2017-09-04 13:05:53 -07:00
Armon Dadgar	76a03f2d8e	Address @dadgar feedback	2017-09-04 13:05:53 -07:00
Armon Dadgar	e3f32ca6f1	client: adding token resolution logic	2017-09-04 13:05:36 -07:00
Armon Dadgar	688897561b	client: adding token cache for ACL resolution	2017-09-04 13:05:36 -07:00
Armon Dadgar	c2e72e8a9c	client: create ACL and Policy cache	2017-09-04 13:05:35 -07:00

2842 Commits