open-nomad

Author	SHA1	Message	Date
Alex Dadgar	a3ea0c17a0	Handle multiple environment templates Fixes https://github.com/hashicorp/nomad/issues/3498	2017-11-10 11:08:19 -08:00
Alex Dadgar	b3edc12dd9	Merge pull request #3411 from cheeseprocedure/f-qemu-graceful-shutdown Qemu driver: graceful shutdown feature	2017-11-03 16:41:34 -07:00
Michael Schurter	690b8f4cfb	Remove noisy log line Didn't mean to commit this	2017-11-03 16:00:30 -07:00
Matt Mercer	11e2870875	Qemu driver: clean up logging; fail unsupported features on Windows	2017-11-03 15:40:20 -07:00
Alex Dadgar	6034916ad1	fix spelling mistake	2017-11-03 15:04:59 -07:00
Alex Dadgar	a23033932a	Merge pull request #3459 from multani/docker-oom-notification docker: log that a container has been killed by the OOM killer	2017-11-03 13:24:03 -07:00
Matt Mercer	cef9ba9770	Qemu driver: tweaks in response to PR feedback Remove attribute for long qemu monitor path; misc cleanup; update tests	2017-11-03 11:28:56 -07:00
Preetha Appan	0eaef09675	Remove event GenericSource, and address other code review comments. Also added deprecation info in comments.	2017-11-03 10:10:06 -05:00
Preetha Appan	5f09c968b3	Move logic for determining event display message to task_runner, added two new fields DisplayMessage and Details.	2017-11-03 09:13:01 -05:00
Alex Dadgar	b4af10edde	Alloc Runner doesn't panic on restoration.	2017-11-02 16:14:13 -07:00
Alex Dadgar	abd28cbd7d	Merge pull request #3493 from hashicorp/f-remove-atlas Remove Atlas and Scada from codebase	2017-11-02 16:00:44 -07:00
Michael Schurter	eedbe8efbb	Merge pull request #3490 from hashicorp/f-gc-logging Make unable-to-gc log level adaptive	2017-11-02 14:32:40 -07:00
Diptanu Choudhury	cb68889652	Added the node_id as a tag	2017-11-02 13:29:10 -07:00
Alex Dadgar	701f462d33	remove atlas	2017-11-02 11:27:21 -07:00
Michael Schurter	fc33c945be	Make unable-to-gc log level adaptive WARNing when someone has over 50 non-terminal allocs was just too confusing. Tested manually with `gc_max_allocs = 10` and bumping a job from `count = 19` to `count = 21`: ``` 2017/11/02 17:54:21.076132 [INFO] client.gc: garbage collection due to number of allocations (19) is over the limit (10) skipped because no terminal allocations ... 2017/11/02 17:54:48.634529 [WARN] client.gc: garbage collection due to number of allocations (21) is over the limit (10) skipped because no terminal allocations ```	2017-11-02 10:57:42 -07:00
Diptanu Choudhury	8a9d0d40b1	Added support for tagged metrics	2017-11-02 10:07:57 -07:00
Diptanu Choudhury	5f522c6de3	Incrementing the start counter when we are actually starting a container	2017-11-02 09:51:20 -07:00
Diptanu Choudhury	44535e5d10	Recording counter for dead allocs properly	2017-11-02 09:51:20 -07:00
Diptanu Choudhury	0b34e811b7	Added metrics to track task/alloc start/restarts/dead events	2017-11-02 09:51:20 -07:00
Matt Mercer	00f90323c2	Qemu driver: defer cleanup sooner	2017-11-01 17:37:43 -07:00
Matt Mercer	43256af5f3	Qemu driver: clean up test logging; retry integration test for longer	2017-11-01 17:21:56 -07:00
Matt Mercer	b1145705d3	Use strings.Replace() instead of custom function	2017-11-01 15:31:35 -07:00
Matt Mercer	d51d174fa0	Qemu driver: basic testing of graceful shutdown feature	2017-11-01 15:31:30 -07:00
Matt Mercer	c26013ea0b	Qemu driver: include PIDs in log output	2017-11-01 15:31:24 -07:00
Matt Mercer	38d9a391aa	Qemu driver: ensure proper cleanup of resources	2017-11-01 15:31:20 -07:00
Matt Mercer	46f7e2fa4c	Qemu driver: minor logging fixes	2017-11-01 15:31:14 -07:00
Matt Mercer	4afb9dfa2d	Standardize driver.qemu logging prefix	2017-11-01 15:30:44 -07:00
Matt Mercer	5127e75569	Qemu driver: add graceful shutdown feature	2017-11-01 15:30:36 -07:00
Michael Schurter	1769db98b7	Fix regression by returning error on unknown alloc	2017-11-01 15:16:38 -05:00
Michael Schurter	9f26b9a403	Fix race in test	2017-11-01 15:16:38 -05:00
Michael Schurter	73e9b57908	Trigger GCs after alloc changes GC much more aggressively by triggering GCs when allocations become terminal as well as after new allocations are added.	2017-11-01 15:16:38 -05:00
Michael Schurter	2a81160dcd	Fix GC'd alloc tracking The Client.allocs map now contains all AllocRunners again, not just un-GC'd AllocRunners. Client.allocs is only pruned when the server GCs allocs. Also stops logging "marked for GC" twice.	2017-11-01 15:16:38 -05:00
Alex Dadgar	c710550551	fix test	2017-10-30 12:35:31 -07:00
Alex Dadgar	4831380e57	Node access is done using locked Node copy Fixes https://github.com/hashicorp/nomad/issues/3454 Reliably reproduced the data race before by having a fingerprinter change the nodes attributes every millisecond and syncing at the same rate. With fix, did not ever panic.	2017-10-27 13:27:24 -07:00
Jonathan Ballet	5429d1c656	docker: changed OOM killed error message	2017-10-27 20:30:52 +02:00
Jonathan Ballet	12615bde9c	docker: log that a container has been killed by the OOM killer Fix: #2203 (at least for Docker tasks)	2017-10-27 18:05:27 +02:00
Alex Dadgar	f117eb28c7	go style vars	2017-10-25 10:49:34 -07:00
Alex Dadgar	3f8495dd0e	fix two flaky tests	2017-10-23 18:15:52 -07:00
Alex Dadgar	cb0d0ef009	move to consul freeport implementation	2017-10-23 16:51:40 -07:00
Alex Dadgar	dbc014b360	Standardize retrieving a free port into a helper package	2017-10-23 16:48:20 -07:00
Alex Dadgar	4a69e1ad15	don't double parallel	2017-10-23 16:48:06 -07:00
Alex Dadgar	96ca2bbe4c	respond to comments	2017-10-23 15:50:27 -07:00
Alex Dadgar	99c81b5848	Skip if no docker	2017-10-19 16:55:10 -07:00
Alex Dadgar	593536664e	fix flaky java tests	2017-10-19 16:49:57 -07:00
Alex Dadgar	4bc452b479	Undo darwin user setting	2017-10-19 16:49:57 -07:00
Alex Dadgar	c7c6964313	Run as user on mac	2017-10-19 16:49:57 -07:00
Alex Dadgar	55a1dffa2f	sudo docker works	2017-10-19 16:49:57 -07:00
Alex Dadgar	805e7b3b62	docker tests	2017-10-19 16:49:57 -07:00
Michael Schurter	797f49702e	Add logging around moby/moby#32648 bug	2017-10-18 10:44:03 -07:00
Michael Schurter	22ac450b2f	Properly fail rkt fingerprinting on old versions	2017-10-16 13:58:58 -07:00
Michael Schurter	d7732c1a58	Squelch repeated rkt version warnings	2017-10-16 12:09:47 -07:00
Michael Schurter	b5fd075d74	Test fixes from #3383	2017-10-13 15:45:35 -07:00
Michael Schurter	b63eee17e9	Merge pull request #3383 from hashicorp/b-migrate-token base64 migrate token	2017-10-13 13:46:54 -07:00
Michael Schurter	dfd2967cdb	Merge pull request #3376 from hashicorp/f-node-acls Allow Node.SecretID for Node.GetNode and Allocs.GetAlloc	2017-10-13 11:51:48 -07:00
Michael Schurter	15b991e039	base64 migrate token HTTP header values must be ASCII. Also constant time compare tokens and test the generate and compare helper functions.	2017-10-13 10:59:13 -07:00
Alex Dadgar	85178d6048	rkt remove allocid	2017-10-13 10:07:50 -07:00
Adam Stankiewicz	cefbc72b49	Remove AllocID from ExecutorContext	2017-10-13 17:07:49 +02:00
Michael Schurter	4a70d4356a	Alloc watcher must send Node.SecretID as AuthToken An auth token is required if ACLs are enabled	2017-10-12 16:38:02 -07:00
Michael Schurter	84d8a51be1	SecretID -> AuthToken	2017-10-12 15:16:33 -07:00
Michael Schurter	59ff94cd71	Don't panic on unexpected Consul response Fixes #3326	2017-10-11 18:25:54 -07:00
Chelsea Holland Komlo	e1c4701a43	fix up build warnings	2017-10-11 17:11:57 -07:00
Chelsea Holland Komlo	b018ca4d46	fixing up code review comments	2017-10-11 17:09:20 -07:00
Chelsea Holland Komlo	a77e462465	add tests for functionality	2017-10-11 17:09:20 -07:00
Chelsea Holland Komlo	410adaf726	Add functionality for authenticated volumes	2017-10-11 17:09:20 -07:00
Alex Dadgar	6d3d0a9391	Nomad UI Command	2017-10-09 23:01:55 -07:00
Michael Schurter	f788974f8a	Merge pull request #3288 from simar7/qemu-improvements qemu: Add bound checks for memory assignment	2017-10-02 14:47:05 -07:00
Simarpreet Singh	d801584c46	qemu: Fix lower memory bound to 128M Signed-off-by: Simarpreet Singh <simar@linux.com>	2017-10-02 14:29:44 -07:00
Simarpreet Singh	10d7d6dab0	gofmt: format qemu.go and qemu_test.go Signed-off-by: Simarpreet Singh <simar@linux.com>	2017-10-02 13:16:48 -07:00
Michael Schurter	a66c53d45a	Remove `structs` import from `api` Goes a step further and removes structs import from api's tests as well by moving GenerateUUID to its own package.	2017-09-29 10:36:08 -07:00
Michael Schurter	77f1fe40e7	Properly autodetect Docker IP in Windows Our Docker network plugin autodetection code was erroneously treating Windows' default network `nat` as a plugin and defaulting to it instead of the host. Fixes #3218	2017-09-27 16:49:23 -07:00
Michael Schurter	a8a87af7ed	Only build rkt driver on linux Build stub for non-linux targets	2017-09-27 14:21:45 -07:00
Simarpreet Singh	3d99e71de8	qemu: Add bound checks for memory assignment Signed-off-by: Simarpreet Singh <simar@linux.com>	2017-09-26 21:07:48 -07:00
Michael Schurter	d7229ce6c5	Merge pull request #3256 from dalegaard/master Enable rkt driver to use address_mode = 'driver'	2017-09-26 18:04:37 -05:00
Alex Dadgar	4173834231	Enable more linters	2017-09-26 15:26:33 -07:00
Lasse Dalegaard	9f584d1114	Ignore rkt network failure if container died early If the container dies before the network can be read, we now ignore the error coming out of the network information polling loop. Nomad will restart the task regardless, so we might be masking the actual error. The polling loop for the rkt network information, inside the `Start` method, was getting a bit unwieldy. It's been refactored out so it's now a separate function.	2017-09-27 00:15:27 +02:00
Lasse Dalegaard	b43ec57c02	Make rkt port mapping test not exit immediately The rkt port mapping test currently starts redis with --version, which obviously makes redis exit again almost immediately. This means that the container exits before the network status can be queried, and so the test fails.	2017-09-26 23:10:24 +02:00
Lasse Dalegaard	17d155d316	Improve rkt driver network status poll loop The network status poll loop will now report any networks it ignored, as well as a no-networks situation.	2017-09-26 21:49:45 +02:00
Lasse Dalegaard	bafd32fda0	Refactor rkt network status loop The network status poll loop for the rkt driver's `Start` method was a bit messy, and could not display the last encountered error. Here we clean it up.	2017-09-26 21:27:12 +02:00
Lasse Dalegaard	5e9e2b07bd	Small logging fix in rkt/driver	2017-09-26 19:36:13 +02:00
Lasse Dalegaard	3d25fd3b00	Bump minimum rkt version to 1.27.0. The changes introduced in #3256 require at least rkt 1.27.0 because of a bug in the JSON output of `rkt status` in previous versions. Here we upgrade all references to rkt's minimum version, and also make travis and vagrant use this version when running tests. Finally we add a CHANGELOG notice.	2017-09-26 19:15:43 +02:00
Lasse Dalegaard	f55f2b8f24	Turn rkt network status failure into Start failure If the rkt driver cannot get the network status, for a task with a configured port mapping, it will now fail the Start() call and kill the task instead of simply logging. This matches the Docker behavior. If no port map is specified, the warnings will be logged but the task will be allowed to start.	2017-09-26 10:20:57 +02:00
Lasse Dalegaard	55a2e60e1a	Test for rkt driver setting DriverNetwork To test that the rkt driver correctly sets a DriverNetwork, at least when a port mapping is requested, we amend the TestRktDriver_PortsMapping test with a small check.	2017-09-26 09:10:50 +02:00
Lasse Dalegaard	2d307d5beb	Discard errors from rkt status and cat-manifest Since we don't actually show these errors anywhere, just discard them right away.	2017-09-26 09:05:47 +02:00
Chelsea Holland Komlo	b26454cf99	Move setGaugeForAllocationStats to emitClientMetrics	2017-09-25 16:05:49 +00:00
Lasse Dalegaard	cbcbe0da2e	Expose rkt DriverNetwork Currently the rkt driver does not expose a DriverNetwork instance after starting the container, which means that address_mode = 'driver' does not work. To get the container network information, we can call `rkt status` on the UUID of the container and grab the container IP from there. For the port map, we need to grab the pod manifest as it will tell us which ports the container exposes. We then cross-reference the configured port name with the container port names, and use that to create a correct port mapping. To avoid doing a (bad) reimplementation of the appc schema(which rkt uses for its manifest) and rkt apis, we pull those in as vendored dependencies. The versions used are the same ones that rkt use in their glide dependency configuration for version 1.28.0.	2017-09-21 00:34:22 +02:00
Lasse Dalegaard	7ac599d509	Use rkt prepare + run-prepared instead of run. The rkt driver currently executes run and asks that the pod UUID is written to a file that is then polled for changes for up to five seconds. Many container fetches will take longer than this, so this method will often not be able to track the pod UUID reliably. To avoid this problem, rkt allows pods to be first prepared, which will return their UUID, and then run as a second invocation. Here we convert the rkt driver's Start method to use this method instead. This way, the UUID will always be tracked correctly.	2017-09-21 00:17:31 +02:00
Michael Schurter	f92ffe5af5	Merge pull request #3105 from hashicorp/f-876-restart-unhealthy Restart unhealthy tasks	2017-09-17 19:38:32 -07:00
epipho	a16c97394f	Fix incorrect docker stats	2017-09-16 00:43:03 -04:00
Michael Schurter	67a4a169a9	Name const after what it represents	2017-09-15 14:57:18 -07:00
Michael Schurter	79a7bf3d7c	Cleanup and test restart failure code	2017-09-15 14:54:37 -07:00
Michael Schurter	06ca379da0	Add comments	2017-09-15 14:34:36 -07:00
Michael Schurter	4dbaa52aba	Fold SetFailure into SetRestartTriggered	2017-09-14 16:48:39 -07:00
Michael Schurter	ed77c0944b	DRY up restart handling a bit. All 3 error/failure cases share restart logic, but 2 of them have special cased conditions.	2017-09-14 16:48:39 -07:00
Michael Schurter	73fb71ca10	RestartDelay isn't needed as checks are re-added on restarts @dadgar made the excellent observation in #3105 that TaskRunner removes and re-registers checks on restarts. This means checkWatcher doesn't need to do any internal restart tracking. Individual checks can just remove themselves and be re-added when the task restarts.	2017-09-14 16:48:39 -07:00
Michael Schurter	06dd86adbd	Remove unused lastStart field	2017-09-14 16:47:41 -07:00
Michael Schurter	0447f79288	Removed partially implemented allocLock	2017-09-14 16:47:41 -07:00
Michael Schurter	ade29ecbed	Improve check watcher logging and add tests Also expose a mock Consul Agent to allow testing ServiceClient and checkWatcher from TaskRunner without actually talking to a real Consul.	2017-09-14 16:47:41 -07:00
Michael Schurter	a137676358	Add comments and move delay calc to TaskRunner	2017-09-14 16:46:54 -07:00
Michael Schurter	8a87475498	Use existing restart policy infrastructure	2017-09-14 16:46:54 -07:00
Michael Schurter	22690c5f4c	Add check watcher for restarting unhealthy tasks	2017-09-14 16:46:54 -07:00

2673 commits