open-nomad

Author	SHA1	Message	Date
Michael Schurter	dadf461c0f	Merge pull request #5599 from hashicorp/docs-091rc1 docs: add download link to 0.9.1-rc1	2019-04-23 08:54:41 -07:00
Michael Schurter	c19ffd359f	docs: add download link to 0.9.1-rc1	2019-04-23 08:47:21 -07:00
Nick Ethier	2c9c9dbd67	website: add plugin docs (#5501) website: add plugin docs	2019-04-23 11:22:08 -04:00
Nick Ethier	c9bbdbf208	website: fixes a few errors in new plugin docs	2019-04-23 11:15:26 -04:00
Mahmood Ali	8b778f832d	Merge pull request #5598 from hashicorp/b-dont-forward-logs fix crash when executor parent nomad process dies	2019-04-23 10:15:30 -04:00
Mahmood Ali	60ee243149	fix crash when executor parent nomad process dies Fixes https://github.com/hashicorp/nomad/issues/5593 Executor seems to die unexpectedly after nomad agent dies or is restarted. The crash seems to occur at the first log message after the nomad agent dies. To ease debugging we forward executor log messages to executor.log as well as to Stderr. `go-plugin` sets up plugins with Stderr pointing to a pipe being read by plugin client, the nomad agent in our case[1]. When the nomad agent dies, the pipe is closed, and any subsequent executor logs fail with ErrClosedPipe and SIGPIPE signal. SIGPIPE results in the executor process dying. I considered adding a handler to ignore SIGPIPE, but hc-log library currently panics when logging write operation fails[2]. Thus we opt to revert to v0.8 behavior of exclusively writing logs to executor.log, while we investigate alternative options. [1] https://github.com/hashicorp/nomad/blob/v0.9.0/vendor/github.com/hashicorp/go-plugin/client.go#L528-L535 [2] https://github.com/hashicorp/nomad/blob/v0.9.0/vendor/github.com/hashicorp/go-hclog/int.go#L320-L323	2019-04-23 09:52:46 -04:00
Danielle Lancashire	fde36992f1	api docs: Add allocation stop	2019-04-23 13:28:21 +02:00
Danielle Lancashire	5fe3b99ec8	docs: Add documentation for	2019-04-23 13:22:27 +02:00
Danielle Lancashire	3b6bda04e2	changelog: Update for GH-5512 and GH-5577	2019-04-23 13:12:08 +02:00
Danielle	198a838b61	Merge pull request #5512 from hashicorp/dani/f-alloc-stop alloc-lifecycle: nomad alloc stop	2019-04-23 13:05:08 +02:00
Danielle Lancashire	832f607433	allocs: Add nomad alloc stop This adds a `nomad alloc stop` command that can be used to stop and force migrate an allocation to a different node. This is built on top of the AllocUpdateDesiredTransitionRequest and explicitly limits the scope of access to that transition to expose it under the alloc-lifecycle ACL. The API returns the follow up eval that can be used as part of monitoring in the CLI or parsed and used in an external tool.	2019-04-23 12:50:23 +02:00
Michael Lange	f530c2f5c1	Updated serializer unit tests	2019-04-22 17:20:52 -07:00
Michael Lange	35e34fea8b	Test coverage for preemption on the client detail page	2019-04-22 16:40:10 -07:00
Michael Lange	b7860a9bca	Test coverage for preemption on the allocation detail page	2019-04-22 16:40:09 -07:00
Michael Lange	29ccd8bcc5	Preemption modeling as page objects	2019-04-22 16:40:08 -07:00
Michael Lange	5124dfe30f	Integration test for the alloc row icon	2019-04-22 16:40:07 -07:00
Michael Lange	000bfce30f	Add preemption properties to Mirage allocation factory	2019-04-22 16:40:07 -07:00
Michael Lange	4c7e350e84	Show which allocations an allocation preempted on the alloc page	2019-04-22 16:40:06 -07:00
Michael Lange	42a4793d9d	Show which alloc, if any, preempted an alloc on the alloc detail page	2019-04-22 16:40:05 -07:00
Michael Lange	a5a659a98a	Preemptions count and filtering on client detail page Show the count in the allocations table next to the existing total alloc count badge. Clicking either will filter by all or by preemptions.	2019-04-22 16:40:04 -07:00
Michael Lange	1266567098	Add preempted icon to alloc row	2019-04-22 16:40:04 -07:00
Michael Lange	e35139e453	Make sure tooltips show up over the top of the side bar	2019-04-22 16:40:03 -07:00
Michael Lange	d12d5f9163	Add wasPreempted bool to allocs	2019-04-22 16:40:02 -07:00
Michael Lange	dcc219fe73	Show preemptions on the job plan phase of job submission	2019-04-22 16:40:01 -07:00
Michael Lange	cb11f46ecf	Data modeling for preemptions	2019-04-22 16:40:00 -07:00
Chris Baker	812abe153f	Merge pull request #5591 from hashicorp/cgbaker/changelog changelog: added entry for #5540 fix	2019-04-22 15:31:22 -04:00
Michael Schurter	12ccadcbd0	Merge pull request #5586 from hashicorp/docs-deploy-ver docs: bump deployment guide to 0.9.0	2019-04-22 12:29:22 -07:00
Chris Baker	0baf547059	changelog: added entry for #5540 fix	2019-04-22 19:27:40 +00:00
Chris Baker	91c4e1eabb	Merge pull request #5541 from hashicorp/b/5540-bad-client-alloc-metrics client/metrics: fixed stale metrics	2019-04-22 15:07:30 -04:00
Mahmood Ali	f515b93b5e	Merge pull request #5577 from hashicorp/dani/b-logmon-unrecoverable logging: Attempt to recover logmon failures	2019-04-22 14:40:24 -04:00
Michael Schurter	61f17a1043	tweak logging level for failed log line Co-Authored-By: notnoop <mahmood@notnoop.com>	2019-04-22 14:40:17 -04:00
Chris Baker	0b1a4dd206	client/metrics: modified metrics to use (updated) client copy of allocation instead of (unupdated) server copy	2019-04-22 18:31:45 +00:00
Lang Martin	8aa97cff13	tests over setwise equality of fingerprinted parts	2019-04-19 15:49:24 -04:00
Michael Schurter	6e43f72a12	docs: bump deployment guide to 0.9.0	2019-04-19 12:39:38 -07:00
Lang Martin	7de6e28ddc	structs need to keep assert Equal interface implementation for tests	2019-04-19 15:23:49 -04:00
Lang Martin	977d33970b	structs equals use labeled continue for clarity	2019-04-19 15:23:48 -04:00
Lang Martin	7b99488afa	struct equals use a working pattern for setwise comparison	2019-04-19 15:23:48 -04:00
Lang Martin	eba4e29440	client fingerprinter doesn't overwrite manual configuration Revert "Revert accidental merge of pr #5482" This reverts commit c45652ab8c113487b9d4fbfb107782cbcf8a85b0.	2019-04-19 15:23:48 -04:00
Michael Schurter	26f3bdbf8f	Merge pull request #5583 from ygersie/fingerprint_nilpointer fix nil pointer in fingerprinting AWS env leading to crash	2019-04-19 08:08:59 -07:00
Mahmood Ali	6b8f855c14	Merge pull request #5437 from hashicorp/r-upstream-libcontainer-plain Use upstream libcontainer package	2019-04-19 10:15:13 -04:00
Mahmood Ali	6014a884be	comment on using init() for libcontainer handling	2019-04-19 09:49:04 -04:00
Mahmood Ali	4322055301	comment what refer to	2019-04-19 09:49:04 -04:00
Mahmood Ali	18993421f2	Move libcontainer helper to executor package	2019-04-19 09:49:04 -04:00
Mahmood Ali	e0c7063697	vendor upstream opencontainers/runc	2019-04-19 09:49:04 -04:00
Mahmood Ali	97aba5ad20	Merge pull request #5585 from hashicorp/b-drivers-node-registration client: wait for batched driver updates before registering nodes	2019-04-19 09:47:21 -04:00
Mahmood Ali	902eed4bf9	clarify cryptic log line	2019-04-19 09:31:43 -04:00
Mahmood Ali	f74d60439f	client: log detected driver health state Noticed that `detected drivers` log line was misleading - when a driver doesn't fingerprint before timeout, their health status is empty string `""` which we would mark as detected. Now, we log all drivers along with their state to ease driver fingerprint debugging.	2019-04-19 09:15:25 -04:00
Mahmood Ali	6bdc9860b7	client: avoid registering node twice right away I noticed that `watchNodeUpdates()` almost immediately after `registerAndHeartbeat()` calls `retryRegisterNode()`, well after 5 seconds. This call is unnecessary and made debugging a bit harder. So here, we ensure that we only re-register node for new node events, not for initial registration.	2019-04-19 09:12:50 -04:00
Preetha	a9327e58fb	Update CHANGELOG.md	2019-04-19 08:02:48 -05:00
Mahmood Ali	f82ea8824f	client: wait for batched driver updated Here we retain 0.8.7 behavior of waiting for driver fingerprints before registering a node, with some timeout. This is needed for system jobs, as system job scheduling for a node occurs at node registration, and the race might mean that a system job may not get placed on the node because of missing drivers. The timeout isn't strictly necessary, but raising it to 1 minute as it's closer to indefinitely blocked than 1 second. We need to keep the value high enough to capture as many drivers/devices as possible, but low enough that it doesn't risk blocking too long due to a misbehaving plugin. Fixes https://github.com/hashicorp/nomad/issues/5579	2019-04-19 09:00:24 -04:00

14847 commits