open-nomad

Commit Graph

Author	SHA1	Message	Date
Michael Lange	dcc219fe73	Show preemptions on the job plan phase of job submission	2019-04-22 16:40:01 -07:00
Michael Lange	cb11f46ecf	Data modeling for preemptions	2019-04-22 16:40:00 -07:00
Chris Baker	812abe153f	Merge pull request #5591 from hashicorp/cgbaker/changelog changelog: added entry for #5540 fix	2019-04-22 15:31:22 -04:00
Michael Schurter	12ccadcbd0	Merge pull request #5586 from hashicorp/docs-deploy-ver docs: bump deployment guide to 0.9.0	2019-04-22 12:29:22 -07:00
Chris Baker	0baf547059	changelog: added entry for #5540 fix	2019-04-22 19:27:40 +00:00
Chris Baker	91c4e1eabb	Merge pull request #5541 from hashicorp/b/5540-bad-client-alloc-metrics client/metrics: fixed stale metrics	2019-04-22 15:07:30 -04:00
Mahmood Ali	f515b93b5e	Merge pull request #5577 from hashicorp/dani/b-logmon-unrecoverable logging: Attempt to recover logmon failures	2019-04-22 14:40:24 -04:00
Michael Schurter	61f17a1043	tweak logging level for failed log line Co-Authored-By: notnoop <mahmood@notnoop.com>	2019-04-22 14:40:17 -04:00
Chris Baker	0b1a4dd206	client/metrics: modified metrics to use (updated) client copy of allocation instead of (unupdated) server copy	2019-04-22 18:31:45 +00:00
Michael Schurter	6e43f72a12	docs: bump deployment guide to 0.9.0	2019-04-19 12:39:38 -07:00
Michael Schurter	26f3bdbf8f	Merge pull request #5583 from ygersie/fingerprint_nilpointer fix nil pointer in fingerprinting AWS env leading to crash	2019-04-19 08:08:59 -07:00
Mahmood Ali	6b8f855c14	Merge pull request #5437 from hashicorp/r-upstream-libcontainer-plain Use upstream libcontainer package	2019-04-19 10:15:13 -04:00
Mahmood Ali	6014a884be	comment on using init() for libcontainer handling	2019-04-19 09:49:04 -04:00
Mahmood Ali	4322055301	comment what refer to	2019-04-19 09:49:04 -04:00
Mahmood Ali	18993421f2	Move libcontainer helper to executor package	2019-04-19 09:49:04 -04:00
Mahmood Ali	e0c7063697	vendor upstream opencontainers/runc	2019-04-19 09:49:04 -04:00
Mahmood Ali	97aba5ad20	Merge pull request #5585 from hashicorp/b-drivers-node-registration client: wait for batched driver updates before registering nodes	2019-04-19 09:47:21 -04:00
Mahmood Ali	902eed4bf9	clarify cryptic log line	2019-04-19 09:31:43 -04:00
Mahmood Ali	f74d60439f	client: log detected driver health state Noticed that `detected drivers` log line was misleading - when a driver doesn't fingerprint before timeout, their health status is empty string `""` which we would mark as detected. Now, we log all drivers along with their state to ease driver fingerprint debugging.	2019-04-19 09:15:25 -04:00
Mahmood Ali	6bdc9860b7	client: avoid registering node twice right away I noticed that `watchNodeUpdates()` almost immediately after `registerAndHeartbeat()` calls `retryRegisterNode()`, well after 5 seconds. This call is unnecessary and made debugging a bit harder. So here, we ensure that we only re-register node for new node events, not for initial registration.	2019-04-19 09:12:50 -04:00
Preetha	a9327e58fb	Update CHANGELOG.md	2019-04-19 08:02:48 -05:00
Mahmood Ali	f82ea8824f	client: wait for batched driver updated Here we retain 0.8.7 behavior of waiting for driver fingerprints before registering a node, with some timeout. This is needed for system jobs, as system job scheduling for node occur at node registration, and the race might mean that a system job may not get placed on the node because of missing drivers. The timeout isn't strictly necessary, but raising it to 1 minute as it's closer to indefinitely blocked than 1 second. We need to keep the value high enough to capture as much drivers/devices, but low enough that doesn't risk blocking too long due to misbehaving plugin. Fixes https://github.com/hashicorp/nomad/issues/5579	2019-04-19 09:00:24 -04:00
Yorick Gersie	95f81f3eeb	fix nil pointer in fingerprinting AWS env leading to crash HTTP Client returns a nil response if an error has occurred. We first need to check for an error before being able to check the HTTP response code.	2019-04-19 11:07:13 +02:00
Preetha	4fdd82c601	Merge pull request #5580 from hashicorp/f-api-preemption-info Add preemption related fields to AllocationListStub	2019-04-18 18:38:25 -07:00
Preetha Appan	22109d1e20	Add preemption related fields to AllocationListStub	2019-04-18 10:36:44 -05:00
Danielle	72862db778	Merge pull request #5572 from hashicorp/dani/b-docker-volumes Switch to pre-0.9 behaviour for handling volumes	2019-04-18 15:48:23 +02:00
Danielle	be7daaaf15	Merge pull request #5573 from hashicorp/dani/update-vol-docs docs: Clarify docker volume behaviour	2019-04-18 14:30:16 +02:00
Danielle Lancashire	a096a7f112	Switch to pre-0.9 behaviour for handling volumes In Nomad 0.9, we made volume driver handling the same for `""`, and `"local"` volumes. Prior to Nomad 0.9 however these had slightly different behaviour for relative paths and named volumes. Prior to 0.9 the empty string would expand relative paths within the task dir, and `"local"` volumes that are not absolute paths would be treated as docker named volumes. This commit reverts to the previous behaviour as follows: \| Nomad Version \| Driver \| Volume Spec \| Behaviour \| \|------------------------------------------------------------------------- \| all \| "" \| testing:/testing \| allocdir/testing \| \| 0.8.7 \| "local" \| testing:/testing \| "testing" as named volume \| \| 0.9.0 \| "local" \| testing:/testing \| allocdir/testing \| \| 0.9.1 \| "local" \| testing:/testing \| "testing" as named volume \|	2019-04-18 14:28:45 +02:00
Danielle Lancashire	c31966fc71	logging: Attempt to recover logmon failures Currently, when logmon fails to reattach, we will retry reattachment to the same pid until the task restart specification is exhausted. Because we cannot clear hook state during error conditions, it is not possible for us to signal to a future restart that it _shouldn't_ attempt to reattach to the plugin. Here we revert to explicitly detecting reattachment separately from a launch of a new logmon, so we can recover from scenarios where a logmon plugin has failed. This is a net improvement over the current hard failure situation, as it means in the most common case (the pid has gone away), we can recover. Other reattachment failure modes where the plugin may still be running could potentially cause a duplicate process, or a subsequent failure to launch a new plugin. If there was a duplicate process, it could potentially cause duplicate logging. This is better than a production workload outage. If there was a subsequent failure to launch a new plugin, it would fail in the same (retry until restarts are exhausted) as the current failure mode.	2019-04-18 13:41:56 +02:00
Chris Baker	338d4e989d	Merge pull request #5559 from ArangoGutierrez/website_docs_singularity list singularity as a community driver	2019-04-17 12:42:29 -04:00
Charlie Voiselle	7f01244ece	fixed header level	2019-04-17 10:12:43 -04:00
Danielle Lancashire	1e0d3ffe24	docs: Clarify docker volume behaviour	2019-04-17 11:31:55 +02:00
Mahmood Ali	12a9896a7e	Merge pull request #5568 from hashicorp/b-nomad-logger-restart Fixes #5566 . Fix a case where docker logging process may lock up nomad agent restart. Looks like we have a case where docker logger is started even though logmon isn't. In such case, the fifo writer blocks indefinitely and because the open operation happens in the main goroutine, nomad agent blocks indefinitely. This fixes the issue where the fifo open operation happens in goroutine instead of main goroutine. We should follow up independently to ensure logmon <-> dockerlogger ordering and consider having task recovery happen in non-main goroutine with some sensible timeouts.	2019-04-16 19:34:37 -04:00
Eduardo Arango	40d0af5422	resolve merge conflicts Signed-off-by: Eduardo Arango <eduardo@sylabs.io>	2019-04-16 17:01:22 -05:00
Eduardo Arango	6934b98313	address @cgbaker comments Signed-off-by: Eduardo Arango <eduardo@sylabs.io>	2019-04-16 16:59:59 -05:00
Michael Schurter	3ba39e7c76	Merge pull request #5479 from hashicorp/b-vault-renewal vault: fix renewal time	2019-04-16 12:20:26 -07:00
Michael Schurter	6421c55384	changelog: add #5479	2019-04-16 11:23:28 -07:00
Michael Schurter	a85e7b7cc9	vault: fix data races	2019-04-16 11:22:44 -07:00
Michael Schurter	0aeb3dbd86	vault: fix renewal time Renewal time was being calculated as 10s+Intn(lease-10s), so the renewal time could be very rapid or within 1s of the deadline: [10s, lease) This commit fixes the renewal time by calculating it as: (lease/2) +/- 10s For a lease of 60s this means the renewal will occur in [20s, 40s).	2019-04-16 11:22:44 -07:00
Mahmood Ali	01a13a0947	locking and opening streams in goroutine comment	2019-04-16 11:02:19 -04:00
Mahmood Ali	357b86adc3	open fifo on background goroutine	2019-04-15 21:20:09 -04:00
Michael Schurter	f7a7acc345	Merge pull request #5518 from hashicorp/f-simplify-kill client: simplify kill logic	2019-04-15 14:11:58 -07:00
Michael Schurter	373748a327	Merge pull request #5486 from hashicorp/b-validate-migrate api: fix migrate stanza initialization	2019-04-15 09:44:59 -07:00
Danielle	a34b950a89	Merge pull request #5565 from hashicorp/dani/alloc-restart-docs docs: Add docs for nomad-alloc-restart	2019-04-15 17:26:28 +02:00
Danielle Lancashire	3aef4343ae	docs: Add docs for nomad-alloc-restart	2019-04-15 17:21:06 +02:00
Chris Baker	a73d7e797b	Update singularity.html.md	2019-04-15 09:49:30 -04:00
Chris Baker	5b66a00689	Merge pull request #5560 from hashicorp/f-3251-cli-force-periodic cli: add support for periodic force evaluation	2019-04-15 09:40:35 -04:00
Danielle Lancashire	60d7fc4bf5	Update CHANGELOG Add `nomad alloc restart` and `nomad status -verbose`	2019-04-15 11:14:51 +02:00
Eduardo Arango	c9bae637f2	Merge branch 'website_docs_singularity' of github.com:ArangoGutierrez/nomad into website_docs_singularity	2019-04-12 16:27:33 -05:00
Eduardo Arango	7ada6a2c4c	address requested changes, iteration 1 Signed-off-by: Eduardo Arango <eduardo@sylabs.io>	2019-04-12 16:26:52 -05:00


14666 Commits
