Commit graph

15353 commits

Author SHA1 Message Date
Renaud Gaubert 02ff3a5ac2 Updated tensorrt demo to use the official nvidia image
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2019-07-07 16:04:52 -07:00
Preetha Appan 53397722f1
add module version constraint to e2e/terraform 2019-07-05 09:18:38 -05:00
Jabi 6ce262856e Fix typo (#5922) 2019-07-04 10:49:15 -05:00
Jasmine Dahilig f65ee56b3b update changelog 2019-07-03 14:00:53 -07:00
Jasmine Dahilig 1c1e81b294
Merge pull request #5846 from hashicorp/f-docker-log-constraints
add log rotation to docker driver log defaults
2019-07-03 10:17:19 -07:00
Michael Lange a09c006e39
Merge pull request #5915 from hashicorp/b-fix-json-key-casing
Use consistent casing in the JSON representation of the AllocFileInfo struct
2019-07-03 09:48:43 -07:00
Jasmine Dahilig cece83dd9c default to json-file log rotation for docker driver 2019-07-03 09:04:45 -07:00
Michael Lange b2e9570075
Use consistent casing in the JSON representation of the AllocFileInfo struct 2019-07-02 17:27:31 -07:00
Preetha 702072e5aa
Merge pull request #5913 from hashicorp/f-fix-contenttype-tests
Fixed test case for detecting content type
2019-07-02 14:41:22 -05:00
Preetha Appan 8495fb9055
Added additional test cases and fixed go test case 2019-07-02 13:25:29 -05:00
Preetha Appan 249a13e492
update changelog 2019-07-02 09:50:34 -05:00
Preetha 5b83cd4ce0
Merge pull request #5894 from hashicorp/f-remove-deprecated-code
Remove deprecated code
2019-07-02 09:29:24 -05:00
Buck Doyle 100433b08a
Add Mirage-toggling via environment variable (#5899)
I’m finding myself having to revert my change to this
variable when I switch branches, so this would let me
affect the variable without code changes.
2019-07-02 08:58:43 -05:00
Mahmood Ali a97d451ac7
Merge pull request #5905 from hashicorp/b-ar-failed-prestart
Fail alloc if alloc runner prestart hooks fail
2019-07-02 20:25:53 +08:00
Danielle Lancashire 8e69783dbe
changelog: Add entries for windows fixes 2019-07-02 14:01:54 +02:00
Danielle c6872cdf12
Merge pull request #5864 from hashicorp/dani/win-pipe-cleaner
windows: Fix restarts using the raw_exec driver
2019-07-02 13:58:56 +02:00
Danielle Lancashire e20300313f
fifo: Safer access to Conn 2019-07-02 13:12:54 +02:00
Mahmood Ali f10201c102 run post-run/post-stop task runner hooks
Handle when prestart failed while restoring a task, to prevent
accidentally leaking consul/logmon processes.
2019-07-02 18:38:32 +08:00
Mahmood Ali 4afd7835e3 Fail alloc if alloc runner prestart hooks fail
When an alloc runner prestart hook fails, the task runners aren't invoked
and they remain in a pending state.

This leads to terrible results, some of which are:
* Lockup in GC process as reported in https://github.com/hashicorp/nomad/pull/5861
* Lockup in shutdown process as TR.Shutdown() waits for WaitCh to be closed
* Alloc not being restarted/rescheduled to another node (as it's still in
  pending state)
* Unexpected restart of alloc on a client restart, potentially days/weeks after
  alloc expected start time!

Here, we treat all tasks to have failed if alloc runner prestart hook fails.
This fixes the lockups, and permits the alloc to be rescheduled on another node.

While it's desirable to retry alloc runner in such failures, I opted to treat it
out of scope.  I'm afraid of some subtles about alloc and task runners and their
idempotency that's better handled in a follow up PR.

This might be one of the root causes for
https://github.com/hashicorp/nomad/issues/5840 .
2019-07-02 18:35:47 +08:00
Mahmood Ali 7614b8f09e
Merge pull request #5890 from hashicorp/b-dont-start-completed-allocs-2
task runner to avoid running task if terminal
2019-07-02 15:31:17 +08:00
Mahmood Ali 7bfad051b9 address review comments 2019-07-02 14:53:50 +08:00
Mahmood Ali c0c00ecc07
Merge pull request #5906 from hashicorp/b-alloc-stale-updates
client: defensive against getting stale alloc updates
2019-07-02 12:40:17 +08:00
Preetha Appan c2116cbf09
changelog 2019-07-01 16:59:37 -05:00
Preetha 50bf20bcfc
Merge pull request #5907 from hashicorp/f-infer-contenttype
Infer content type in alloc fs stat endpoint
2019-07-01 16:54:32 -05:00
Preetha Appan 3cb798235d
Missed one revert of backwards compatibility for node drain 2019-07-01 16:46:05 -05:00
Preetha Appan c09342903b
Improve test cases for detecting content type 2019-07-01 16:24:48 -05:00
Preetha Appan aa2b4b4e00
Undo removal of node drain compat changes
Decided to remove that in 0.10
2019-07-01 15:12:01 -05:00
Yishan Lin cd8fc7c983
Merge pull request #5804 from hashicorp/yishan/revised-enterprise-docs
Revised Nomad Enterprise page
2019-07-01 10:41:32 -07:00
Yishan Lin 92f36ed021 Updated with suggestions. 2019-07-01 10:39:35 -07:00
Danielle Lancashire 688f82f07d
fifo: Close connections and cleanup lock handling 2019-07-01 14:14:29 +02:00
Danielle Lancashire 2c7d1f1b99
logmon: Add windows compatibility test 2019-07-01 14:14:06 +02:00
Mahmood Ali c5f5a1fcb9 client: defensive against getting stale alloc updates
When fetching node alloc assignments, be defensive against a stale read before
killing local nodes allocs.

The bug is when both client and servers are restarting and the client requests
the node allocation for the node, it might get stale data as server hasn't
finished applying all the restored raft transaction to store.

Consequently, client would kill and destroy the alloc locally, just to fetch it
again moments later when server store is up to date.

The bug can be reproduced quite reliably with single node setup (configured with
persistence).  I suspect it's too edge-casey to occur in production cluster with
multiple servers, but we may need to examine leader failover scenarios more closely.

In this commit, we only remove and destroy allocs if the removal index is more
recent than the alloc index. This seems like a cheap resiliency fix we already
use for detecting alloc updates.

A more proper fix would be to ensure that a nomad server only serves
RPC calls when state store is fully restored or up to date in leadership
transition cases.
2019-06-29 04:17:35 -05:00
Preetha Appan 3345ce3ba4
Infer content type in alloc fs stat endpoint 2019-06-28 20:31:28 -05:00
Danielle Lancashire e1151f743b
appveyor: Run logmon tests 2019-06-28 16:01:41 +02:00
Danielle Lancashire 634ada671e
fifo: Require that fifos do not exist for create
Although this operation is safe on linux, it is not safe on Windows when
using the named pipe interface. To provide a ~reasonable common api
abstraction, here we switch to returning File exists errors on the unix
api.
2019-06-28 13:47:18 +02:00
Danielle Lancashire 0ff27cfc0f
vendor: Use dani fork of go-winio 2019-06-28 13:47:18 +02:00
Danielle Lancashire 514a2a6017
logmon: Refactor fifo access for windows safety
On unix platforms, it is safe to re-open fifo's for reading after the
first creation if the file is already a fifo, however this is not
possible on windows where this triggers a permissions error on the
socket path, as you cannot recreate it.

We can't transparently handle this in the CreateAndRead handle, because
the Access Is Denied error is too generic to reliably be an IO error.
Instead, we add an explict API for opening a reader to an existing FIFO,
and check to see if the fifo already exists inside the calling package
(e.g logmon)
2019-06-28 13:41:54 +02:00
Michael Lange 4884780b2a
Merge pull request #5902 from hashicorp/b-ui/allocation-magnifying-glass
UI: Account for the search icon within the is-compact modifier
2019-06-27 14:37:14 -07:00
Michael Lange aedeeadebd Account for the search icon within the is-compact modifer 2019-06-27 12:32:26 -07:00
Omar Khawaja b9f0407f17
make purge parameter lowercase (#5895) 2019-06-27 14:07:25 -04:00
Mahmood Ali 3d89ae0f1e task runner to avoid running task if terminal
This change fixes a bug where nomad would avoid running alloc tasks if
the alloc is client terminal but the server copy on the client isn't
marked as running.

Here, we fix the case by having task runner uses the
allocRunner.shouldRun() instead of only checking the server updated
alloc.

Here, we preserve much of the invariants such that `tr.Run()` is always
run, and don't change the overall alloc runner and task runner
lifecycles.

Fixes https://github.com/hashicorp/nomad/issues/5883
2019-06-27 11:27:34 +08:00
Preetha Appan f6fc5d40d1
one more drain test 2019-06-26 17:33:51 -05:00
Preetha Appan 67bf66efc6
remove now unneeded test 2019-06-26 16:59:23 -05:00
Preetha Appan 3484f18984
Fix more tests 2019-06-26 16:30:53 -05:00
Preetha Appan ff1b80dba6
Fix node drain test 2019-06-26 16:12:07 -05:00
Preetha Appan 23319e04d6
Restore accidentally deleted block 2019-06-26 13:59:14 -05:00
Danielle d6b8a0a290
Merge pull request #5889 from hashicorp/dani/b-task-restart
tr: Fetch Wait channel before killTask in restart
2019-06-26 16:18:08 +02:00
Danielle Lancashire b9ac184e1f
tr: Fetch Wait channel before killTask in restart
Currently, if killTask results in the termination of a process before
calling WaitTask, Restart() will incorrectly return a TaskNotFound
error when using the raw_exec driver on Windows.
2019-06-26 15:20:57 +02:00
Preetha Appan 66fa6a67ec
newline 2019-06-25 19:41:09 -05:00
Preetha Appan 10e7d6df6d
Remove compat code associated with many previous versions of nomad
This removes compat code for namespaces (0.7), Drain(0.8) and other
older features from releases older than Nomad 0.7
2019-06-25 19:05:25 -05:00