Commit graph

15534 commits

Author SHA1 Message Date
Preetha Appan ef9a71c68b
code review feedback 2019-07-10 10:41:06 -05:00
Michael Schurter 2cef0f019e
Merge pull request #5933 from hashicorp/f-connect-initial-docs
First pass at a Consul Connect example docs
2019-07-10 14:37:35 +02:00
Michael Schurter d3157160ce website: link to nick's talk 2019-07-10 09:20:37 +02:00
Michael Schurter 75936652c0 website: mention cni plugin requirement 2019-07-10 09:13:10 +02:00
Michael Schurter 58e14ffa66 website: Add link to connect tp download 2019-07-09 17:01:35 +02:00
Preetha Appan 990e468edc
Populate task event struct with kill timeout
This makes for a nicer task event message
2019-07-09 09:37:09 -05:00
Michael Schurter 5594739eb4 website: switch to prettier demo 2019-07-09 14:44:35 +02:00
Michael Schurter af0e7b8495 website: link to consul 1.6 beta 2019-07-08 22:20:02 +02:00
Preetha Appan 108a292cc0
fix linting failure in test case file 2019-07-08 11:29:12 -05:00
Buck Doyle 4edd1d78c1
Remove superfluous test attributes (#5927)
I found while working on #5926 that x-icon already adds
assertion-compatible selectors, so these wrappers are
unnecessary.
2019-07-08 10:36:56 -05:00
Michael Schurter 1ef8b37d8d website: minor connect improvements 2019-07-08 13:31:07 +02:00
Renaud Gaubert 02ff3a5ac2 Updated tensorrt demo to use the official nvidia image
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2019-07-07 16:04:52 -07:00
Nick Ethier c6064c663a
website: change consul and nomad start up to reflect changes 2019-07-07 06:07:26 -04:00
Preetha Appan 1206c895f6
First pass at a Consul Connect example docs 2019-07-06 10:50:02 -05:00
Preetha Appan 53397722f1
add module version constraint to e2e/terraform 2019-07-05 09:18:38 -05:00
Jabi 6ce262856e Fix typo (#5922) 2019-07-04 10:49:15 -05:00
Buck Doyle d5232ecf78 Merge branch 'master' into f-ui/alloc-fs 2019-07-04 10:09:19 -05:00
Jasmine Dahilig f65ee56b3b update changelog 2019-07-03 14:00:53 -07:00
Jasmine Dahilig 1c1e81b294
Merge pull request #5846 from hashicorp/f-docker-log-constraints
add log rotation to docker driver log defaults
2019-07-03 10:17:19 -07:00
Michael Lange a09c006e39
Merge pull request #5915 from hashicorp/b-fix-json-key-casing
Use consistent casing in the JSON representation of the AllocFileInfo struct
2019-07-03 09:48:43 -07:00
Jasmine Dahilig cece83dd9c default to json-file log rotation for docker driver 2019-07-03 09:04:45 -07:00
Michael Lange b2e9570075
Use consistent casing in the JSON representation of the AllocFileInfo struct 2019-07-02 17:27:31 -07:00
Michael Lange 1eb689aca9 Merge remote-tracking branch 'origin/master' into f-ui/alloc-fs
* origin/master: (32 commits)
  Added additional test cases and fixed go test case
  update changelog
  Add Mirage-toggling via environment variable (#5899)
  changelog: Add entries for windows fixes
  fifo: Safer access to Conn
  run post-run/post-stop task runner hooks
  Fail alloc if alloc runner prestart hooks fail
  address review comments
  changelog
  Missed one revert of backwards compatibility for node drain
  Improve test cases for detecting content type
  Undo removal of node drain compat changes
  Updated with suggestions.
  fifo: Close connections and cleanup lock handling
  logmon: Add windows compatibility test
  client: defensive against getting stale alloc updates
  Infer content type in alloc fs stat endpoint
  appveyor: Run logmon tests
  fifo: Require that fifos do not exist for create
  vendor: Use dani fork of go-winio
  ...
2019-07-02 16:40:09 -07:00
Michael Lange c6ef91edc7 Merge branch 'master' into f-ui/alloc-fs
* master:
  make purge parameter lowercase (#5895)
  tr: Fetch Wait channel before killTask in restart
2019-07-02 15:47:25 -07:00
Buck Doyle 595eb480ba
UI: Add allocation directory rendering (#5873)
This lets users navigate the allocation filesystem. It doesn’t
support viewing actual files yet.
2019-07-02 16:42:38 -05:00
Preetha 702072e5aa
Merge pull request #5913 from hashicorp/f-fix-contenttype-tests
Fixed test case for detecting content type
2019-07-02 14:41:22 -05:00
Preetha Appan 8495fb9055
Added additional test cases and fixed go test case 2019-07-02 13:25:29 -05:00
Michael Schurter 803aa62b7a systemd: set a high but non-infinite fd limit 2019-07-02 09:13:24 -07:00
Preetha Appan 249a13e492
update changelog 2019-07-02 09:50:34 -05:00
Preetha 5b83cd4ce0
Merge pull request #5894 from hashicorp/f-remove-deprecated-code
Remove deprecated code
2019-07-02 09:29:24 -05:00
Buck Doyle 100433b08a
Add Mirage-toggling via environment variable (#5899)
I’m finding myself having to revert my change to this
variable when I switch branches, so this would let me
affect the variable without code changes.
2019-07-02 08:58:43 -05:00
Mahmood Ali a97d451ac7
Merge pull request #5905 from hashicorp/b-ar-failed-prestart
Fail alloc if alloc runner prestart hooks fail
2019-07-02 20:25:53 +08:00
Danielle Lancashire 8e69783dbe
changelog: Add entries for windows fixes 2019-07-02 14:01:54 +02:00
Danielle c6872cdf12
Merge pull request #5864 from hashicorp/dani/win-pipe-cleaner
windows: Fix restarts using the raw_exec driver
2019-07-02 13:58:56 +02:00
Danielle Lancashire e20300313f
fifo: Safer access to Conn 2019-07-02 13:12:54 +02:00
Mahmood Ali f10201c102 run post-run/post-stop task runner hooks
Handle the case where prestart fails while restoring a task, to prevent
accidentally leaking consul/logmon processes.
2019-07-02 18:38:32 +08:00
Mahmood Ali 4afd7835e3 Fail alloc if alloc runner prestart hooks fail
When an alloc runner prestart hook fails, the task runners aren't invoked
and they remain in a pending state.

This leads to terrible results, some of which are:
* Lockup in GC process as reported in https://github.com/hashicorp/nomad/pull/5861
* Lockup in shutdown process as TR.Shutdown() waits for WaitCh to be closed
* Alloc not being restarted/rescheduled to another node (as it's still in
  pending state)
* Unexpected restart of alloc on a client restart, potentially days/weeks after
  alloc expected start time!

Here, we treat all tasks as failed if an alloc runner prestart hook fails.
This fixes the lockups and permits the alloc to be rescheduled on another node.

While it's desirable to retry the alloc runner after such failures, I opted to treat
that as out of scope.  I'm wary of some subtleties around alloc and task runners and
their idempotency that are better handled in a follow-up PR.

This might be one of the root causes of
https://github.com/hashicorp/nomad/issues/5840.
2019-07-02 18:35:47 +08:00
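
The behavior described in commit 4afd7835e3 above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Nomad's actual alloc runner: the `task`, `hook`, `markFailed`, and `runAlloc` names are hypothetical, and the real task start-up logic is left out. The point it shows is that when any prestart hook fails, every task is moved out of the pending state and its wait channel is closed, so that anything waiting on that channel (such as a shutdown or GC path) is not blocked forever.

```go
package main

import (
	"errors"
	"fmt"
)

// task is a hypothetical, pared-down stand-in for a task runner's state.
type task struct {
	Name   string
	State  string        // "pending" or "dead" in this sketch
	Failed bool          // true when the task is considered failed
	waitCh chan struct{} // closed once the task reaches a terminal state
}

func newTask(name string) *task {
	return &task{Name: name, State: "pending", waitCh: make(chan struct{})}
}

// markFailed moves the task to a terminal failed state and closes its wait
// channel so that waiters are released. Calling it twice is a no-op.
func (t *task) markFailed() {
	if t.State == "dead" {
		return
	}
	t.State = "dead"
	t.Failed = true
	close(t.waitCh)
}

// hook is a hypothetical prestart hook.
type hook func() error

// errBoom is a sample hook error used by the demo below.
var errBoom = errors.New("boom")

// runAlloc runs the prestart hooks in order. If one fails, all tasks are
// marked failed instead of being left pending, and the error is returned.
func runAlloc(hooks []hook, tasks []*task) error {
	for _, h := range hooks {
		if err := h(); err != nil {
			for _, t := range tasks {
				t.markFailed()
			}
			return fmt.Errorf("prestart hook failed: %w", err)
		}
	}
	// Starting the tasks themselves is out of scope for this sketch.
	return nil
}

func main() {
	tasks := []*task{newTask("web"), newTask("sidecar")}
	hooks := []hook{
		func() error { return nil },
		func() error { return errBoom },
	}
	err := runAlloc(hooks, tasks)
	fmt.Println("error:", err)
	for _, t := range tasks {
		fmt.Println(t.Name, t.State, t.Failed)
	}
}
```

Because each task ends in a terminal state rather than pending, a scheduler observing these states would be free to place the alloc elsewhere; retrying the hooks, which the commit message treats as out of scope, is not attempted here either.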
Mahmood Ali 7614b8f09e
Merge pull request #5890 from hashicorp/b-dont-start-completed-allocs-2
task runner to avoid running task if terminal
2019-07-02 15:31:17 +08:00
Mahmood Ali 7bfad051b9 address review comments 2019-07-02 14:53:50 +08:00
Mahmood Ali c0c00ecc07
Merge pull request #5906 from hashicorp/b-alloc-stale-updates
client: defensive against getting stale alloc updates
2019-07-02 12:40:17 +08:00
Preetha Appan c2116cbf09
changelog 2019-07-01 16:59:37 -05:00
Preetha 50bf20bcfc
Merge pull request #5907 from hashicorp/f-infer-contenttype
Infer content type in alloc fs stat endpoint
2019-07-01 16:54:32 -05:00
Preetha Appan 3cb798235d
Missed one revert of backwards compatibility for node drain 2019-07-01 16:46:05 -05:00
Preetha Appan c09342903b
Improve test cases for detecting content type 2019-07-01 16:24:48 -05:00
Preetha Appan aa2b4b4e00
Undo removal of node drain compat changes
Decided to remove that in 0.10
2019-07-01 15:12:01 -05:00
Yishan Lin cd8fc7c983
Merge pull request #5804 from hashicorp/yishan/revised-enterprise-docs
Revised Nomad Enterprise page
2019-07-01 10:41:32 -07:00
Yishan Lin 92f36ed021 Updated with suggestions. 2019-07-01 10:39:35 -07:00
Danielle Lancashire 688f82f07d
fifo: Close connections and cleanup lock handling 2019-07-01 14:14:29 +02:00
Danielle Lancashire 2c7d1f1b99
logmon: Add windows compatibility test 2019-07-01 14:14:06 +02:00
Mahmood Ali c5f5a1fcb9 client: defensive against getting stale alloc updates
When fetching node alloc assignments, be defensive against a stale read before
killing the local node's allocs.

The bug occurs when both the client and the servers are restarting: when the client
requests the allocations for its node, it might get stale data because the server
hasn't finished applying all the restored Raft transactions to the state store.

Consequently, the client would kill and destroy the alloc locally, only to fetch it
again moments later once the server's store is up to date.

The bug can be reproduced quite reliably with a single-node setup (configured with
persistence).  I suspect it's too much of an edge case to occur in a production cluster
with multiple servers, but we may need to examine leader failover scenarios more closely.

In this commit, we only remove and destroy allocs if the removal index is more
recent than the alloc index. This seems like a cheap resiliency fix we already
use for detecting alloc updates.

A more thorough fix would be to ensure that a Nomad server only serves
RPC calls when its state store is fully restored or up to date in leadership
transition cases.
2019-06-29 04:17:35 -05:00
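
The index comparison described in commit c5f5a1fcb9 above can be sketched roughly as follows. This is a minimal illustration, not Nomad's actual client code: the `alloc` type and the `removableAllocs` function are hypothetical, and the sketch assumes that the "removal index" is the index of the server response that no longer lists the alloc, while the "alloc index" is the index at which the client last saw the alloc modified.

```go
package main

import (
	"fmt"
	"sort"
)

// alloc is a hypothetical, pared-down stand-in for a client-side allocation.
type alloc struct {
	ID          string
	ModifyIndex uint64 // index at which the client last saw this alloc change
}

// removableAllocs returns the IDs of local allocs that are absent from the
// server's response AND whose last known index is older than removalIndex.
// An alloc missing from a response with an older or equal index is assumed
// to be the product of a stale read and is kept.
func removableAllocs(local map[string]alloc, fromServer map[string]struct{}, removalIndex uint64) []string {
	var removed []string
	for id, a := range local {
		if _, stillAssigned := fromServer[id]; stillAssigned {
			continue
		}
		if removalIndex > a.ModifyIndex {
			removed = append(removed, id)
		}
	}
	sort.Strings(removed) // deterministic order for callers and tests
	return removed
}

func main() {
	local := map[string]alloc{
		"web":   {ID: "web", ModifyIndex: 120},
		"cache": {ID: "cache", ModifyIndex: 80},
	}
	// The server response at index 100 lists no allocs for this node.
	fromServer := map[string]struct{}{}

	// "web" was seen at index 120, newer than the response, so it is kept.
	fmt.Println(removableAllocs(local, fromServer, 100)) // prints [cache]
}
```

The guard is deliberately cheap: it does not make the server's answer any fresher, it only stops the client from acting destructively on an answer that is older than what the client already knows.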