Commit graph

15150 commits

Author SHA1 Message Date
Mahmood Ali fd8fb8c22b Stop allocs to be rescheduled
Currently, when an alloc fails and is rescheduled, the alloc desired
state remains as "run" and the nomad client may not free the resources.

Here, we ensure that an alloc is marked as stopped when it's
rescheduled.

Notice the Desired Status and Description before and after this change:

Before:
```
mars-2:nomad notnoop$ nomad alloc status 02aba49e
ID                   = 02aba49e
Eval ID              = bb9ed1d2
Name                 = example-reschedule.nodes[0]
Node ID              = 5853d547
Node Name            = mars-2.local
Job ID               = example-reschedule
Job Version          = 0
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = run
Desired Description  = <none>
Created              = 10s ago
Modified             = 5s ago
Replacement Alloc ID = d6bf872b

Task "payload" is "dead"
Task Resources
CPU        Memory          Disk     Addresses
0/100 MHz  24 MiB/300 MiB  300 MiB

Task Events:
Started At     = 2019-06-06T21:12:45Z
Finished At    = 2019-06-06T21:12:50Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type            Description
2019-06-06T17:12:50-04:00  Not Restarting  Policy allows no restarts
2019-06-06T17:12:50-04:00  Terminated      Exit Code: 1
2019-06-06T17:12:45-04:00  Started         Task started by client
2019-06-06T17:12:45-04:00  Task Setup      Building Task Directory
2019-06-06T17:12:45-04:00  Received        Task received by client

```

After:

```
ID                   = 5001ccd1
Eval ID              = 53507a02
Name                 = example-reschedule.nodes[0]
Node ID              = a3b04364
Node Name            = mars-2.local
Job ID               = example-reschedule
Job Version          = 0
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = stop
Desired Description  = alloc was rescheduled because it failed
Created              = 13s ago
Modified             = 3s ago
Replacement Alloc ID = 7ba7ac20

Task "payload" is "dead"
Task Resources
CPU         Memory          Disk     Addresses
21/100 MHz  24 MiB/300 MiB  300 MiB

Task Events:
Started At     = 2019-06-06T21:22:50Z
Finished At    = 2019-06-06T21:22:55Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type            Description
2019-06-06T17:22:55-04:00  Not Restarting  Policy allows no restarts
2019-06-06T17:22:55-04:00  Terminated      Exit Code: 1
2019-06-06T17:22:50-04:00  Started         Task started by client
2019-06-06T17:22:50-04:00  Task Setup      Building Task Directory
2019-06-06T17:22:50-04:00  Received        Task received by client
```
2019-06-06 17:27:12 -04:00
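The status change described in fd8fb8c22b can be sketched roughly as follows. This is a minimal illustration, not Nomad's scheduler code: the `Alloc` type, the constants, and `markRescheduled` are hypothetical names invented here. Only the field values ("run", "stop", and the description string) come from the CLI output above.

```go
package main

import "fmt"

// Alloc is a simplified, hypothetical stand-in for an allocation record.
// It is not Nomad's real allocation struct.
type Alloc struct {
	ID                 string
	ClientStatus       string
	DesiredStatus      string
	DesiredDescription string
	ReplacementAllocID string
}

const (
	clientFailed    = "failed"
	desiredRun      = "run"
	desiredStop     = "stop"
	rescheduledDesc = "alloc was rescheduled because it failed"
)

// markRescheduled returns a copy of a failed alloc with its desired status
// set to "stop" and a link to its replacement. Allocs that have not failed
// are returned unchanged.
func markRescheduled(a Alloc, replacementID string) Alloc {
	if a.ClientStatus != clientFailed {
		return a
	}
	a.DesiredStatus = desiredStop
	a.DesiredDescription = rescheduledDesc
	a.ReplacementAllocID = replacementID
	return a
}

func main() {
	old := Alloc{ID: "5001ccd1", ClientStatus: clientFailed, DesiredStatus: desiredRun}
	updated := markRescheduled(old, "7ba7ac20")
	fmt.Printf("Desired Status       = %s\n", updated.DesiredStatus)
	fmt.Printf("Desired Description  = %s\n", updated.DesiredDescription)
	fmt.Printf("Replacement Alloc ID = %s\n", updated.ReplacementAllocID)
}
```

Returning a copy rather than mutating in place is a choice made for this sketch. It keeps the original record available for comparison, as in the before/after output above.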
Mahmood Ali e6cec7984a
Merge pull request #5788 from hashicorp/b-fix-node-down-test
tests: Migrated allocs aren't lost
2019-06-06 16:58:35 -04:00
Mahmood Ali 3eda42d027 tests: Migrated allocs aren't lost
Fix `TestServiceSched_NodeDown` so that it checks that the migrated allocs
are actually marked to be stopped.

The boolean logic in the test made it skip checking the client status
as long as the desired status was stop.

Here, we mark some jobs for migration while leaving others as running,
and we check that the lost flag is only set for non-migrated allocs.
2019-06-06 16:05:07 -04:00
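The test weakness described in 3eda42d027 can be illustrated with a small sketch. The real test code is not shown in this log, so everything here (`alloc`, `looseOK`, `strictOK`) is a hypothetical reconstruction of the kind of check the message describes, not the actual `TestServiceSched_NodeDown` logic.

```go
package main

import "fmt"

// alloc is a hypothetical, minimal allocation record for illustration only.
type alloc struct {
	ID            string
	DesiredStatus string
	ClientStatus  string
}

// looseOK mirrors the kind of check the commit message calls too permissive:
// once the desired status is "stop", the client status is never examined.
func looseOK(a alloc) bool {
	return a.DesiredStatus == "stop" || a.ClientStatus == "lost"
}

// strictOK checks both fields. Every alloc must be marked to stop, and the
// client status must be "lost" exactly when the alloc was NOT migrating.
func strictOK(a alloc, migrating bool) bool {
	if a.DesiredStatus != "stop" {
		return false
	}
	isLost := a.ClientStatus == "lost"
	return isLost != migrating
}

func main() {
	// A migrating alloc that was wrongly flagged as lost.
	bad := alloc{ID: "a1", DesiredStatus: "stop", ClientStatus: "lost"}
	fmt.Println("loose check passes: ", looseOK(bad))
	fmt.Println("strict check passes:", strictOK(bad, true))
}
```

The short-circuiting `||` is what lets the loose version accept an alloc without ever looking at its client status.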
Charlie Voiselle 1f05d6b39c
Merge pull request #5785 from john-lay/website-link-for-port-host-label
Fix a website link under `Runtime Environment`
2019-06-06 14:11:03 -04:00
john-lay e6c947e83a Update the link to point to #mapped-ports 2019-06-06 17:44:01 +01:00
john-lay 7f6e8d3229 Fix a website link under Runtime Environment
Under `Network-related Variables`, `NOMAD_HOST_PORT_<label>` has
an incorrect link.
2019-06-06 14:28:30 +01:00
Mahmood Ali eb022e90c7
Merge pull request #5760 from hashicorp/f_improve_tfvars
Proposing new tfvars with additional inline docs
2019-06-06 09:09:21 -04:00
Mahmood Ali d30c3d10b0
Merge pull request #5747 from hashicorp/b-test-fixes-20190521-1
More test fixes
2019-06-05 19:09:18 -04:00
Mahmood Ali 87173111de
Merge pull request #5746 from hashicorp/b-no-updating-inmem-node
set node.StatusUpdatedAt in raft
2019-06-05 19:05:21 -04:00
Mahmood Ali 935ee86e92
Merge pull request #5737 from fwkz/fix-restart-attempts
Fix restart attempts of `restart` stanza in `delay` mode.
2019-06-05 19:05:07 -04:00
Preetha 72bfbe15b7
Merge pull request #5781 from hashicorp/b-revert-release-sup
Revert 0.9.2 release super script tags
2019-06-05 20:46:50 +05:30
Preetha Appan 503db78789
Update release version to 0.9.2 2019-06-05 20:45:17 +05:30
Preetha Appan 5d4a8d3b11
remove 0.9.2 rc1 download link 2019-06-05 20:41:11 +05:30
Preetha Appan 16f422589a
revert 0.9.2 super script tags 2019-06-05 20:39:22 +05:30
Mahmood Ali 97957fbf75 Prepare for 0.9.3 dev cycle 2019-06-05 14:54:00 +00:00
Mahmood Ali e620508e47
Release v0.9.2 2019-06-05 14:49:29 +00:00
Nomad Release bot 43bfbf3fcc Generate files for 0.9.2 release 2019-06-05 11:59:27 +00:00
Mahmood Ali e684a3b7df update changelog for GH-5545 2019-06-04 22:40:38 -04:00
Mahmood Ali 2f90a8ddc5
Merge pull request #5778 from hashicorp/b-preempt-off-by-default
nomad: disable service+batch preemption by default
2019-06-04 20:00:09 -04:00
Mahmood Ali 20cd7f6f54
Merge pull request #5779 from hashicorp/d-preemption-ent
Add Enterprise docs for Preemption
2019-06-04 19:58:45 -04:00
Rob Genova 7ef82d5521 Adds Enterprise docs for Preemption 2019-06-04 23:05:25 +00:00
Michael Schurter 073893f529 nomad: disable service+batch preemption by default
Enterprise only.

Disable preemption for service and batch jobs by default.

Maintain backward compatibility in an x.y.Z release. Consider switching
the default for new clusters in the future.
2019-06-04 15:54:50 -07:00
Mahmood Ali df09e39f12 changelog GH-5728 2019-06-04 15:11:41 -04:00
Mahmood Ali 3d9967fc5a
Merge pull request #5772 from hashicorp/f-disable-nomad-exec
client config flag to disable remote exec
2019-06-04 14:38:59 -04:00
Mahmood Ali 89930873da link to flag from alloc exec doc 2019-06-04 14:37:56 -04:00
Michael Schurter 3d8938626e
Merge pull request #5773 from hashicorp/b-revert-planapply-snapshotafter
nomad: revert use of SnapshotAfter in planApply
2019-06-04 08:25:21 -07:00
Chris Baker 344d5a83ad
Merge pull request #5768 from hashicorp/b-nmd-1489-cleanup-docker-images
Cleanup docker images
2019-06-03 20:16:23 -04:00
Michael Schurter a8fc50cc1b nomad: revert use of SnapshotAfter in planApply
Revert plan_apply.go changes from #5411

Since non-Command Raft messages do not update the StateStore index,
SnapshotAfter may unnecessarily block and needlessly fail in idle
clusters where the last Raft message is a non-Command message.

This is trivially reproducible with the dev agent and a job that has 2
tasks, 1 of which fails.

The correct logic would be to SnapshotAfter the previous plan's index to
ensure consistency. New clusters or newly elected leaders will not have
a previous plan, so the index at which the leader was elected should be
used instead.
2019-06-03 15:34:21 -07:00
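The index selection that a8fc50cc1b describes as "the correct logic" could look something like the sketch below. Note that this commit only reverts the earlier change and does not implement this logic. The function name, its parameters, and the use of 0 to mean "no previous plan" are assumptions made for illustration.

```go
package main

import "fmt"

// snapshotMinIndex picks the minimum Raft index a state snapshot should
// reach before the next plan is evaluated.
//
// prevPlanIndex is the index at which the previous plan was applied; this
// sketch assumes 0 means there has been no previous plan (a new cluster or
// a newly elected leader). leaderElectedIndex is the index at which the
// current leader was elected.
func snapshotMinIndex(prevPlanIndex, leaderElectedIndex uint64) uint64 {
	if prevPlanIndex == 0 {
		return leaderElectedIndex
	}
	return prevPlanIndex
}

func main() {
	fmt.Println(snapshotMinIndex(0, 12))  // no previous plan: fall back to the election index
	fmt.Println(snapshotMinIndex(40, 12)) // previous plan exists: wait for its index
}
```

Waiting on the previous plan's index rather than the latest Raft index avoids the failure mode in the message: a trailing non-Command message advances the Raft index without advancing the StateStore index.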
Chris Baker 3ca97d52db docker/driver: downgraded log level for error in DestroyTask 2019-06-03 21:21:32 +00:00
Chris Baker 2af897c76f drivers/docker: modify container/image cleanup to be robust to containers removed out of band 2019-06-03 19:52:28 +00:00
Mahmood Ali a9f81f2daa client config flag to disable remote exec
This exposes a client flag to disable nomad remote exec support in
environments where access to tasks ought to be restricted.

I used a `disable_remote_exec` client flag that defaults to allowing
remote exec. Opted for a client config that can be used to disable
remote exec globally, or on a subset of the cluster if necessary.
2019-06-03 15:31:39 -04:00
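As a rough sketch of how a flag like the `disable_remote_exec` option in a9f81f2daa might gate exec requests on a client: the `ClientConfig` struct, `authorizeExec`, and the error value are hypothetical names for this illustration and are not taken from Nomad's source. Only the flag name and its default (remote exec allowed) come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// ClientConfig is a hypothetical subset of a client's configuration.
// The zero value leaves DisableRemoteExec false, so remote exec is
// allowed by default, matching the default described in the commit.
type ClientConfig struct {
	DisableRemoteExec bool
}

var errExecDisabled = errors.New("remote exec is disabled on this client")

// authorizeExec rejects an exec request when the client opted out.
func authorizeExec(cfg ClientConfig) error {
	if cfg.DisableRemoteExec {
		return errExecDisabled
	}
	return nil
}

func main() {
	fmt.Println(authorizeExec(ClientConfig{}))
	fmt.Println(authorizeExec(ClientConfig{DisableRemoteExec: true}))
}
```

Because the setting lives in per-client configuration, it can be turned on for every client or only for the nodes where task access should be restricted.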
Chris Baker a6fe288b52 update changelog 2019-06-03 19:07:13 +00:00
Chris Baker be6c6e8ce1 docker/tests:
- modified tests to cleanup now that RemoveContainer isn't in StartTask
- fix some broken tests by removing docker images/containers before test
2019-06-03 19:05:08 +00:00
Chris Baker 9442c26cff docker: DestroyTask was not cleaning up Docker images because it was erroring early due to an attempt to inspect an image that had already been removed 2019-06-03 19:04:27 +00:00
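The cleanup behaviour described in 9442c26cff and 2af897c76f (tolerating images or containers that are already gone) follows a common pattern: treat "not found" as success during teardown. The sketch below shows that pattern in plain Go. It does not use the Docker client library, and `errNotFound`, `errBusy`, `remover`, and `removeIfPresent` are all hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for whatever the container
// runtime client would return.
var (
	errNotFound = errors.New("not found")
	errBusy     = errors.New("resource busy")
)

// remover is a hypothetical function type that deletes a resource by ID.
type remover func(id string) error

// removeIfPresent calls remove and treats a "not found" error as success,
// so cleanup does not abort when the resource was removed out of band.
// Any other error is wrapped and returned.
func removeIfPresent(remove remover, id string) error {
	err := remove(id)
	if err == nil || errors.Is(err, errNotFound) {
		return nil
	}
	return fmt.Errorf("removing %s: %w", id, err)
}

func main() {
	alreadyGone := func(id string) error {
		return fmt.Errorf("image %s: %w", id, errNotFound)
	}
	inUse := func(id string) error { return errBusy }

	fmt.Println(removeIfPresent(alreadyGone, "example-image"))
	fmt.Println(removeIfPresent(inUse, "example-image"))
}
```

Using `errors.Is` means the check still matches when the lower layer wraps the sentinel with extra context.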
Mahmood Ali 5811f41e0c changelog exec memory consumption fix 2019-05-31 14:59:13 -05:00
Mahmood Ali 410c5fbf8d
Merge pull request #5728 from hashicorp/restore-08-caps
drivers/exec: Restore 0.8 capabilities
2019-05-29 11:49:39 -05:00
Chris Baker 622409f84c
update changelog for #5557 (#5763) 2019-05-28 09:18:24 -04:00
Mahmood Ali cb554a015f Fix test comparisons 2019-05-24 21:38:22 -05:00
Mahmood Ali 99637c8bbc Test for expected capabilities specifically 2019-05-24 16:07:05 -05:00
Mahmood Ali 7455c746aa use /bin/bash 2019-05-24 14:50:23 -04:00
Mahmood Ali 68813def56 special case root capabilities 2019-05-24 14:10:10 -04:00
Charlie Voiselle ded9dcbb9a Proposing new tfvars with additional inline docs 2019-05-24 12:30:06 -04:00
Mahmood Ali 01d5c90cbb tests: Fix binary dir permissions 2019-05-24 11:31:12 -04:00
Mahmood Ali 2112a5b239
Merge pull request #5755 from hashicorp/d-tag-0.9.2-rc1-features
Prepare for 0.9.2-rc1
2019-05-23 11:22:28 -04:00
Mahmood Ali a4ead8ff79 remove 0.9.2-rc1 generated code 2019-05-23 11:14:24 -04:00
Mahmood Ali 74b9a1ceef
Release v0.9.2-rc1 2019-05-23 11:04:50 -04:00
Mahmood Ali c37ce19ac1 docs: Tag all 0.9.2 features 2019-05-23 10:51:03 -04:00
Mahmood Ali a4a7df0c19 docs: Link to 0.9.2-rc1 2019-05-23 10:27:17 -04:00
Nomad Release bot 6d6bc59732 Generate files for 0.9.2-rc1 release 2019-05-22 19:29:30 +00:00
Mahmood Ali b7320b91d7 Pin node version to v10.15.3 2019-05-22 14:23:50 -04:00