17908 commits

Author SHA1 Message Date
Mahmood Ali 5b42796f1e
Merge pull request #7704 from hashicorp/b-agent-shutdown-order
agent: shutdown agent http server last
2020-04-20 10:37:26 -04:00
Mahmood Ali 3e741a0caa
Merge pull request #7748 from hashicorp/b-noisy-http-logs
agent: route http logs through hclog
2020-04-20 10:37:15 -04:00
Mahmood Ali 1c0e1cabc9 update changelog
[ci skip]
2020-04-20 10:36:39 -04:00
Mahmood Ali 4e1366f285 agent: route http logs through hclog
Pipe the http server log to hclog, so that it uses the same logging format
as the rest of the nomad logs.  This also supports emitting them as json
logs when json formatting is set.

The http server logs are emitted at Trace level, as they typically
represent HTTP client errors (e.g. failed tls handshakes, invalid headers,
etc).

Panic logs, however, represent server errors and are relayed at Error
level.
2020-04-20 10:33:40 -04:00
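
The routing described in the commit above can be sketched with the standard library alone. This is a minimal illustration, not the actual change: `Level`, `levelWriter`, `classify`, and `newHTTPErrorLog` are hypothetical names, the `emit` callback stands in for a structured logger such as hclog, and the sketch assumes that net/http prefixes recovered handler panics with `http: panic serving`.

```go
package main

import (
	"fmt"
	"log"
	"strings"
)

// Level is a hypothetical log level used only in this sketch.
type Level string

const (
	Trace Level = "TRACE"
	Error Level = "ERROR"
)

// levelWriter is an io.Writer that forwards each line written by a
// standard library logger to emit, with a level chosen by classify.
type levelWriter struct {
	emit func(level Level, msg string)
}

// classify treats recovered handler panics as server errors and everything
// else (failed TLS handshakes, invalid headers, ...) as client noise.
func classify(line string) Level {
	if strings.HasPrefix(line, "http: panic serving") {
		return Error
	}
	return Trace
}

// Write receives one complete log line per call from log.Logger.
func (w *levelWriter) Write(p []byte) (int, error) {
	msg := strings.TrimRight(string(p), "\n")
	w.emit(classify(msg), msg)
	return len(p), nil
}

// newHTTPErrorLog returns a logger that could be assigned to
// http.Server.ErrorLog so server logs flow through emit.
func newHTTPErrorLog(emit func(Level, string)) *log.Logger {
	return log.New(&levelWriter{emit: emit}, "", 0)
}

func main() {
	logger := newHTTPErrorLog(func(l Level, msg string) {
		fmt.Printf("[%s] %s\n", l, msg)
	})
	logger.Println("http: TLS handshake error from 10.0.0.1:1234: EOF")
	logger.Println("http: panic serving 10.0.0.1:1234: boom")
}
```

Because the adapter is a plain `io.Writer`, the output format (text or json) is left entirely to whatever sits behind `emit`.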
Mahmood Ali 86aa8105b2
Merge pull request #7749 from hashicorp/b-docker-panic
driver/docker: protect against nil container
2020-04-20 10:31:46 -04:00
Mahmood Ali 6bfef2c945 add changelog
[ci skip]
2020-04-20 10:31:09 -04:00
Jeffrey 'jf' Lim 35418efb60
demo/vagrant/Vagrantfile: Update Nomad version (0.11.0) (#7579) 2020-04-20 09:29:12 -04:00
Anthony Scalisi 9664c6b270
fix spelling errors (#6985) 2020-04-20 09:28:19 -04:00
Charles Z e4a669598e
label csi as beta from 0.11 release notes (#7745) 2020-04-20 08:48:04 -04:00
Mahmood Ali dff071c3b9 driver/docker: protect against nil container
Protect against a panic when we attempt to start a container with a name
that conflicts with an existing one.  If the existing one is being
deleted while nomad first attempts to create the container,
createContainer will fail with `container already exists`, but we get
a nil container reference from the `containerByName` lookup, which causes
a crash.

I'm not certain how we get into this state, except for being very
unlucky.  I suspect that this case may be the result of a concurrent
restart or the docker engine API not being fully consistent (e.g. an
earlier call purged the container, but docker hasn't yet freed up the
resources needed to create a new container with the same name).

If that's the case, then re-attempting creation will hopefully succeed,
or we'd at least fail enough times for the alloc to be rescheduled to
another node.
2020-04-19 15:34:45 -04:00
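
The guard described in the commit above might look roughly like the following. It is a sketch under stated assumptions rather than the driver's real code: `container`, `engine`, `flakyEngine`, and `createWithRetry` are hypothetical stand-ins for the Docker client and the driver's create path.

```go
package main

import (
	"errors"
	"fmt"
)

// container and engine are hypothetical stand-ins for the Docker client types.
type container struct{ ID string }

var errAlreadyExists = errors.New("container already exists")

type engine interface {
	Create(name string) (*container, error)
	ByName(name string) (*container, error)
}

// createWithRetry creates a container and, on a name conflict, looks up the
// existing one. A nil lookup result is never dereferenced; it means the
// conflicting container vanished in the meantime, so creation is retried.
func createWithRetry(e engine, name string, attempts int) (*container, error) {
	for i := 0; i < attempts; i++ {
		c, err := e.Create(name)
		if err == nil {
			return c, nil
		}
		if !errors.Is(err, errAlreadyExists) {
			return nil, err
		}
		existing, err := e.ByName(name)
		if err != nil {
			return nil, err
		}
		if existing != nil {
			return existing, nil
		}
		// Conflict reported but nothing found: try to create again.
	}
	return nil, fmt.Errorf("container %q: still conflicting after %d attempts", name, attempts)
}

// flakyEngine simulates the race: the first Create conflicts with a container
// that is mid-deletion, so ByName finds nothing; the second Create succeeds.
type flakyEngine struct{ creates int }

func (f *flakyEngine) Create(name string) (*container, error) {
	f.creates++
	if f.creates == 1 {
		return nil, errAlreadyExists
	}
	return &container{ID: name + "-new"}, nil
}

func (f *flakyEngine) ByName(string) (*container, error) { return nil, nil }

func main() {
	c, err := createWithRetry(&flakyEngine{}, "web", 3)
	fmt.Println(c, err)
}
```

If every attempt conflicts, the returned error lets the caller fail the task, which matches the commit's fallback of eventually rescheduling the alloc elsewhere.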
Jeffrey 'jf' Lim eab600d3e1
Fix/improve "job plan" messaging (#7580) 2020-04-17 15:53:16 -04:00
Yishan Lin 164314f7fa
Merge pull request #7741 from hashicorp/yishan/docs-rebased-preemption-update
docs: update preemption page
2020-04-17 11:03:27 -07:00
Yishan Lin b95309dc4b docs: update preemption page
This page has not yet been updated to reflect support for all 3 job types (service, batch, system), which shipped in 0.9.2.

The current page implies that preemption is only available for system jobs.

This is early preparation for Nomad 0.12, where we plan to move Preemption from the Enterprise feature suite to OSS for all.
2020-04-17 09:34:07 -07:00
Brandon Romano 3b22f5aa72
Merge pull request #7717 from hashicorp/website-alert
website: Adjust the website alert to point to the blog post
2020-04-14 11:36:43 -07:00
Brandon Romano f520757617 Adjust the website alert to point to the blog post 2020-04-14 11:17:06 -07:00
Michael Schurter 165ddda744
Merge pull request #7682 from hashicorp/b-comment-fix
core: fix comment on system stack
2020-04-13 15:13:23 -07:00
Chris Baker a37446acaa
documents the scaling block in the JSON Job docs (#7706)
* documents the scaling block in the JSON Job docs

resolves #7656

* add task-specific restart to JSON Job docs

companion to #7603

* [docs] improved and corrected scaling docs

* Update website/pages/api-docs/json-jobs.mdx

Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2020-04-13 16:33:49 -05:00
Chris Baker a6e0d17433
update restart documentation (#7603)
* update `restart` documentation

#7288 added support for a task-specific `restart` policy. This PR updates the docs to reflect that.

* added an explicit example of task-specific restart policy

* Update website/pages/docs/job-specification/restart.mdx
2020-04-13 16:29:43 -05:00
Drew Bailey f3b168e369
Merge pull request #7663 from hashicorp/b-taskrunner-shutdown_delay
Run task shutdown_delay regardless of service registration
2020-04-13 13:27:24 -04:00
Drew Bailey da11c31e4c
Update CHANGELOG.md 2020-04-13 12:41:13 -04:00
Mahmood Ali b78680eee7 agent: shutdown agent http server last
Shutdown http server last, after nomad client/server components
terminate.

Before this change, if the agent is taking an unexpectedly long time to
shut down, the operator cannot query the http server directly: they
cannot access agent specific http endpoints and need to query another
agent about the troublesome agent.

An unexpectedly long shutdown can happen in normal cases, e.g. a client
might hang if one of the allocs it is running has a long
shutdown_delay.

Here, we switch to ensuring that the http server is shut down last.

I believe this doesn't require extra care in the agent shutdown logic,
even though operators may still submit write http requests during
shutdown.  We already need to cope with operators submitting such
requests to another agent, and with servers updating the client
allocations.
2020-04-13 10:50:07 -04:00
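
The ordering in the commit above can be pictured with a small sketch. `stopper` and `shutdownInOrder` are hypothetical names, not the agent's real types; the point is only that the HTTP server stays up until every other component has stopped.

```go
package main

import "fmt"

// stopper is a hypothetical named shutdown step.
type stopper struct {
	name string
	stop func() error
}

// shutdownInOrder stops the core components first and the HTTP server last,
// so the agent's HTTP endpoints stay queryable during a slow shutdown.
// It returns the order in which components were stopped and the first error.
func shutdownInOrder(core []stopper, httpServer stopper) ([]string, error) {
	var order []string
	var firstErr error
	steps := append(append([]stopper{}, core...), httpServer)
	for _, s := range steps {
		if err := s.stop(); err != nil && firstErr == nil {
			firstErr = err
		}
		order = append(order, s.name)
	}
	return order, firstErr
}

func main() {
	noop := func() error { return nil }
	order, err := shutdownInOrder(
		[]stopper{{"client", noop}, {"server", noop}},
		stopper{"http", noop},
	)
	fmt.Println(order, err)
}
```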
Tim Gross 4e9bd1e1d1
refactor: consolidate private methods for CSI RPC (#7702)
Follow-up for a method missed in the refactor for #7688. The
`volAndPluginLookup` method is only ever called from the server's `CSI`
RPC and never the `ClientCSI` RPC, so move it into that scope.
2020-04-13 10:46:43 -04:00
Tim Gross ab3086a1f4
e2e: testing reliability (#7701)
* pin CSI plugin versions
* ensure failing CSI tests clean up
* allow NOMAD_SHA env var to override makefile
2020-04-13 10:25:24 -04:00
Mahmood Ali e6551455b9
Merge pull request #7693 from greut/bump-testify
api: testify v1.5.1
2020-04-11 09:09:44 -04:00
Yoan Blanc 790df29996
api: testify v1.5.1
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-04-11 13:55:10 +02:00
Tim Gross f37e986b1b
refactor: make nodeForControllerPlugin private to ClientCSI (#7688)
The current design of `ClientCSI` RPC requires that callers in the
server know about the free-standing `nodeForControllerPlugin`
function. This makes it difficult to send `ClientCSI` RPC messages
from subpackages of `nomad` and adds a bunch of boilerplate to every
server-side caller of a controller RPC.

This changeset makes it so that the `ClientCSI` RPCs will populate and
validate the controller's client node ID if it hasn't been passed by
the caller, centralizing the logic of picking and validating
controller targets into the `nomad.ClientCSI` struct.
2020-04-10 16:47:21 -04:00
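
A rough sketch of the centralization described in the commit above, assuming a much simpler data model than the real one: `controllerRequest`, `clientCSI`, its `controllers` map, and `resolveNode` are all hypothetical. The receiver fills in the controller's node ID when the caller left it empty and validates it when the caller supplied one.

```go
package main

import "fmt"

// controllerRequest is a hypothetical controller RPC request.
type controllerRequest struct {
	PluginID         string
	ControllerNodeID string
}

// clientCSI is a hypothetical stand-in for the RPC endpoint. controllers maps
// a plugin ID to the node IDs currently running a healthy controller.
type clientCSI struct {
	controllers map[string][]string
}

// resolveNode populates req.ControllerNodeID if it is empty and validates it
// otherwise, so callers no longer need their own node-selection logic.
func (c *clientCSI) resolveNode(req *controllerRequest) error {
	nodes := c.controllers[req.PluginID]
	if len(nodes) == 0 {
		return fmt.Errorf("no controllers available for plugin %q", req.PluginID)
	}
	if req.ControllerNodeID == "" {
		req.ControllerNodeID = nodes[0]
		return nil
	}
	for _, id := range nodes {
		if id == req.ControllerNodeID {
			return nil
		}
	}
	return fmt.Errorf("node %q is not a controller for plugin %q", req.ControllerNodeID, req.PluginID)
}

func main() {
	c := &clientCSI{controllers: map[string][]string{"ebs": {"node-1"}}}
	req := &controllerRequest{PluginID: "ebs"}
	fmt.Println(c.resolveNode(req), req.ControllerNodeID)
}
```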
Seth Hoenig 43804d8ca9
Merge pull request #7684 from hashicorp/b-connect-sidecar-name
connect: enable configuring sidecar_task.name
2020-04-10 10:04:25 -06:00
Seth Hoenig 7ff7d2a288
Merge pull request #7683 from hashicorp/b-no-sidecar-panic
connect: correctly handle missing sidecar_service task stanza
2020-04-10 09:49:59 -06:00
Seth Hoenig 0407eaaf88 connect: extract common task keys 2020-04-10 09:49:19 -06:00
Drew Bailey 591aea1edd
changelog 2020-04-10 11:14:39 -04:00
Drew Bailey 8bfee62b70
Run task shutdown_delay regardless of service registration
task shutdown_delay will currently only run if there are registered
services for the task. This implementation detail isn't explicitly stated
anywhere and is defined outside of the service stanza.

This change moves shutdown_delay to be evaluated after prekill hooks are
run, outside of any task runner hooks.

just use time.sleep
2020-04-10 11:06:26 -04:00
Seth Hoenig db865e05d8 connect: enable configuring sidecar_task.name
Before, the submitted jobspec for sidecar_task would pass
through 2 key validation steps - once for the subset specific
to connect sidecar task definitions, and once again for the set
of normal task definition where the task would actually get
unmarshalled.

The valid keys for the normal task definition did not include
"name", which is supposed to be configurable for the sidecar
task. To fix this, just eliminate the double validation step,
and instead pass in the correct set of keys to validate against
to the one generic task parser.

Fixes #7680
2020-04-09 21:01:16 -06:00
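
The single validation pass described in the commit above could be sketched as follows. The key lists are illustrative only and are not Nomad's real lists; `checkKeys` and `parseTask` are hypothetical helpers operating on an already-decoded map.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Illustrative key sets; the sidecar set is the common set plus "name".
var commonTaskKeys = []string{"driver", "config", "env", "resources", "meta"}
var sidecarTaskKeys = append([]string{"name"}, commonTaskKeys...)

// checkKeys reports every key in raw that is not in valid.
func checkKeys(raw map[string]interface{}, valid []string) error {
	allowed := make(map[string]struct{}, len(valid))
	for _, k := range valid {
		allowed[k] = struct{}{}
	}
	var invalid []string
	for k := range raw {
		if _, ok := allowed[k]; !ok {
			invalid = append(invalid, k)
		}
	}
	if len(invalid) == 0 {
		return nil
	}
	sort.Strings(invalid)
	return fmt.Errorf("invalid keys: %s", strings.Join(invalid, ", "))
}

// parseTask is the one generic parser: the caller decides which key set
// applies, so each definition is validated exactly once.
func parseTask(raw map[string]interface{}, validKeys []string) (map[string]interface{}, error) {
	if err := checkKeys(raw, validKeys); err != nil {
		return nil, err
	}
	return raw, nil
}

func main() {
	sidecar := map[string]interface{}{"name": "proxy", "driver": "docker"}
	_, err := parseTask(sidecar, sidecarTaskKeys)
	fmt.Println("sidecar keys:", err)
	_, err = parseTask(sidecar, commonTaskKeys)
	fmt.Println("common keys:", err)
}
```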
Seth Hoenig 20802da8fd connect: correctly deal with nil sidecar_service task stanza
Before, if the sidecar_service stanza of a connect-enabled service
was missing, the job submission would cause a panic in the nomad
agent. Since the panic was happening in the API handler the agent
itself continued running, but this change will handle the condition more
gracefully.

By fixing the `Copy` method, the API handler now returns the proper
error.

$ nomad job run foo.nomad
Error submitting job: Unexpected response code: 500 (1 error occurred:
	* Task group api validation failed: 2 errors occurred:
	* Missing tasks for task group
	* Task group service validation failed: 1 error occurred:
	* Service[0] count-api validation failed: 1 error occurred:
	* Consul Connect must be native or use a sidecar service
2020-04-09 20:28:17 -06:00
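
One way a nil-safe `Copy` of the kind mentioned in the commit above could look is sketched here. `SidecarService` and `ConsulConnect` are simplified stand-ins with hypothetical fields, and `Validate` covers only the single rule quoted in the error output above.

```go
package main

import (
	"errors"
	"fmt"
)

// SidecarService is a simplified stand-in with hypothetical fields.
type SidecarService struct {
	Port string
	Tags []string
}

// Copy is safe to call on a nil receiver, so a missing stanza stays nil
// instead of triggering a nil pointer dereference.
func (s *SidecarService) Copy() *SidecarService {
	if s == nil {
		return nil
	}
	return &SidecarService{
		Port: s.Port,
		Tags: append([]string(nil), s.Tags...),
	}
}

// ConsulConnect is a simplified stand-in for a connect stanza.
type ConsulConnect struct {
	Native         bool
	SidecarService *SidecarService
}

// Copy deep-copies the stanza and tolerates a nil SidecarService.
func (c *ConsulConnect) Copy() *ConsulConnect {
	if c == nil {
		return nil
	}
	return &ConsulConnect{
		Native:         c.Native,
		SidecarService: c.SidecarService.Copy(),
	}
}

// Validate rejects a stanza that is neither native nor backed by a sidecar.
func (c *ConsulConnect) Validate() error {
	if c.Native || c.SidecarService != nil {
		return nil
	}
	return errors.New("Consul Connect must be native or use a sidecar service")
}

func main() {
	connect := (&ConsulConnect{}).Copy() // no sidecar_service stanza
	fmt.Println(connect.Validate())
}
```

With the copy no longer panicking, the request reaches validation and the caller gets an ordinary error back.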
Michael Schurter 4b475db408 core: fix comment on system stack
This makes me do a double take every time I run into it, so what if we
just changed it?
2020-04-09 15:19:11 -07:00
Mahmood Ali 1640f58776
Merge pull request #7676 from hashicorp/vendor-golang-org-x-20200409
Upgrade all golang.org/x packages
2020-04-09 17:18:57 -04:00
Seth Hoenig 58d844f591
Merge pull request #7678 from hashicorp/docs-connect-config-link-404
docs: fix link to envoy proxy documentation on consul site
2020-04-09 13:56:58 -06:00
Seth Hoenig 6cfecc6d03 docs: fix link to envoy proxy documentation on consul site 2020-04-09 13:46:59 -06:00
Mahmood Ali 735a478cc2 Upgrade all golang.org/x packages
Upgrade all golang.org/x packages to pick up fixes and improvements.
Some packages date back to 2018 and so much improvement happened since
then!
2020-04-09 15:23:25 -04:00
Michael Schurter 4bd6372284
Merge pull request #7675 from hashicorp/release-post-0110
Prepare for 0.11.1 release
2020-04-09 12:20:39 -07:00
Michael Schurter 084c6bb94b docs: add #7673 to changelog 2020-04-09 12:18:34 -07:00
Mahmood Ali c64a79f6d1
Merge pull request #7673 from hashicorp/b-http2-cached-connections
vendor: upgrade golang.org/x/net packages
2020-04-09 15:14:40 -04:00
Michael Schurter c763cb3b37 docs: prep changelog for 0.11.1 2020-04-09 12:11:54 -07:00
Michael Schurter f93ee566cf release: bump version to 0.11.1 for development 2020-04-09 12:11:13 -07:00
James Rasell 20fd17c166
Merge pull request #7666 from hashicorp/add-0.11.0-changelog-release-date
changelog: add 0.11.0 release date.
2020-04-09 21:10:15 +02:00
Mahmood Ali 63d15d7e5c vendor: upgrade golang.org/net/...
golang.org/net packages are ancient - upgrading them to pick up
important fixes, e.g. https://go-review.googlesource.com/c/go/+/87298/
2020-04-09 14:57:39 -04:00
Mahmood Ali 1271c4ce96
Merge pull request #7672 from hashicorp/dev-e2e-tweaks-20200409
e2e: add a convenient creation script
2020-04-09 11:30:48 -04:00
Mahmood Ali c8eddb9f6b fixup! e2e: add a convenient creation script 2020-04-09 11:04:26 -04:00
Mahmood Ali 8a4937d9ce e2e: add a convenient creation script
Add a convenience Makefile for creating an e2e environment for manual
debugging.
2020-04-09 10:54:30 -04:00
James Rasell cb2719b7e6
changelog: add 0.11.0 release date. 2020-04-09 10:35:17 +02:00
Mahmood Ali 030e40ac5c
Merge pull request #7652 from hashicorp/v-gomod-msgpaack
dev: Use go mod for managing hashicorp/go-msgpack
2020-04-08 14:42:39 -04:00