open-nomad

Author	SHA1	Message	Date
Seth Hoenig	cc59227a49	docs: add bugfix notes for #7690 #7397 #7684 #7683 to changelog	2020-04-20 10:25:57 -06:00
Seth Hoenig	40e0f8a346	Merge pull request #7690 from hashicorp/b-inspect-proxy-output two fixes for inspect on connect proxy	2020-04-20 10:17:54 -06:00
Seth Hoenig	3d16d56fbb	Merge pull request #7705 from hashicorp/docs-remove-connect-limitation fixup references in connect docs	2020-04-20 10:15:50 -06:00
Mahmood Ali	5b42796f1e	Merge pull request #7704 from hashicorp/b-agent-shutdown-order agent: shutdown agent http server last	2020-04-20 10:37:26 -04:00
Mahmood Ali	3e741a0caa	Merge pull request #7748 from hashicorp/b-noisy-http-logs agent: route http logs through hclog	2020-04-20 10:37:15 -04:00
Mahmood Ali	1c0e1cabc9	update changelog [ci skip]	2020-04-20 10:36:39 -04:00
Mahmood Ali	4e1366f285	agent: route http logs through hclog Pipe http server log to hclog, so that it uses the same logging format as the rest of the nomad logs. It also supports emitting them as json logs when json formatting is set. The http server logs are emitted at Trace level, as they typically represent HTTP client errors (e.g. failed tls handshakes, invalid headers, etc). Panic logs, though, represent server errors and are relayed at Error level.	2020-04-20 10:33:40 -04:00
Mahmood Ali	86aa8105b2	Merge pull request #7749 from hashicorp/b-docker-panic driver/docker: protect against nil container	2020-04-20 10:31:46 -04:00
Mahmood Ali	6bfef2c945	add changelog [ci skip]	2020-04-20 10:31:09 -04:00
Jeffrey 'jf' Lim	35418efb60	demo/vagrant/Vagrantfile: Update Nomad version (0.11.0) (#7579)	2020-04-20 09:29:12 -04:00
Anthony Scalisi	9664c6b270	fix spelling errors (#6985)	2020-04-20 09:28:19 -04:00
Charles Z	e4a669598e	label csi as beta from 0.11 release notes (#7745)	2020-04-20 08:48:04 -04:00
Mahmood Ali	dff071c3b9	driver/docker: protect against nil container Protect against a panic when we attempt to start a container with a name that conflicts with an existing one. If the existing one is being deleted while nomad first attempts to create the container, the createContainer will fail with `container already exists`, but we get a nil container reference from the `containerByName` lookup, which causes a crash. I'm not certain how we get into this state, except for being very unlucky. I suspect that this case may be the result of a concurrent restart or the docker engine API not being fully consistent (e.g. an earlier call purged the container, but docker hadn't yet freed up resources to create a new container with the same name immediately). If that's the case, then re-attempting creation will hopefully succeed, or we'd at least fail enough times for the alloc to be rescheduled to another node.	2020-04-19 15:34:45 -04:00
Jeffrey 'jf' Lim	eab600d3e1	Fix/improve "job plan" messaging (#7580)	2020-04-17 15:53:16 -04:00
Yishan Lin	164314f7fa	Merge pull request #7741 from hashicorp/yishan/docs-rebased-preemption-update docs: update preemption page	2020-04-17 11:03:27 -07:00
Yishan Lin	b95309dc4b	docs: update preemption page This page has not been updated (yet) to reflect support for all 3 job types (service, batch, system), which shipped in 0.9.2. The current page implies that preemption is only available for system jobs. This is early preparation for Nomad 0.12, where we plan to move Preemption from the Enterprise feature suite to OSS for all.	2020-04-17 09:34:07 -07:00
Michael Schurter	85999cbfab	docs: add #7730 to changelog	2020-04-15 15:13:30 -07:00
Michael Schurter	4c5a0cae35	core: fix node reservation scoring The BinPackIter accounted for node reservations twice when scoring nodes which could bias scores toward nodes with reservations. Pseudo-code for previous algorithm: ``` proposed = reservedResources + sum(allocResources) available = nodeResources - reservedResources score = 1 - (proposed / available) ``` The node's reserved resources are added to the total resources used by allocations, and then the node's reserved resources are later subtracted from the node's overall resources. The new algorithm is: ``` proposed = sum(allocResources) available = nodeResources - reservedResources score = 1 - (proposed / available) ``` The node's reserved resources are no longer added to the total resources used by allocations. My guess as to how this bug happened is that the resource utilization variable (`util`) is calculated and returned by the `AllocsFit` function which needs to take reserved resources into account as a basic feasibility check. To avoid re-calculating alloc resource usage (because there may be a large number of allocs), we reused `util` in the `ScoreFit` function. `ScoreFit` properly accounts for reserved resources by subtracting them from the node's overall resources. However since `util` _also_ took reserved resources into account the score would be incorrect. Prior to the fix the added test output: ``` Node: reserved Score: 1.0000 Node: reserved2 Score: 1.0000 Node: no-reserved Score: 0.9741 ``` The scores being 1.0 for both nodes with reserved resources is a good hint something is wrong as they should receive different scores. Upon further inspection the double accounting of reserved resources caused their scores to be >1.0 and clamped. After the fix the added test outputs: ``` Node: no-reserved Score: 0.9741 Node: reserved Score: 0.9480 Node: reserved2 Score: 0.8717 ```	2020-04-15 15:13:30 -07:00
Brandon Romano	3b22f5aa72	Merge pull request #7717 from hashicorp/website-alert website: Adjust the website alert to point to the blog post	2020-04-14 11:36:43 -07:00
Brandon Romano	f520757617	Adjust the website alert to point to the blog post	2020-04-14 11:17:06 -07:00
Michael Schurter	165ddda744	Merge pull request #7682 from hashicorp/b-comment-fix core: fix comment on system stack	2020-04-13 15:13:23 -07:00
Seth Hoenig	d5ad580d5c	structs: fix compatibility between api and nomad/structs proxy definitions The field names within the structs representing the Connect proxy definition were not the same (nomad/structs/ vs api/), causing the values to be lost in translation for the 'nomad job inspect' command. Since the field names already shipped in v0.11.0 we cannot simply fix the names. Instead, use the json struct tag on the structs/ structs to remap the name to match the publicly exposed api/ package on json encoding. This means existing jobs from v0.11.0 will continue to work, and the JSON API for job submission will remain backwards compatible.	2020-04-13 15:59:45 -06:00
Seth Hoenig	7e3b16fa90	jobspec: correctly parse proxy fields from jobspec Before, the proxy stanza did not parse non-object fields `local_service_port` and `local_service_address` from the connect `proxy` stanza. This change fixes that.	2020-04-13 15:59:45 -06:00
Chris Baker	a37446acaa	documents the scaling block in the JSON Job docs (#7706) * documents the scaling block in the JSON Job docs resolves #7656 * add task-specific restart to JSON Job docs companion to #7603 * [docs] improved and corrected scaling docs * Update website/pages/api-docs/json-jobs.mdx Co-Authored-By: Michael Schurter <mschurter@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2020-04-13 16:33:49 -05:00
Chris Baker	a6e0d17433	update `restart` documentation (#7603) * update `restart` documentation #7288 added support for task-specific `restart` policy. this PR updates the docs to reflect that. * added an explicit example of task-specific restart policy * Update website/pages/docs/job-specification/restart.mdx	2020-04-13 16:29:43 -05:00
Drew Bailey	f3b168e369	Merge pull request #7663 from hashicorp/b-taskrunner-shutdown_delay Run task shutdown_delay regardless of service registration	2020-04-13 13:27:24 -04:00
Drew Bailey	da11c31e4c	Update CHANGELOG.md	2020-04-13 12:41:13 -04:00
Seth Hoenig	07ccebca71	docs: add a link to the Connect w/ACLs guide ... from the docs/integration/consul-connect page.	2020-04-13 10:05:20 -06:00
Seth Hoenig	a35a64b6bd	docs: update connect limitations (acls & checks now supported)	2020-04-13 09:51:17 -06:00
Mahmood Ali	b78680eee7	agent: shutdown agent http server last Shutdown http server last, after nomad client/server components terminate. Before this change, if the agent is taking an unexpectedly long time to shut down, the operator cannot query the http server directly: they cannot access agent specific http endpoints and need to query another agent about the troublesome agent. Unexpectedly long shutdown can happen in normal cases, e.g. a client might hang if one of the allocs it is running has a long shutdown_delay. Here, we switch to ensuring that the http server is shut down last. I believe this doesn't require extra care in the agent shutdown logic while operators may be able to submit write http requests. We already need to cope with operators submitting these http requests to another agent or with servers updating the client allocations.	2020-04-13 10:50:07 -04:00
Tim Gross	4e9bd1e1d1	refactor: consolidate private methods for CSI RPC (#7702) Follow-up for a method missed in the refactor for #7688. The `volAndPluginLookup` method is only ever called from the server's `CSI` RPC and never the `ClientCSI` RPC, so move it into that scope.	2020-04-13 10:46:43 -04:00
Tim Gross	ab3086a1f4	e2e: testing reliability (#7701) * pin CSI plugin versions * ensure failing CSI tests clean up * allow NOMAD_SHA env var to override makefile	2020-04-13 10:25:24 -04:00
Mahmood Ali	e6551455b9	Merge pull request #7693 from greut/bump-testify api: testify v1.5.1	2020-04-11 09:09:44 -04:00
Yoan Blanc	790df29996	api: testify v1.5.1 Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-04-11 13:55:10 +02:00
Tim Gross	f37e986b1b	refactor: make nodeForControllerPlugin private to ClientCSI (#7688) The current design of `ClientCSI` RPC requires that callers in the server know about the free-standing `nodeForControllerPlugin` function. This makes it difficult to send `ClientCSI` RPC messages from subpackages of `nomad` and adds a bunch of boilerplate to every server-side caller of a controller RPC. This changeset makes it so that the `ClientCSI` RPCs will populate and validate the controller's client node ID if it hasn't been passed by the caller, centralizing the logic of picking and validating controller targets into the `nomad.ClientCSI` struct.	2020-04-10 16:47:21 -04:00
Seth Hoenig	43804d8ca9	Merge pull request #7684 from hashicorp/b-connect-sidecar-name connect: enable configuring sidecar_task.name	2020-04-10 10:04:25 -06:00
Seth Hoenig	7ff7d2a288	Merge pull request #7683 from hashicorp/b-no-sidecar-panic connect: correctly handle missing sidecar_service task stanza	2020-04-10 09:49:59 -06:00
Seth Hoenig	0407eaaf88	connect: extract common task keys	2020-04-10 09:49:19 -06:00
Drew Bailey	591aea1edd	changelog	2020-04-10 11:14:39 -04:00
Drew Bailey	8bfee62b70	Run task shutdown_delay regardless of service registration task shutdown_delay will currently only run if there are registered services for the task. This implementation detail isn't explicitly stated anywhere and is defined outside of the service stanza. This change moves shutdown_delay to be evaluated after prekill hooks are run, outside of any task runner hooks. just use time.sleep	2020-04-10 11:06:26 -04:00
Michael Lange	a34363efd3	Remove now superfluous lint-staged arguments	2020-04-09 20:46:32 -07:00
Michael Lange	45eb6fd7f3	Upgrade Husky	2020-04-09 20:45:37 -07:00
Michael Lange	03d4afe9e0	Upgrade lint-staged Version 10 fixes an issue where if lint-staged fails while linting a partially staged file, all unstaged changes will be removed from the working tree. Now when this happens, unstaged changes will be in the stash.	2020-04-09 20:41:35 -07:00
Seth Hoenig	db865e05d8	connect: enable configuring sidecar_task.name Before, the submitted jobspec for sidecar_task would pass through 2 key validation steps - once for the subset specific to connect sidecar task definitions, and once again for the set of normal task definition where the task would actually get unmarshalled. The valid keys for the normal task definition did not include "name", which is supposed to be configurable for the sidecar task. To fix this, just eliminate the double validation step, and instead pass-in the correct set of keys to validate against to the one generic task parser. Fixes #7680	2020-04-09 21:01:16 -06:00
Seth Hoenig	20802da8fd	connect: correctly deal with nil sidecar_service task stanza Before, if the sidecar_service stanza of a connect enabled service was missing, the job submission would cause a panic in the nomad agent. Since the panic was happening in the API handler the agent itself continued running, but this change will handle the condition more gracefully. By fixing the `Copy` method, the API handler now returns the proper error. $ nomad job run foo.nomad Error submitting job: Unexpected response code: 500 (1 error occurred: * Task group api validation failed: 2 errors occurred: * Missing tasks for task group * Task group service validation failed: 1 error occurred: * Service[0] count-api validation failed: 1 error occurred: * Consul Connect must be native or use a sidecar service	2020-04-09 20:28:17 -06:00
Michael Schurter	4b475db408	core: fix comment on system stack This makes me do a double take every time I run into it, so what if we just changed it?	2020-04-09 15:19:11 -07:00
Mahmood Ali	1640f58776	Merge pull request #7676 from hashicorp/vendor-golang-org-x-20200409 Upgrade all golang.org/x packages	2020-04-09 17:18:57 -04:00
Seth Hoenig	58d844f591	Merge pull request #7678 from hashicorp/docs-connect-config-link-404 docs: fix link to envoy proxy documentation on consul site	2020-04-09 13:56:58 -06:00
Seth Hoenig	6cfecc6d03	docs: fix link to envoy proxy documentation on consul site	2020-04-09 13:46:59 -06:00
Mahmood Ali	735a478cc2	Upgrade all golang.org/x packages Upgrade all golang.org/x packages to pick up fixes and improvements. Some packages date back to 2018 and so much improvement happened since then!	2020-04-09 15:23:25 -04:00

18070 commits