open-nomad

Author	SHA1	Message	Date
Michael Lange	82dc694c70	Disable the proxy when Mirage is enabled This is to prevent max socket connection errors that can stop the live reload server from responding.	2020-04-21 19:52:44 -07:00
Michael Lange	7a4852d44b	Use existing ember proxy config within our custom proxy	2020-04-21 19:52:43 -07:00
Michael Lange	919b15a2db	Merge pull request #7685 from hashicorp/ui/upgrade-lint-staged UI: Upgrade lint-staged and husky	2020-04-21 17:42:12 -07:00
Chris Baker	09d980be2b	modify state store so that autoscaling policies are deleted from their table as job is stopped (and recreated when job is started)	2020-04-21 23:01:26 +00:00
Tim Gross	5b607d7061	changelog entries for 0.11.1 bugfixes (#7763)	2020-04-21 10:04:13 -04:00
Mahmood Ali	3137d13bc5	Merge pull request #7762 from hashicorp/b-in-place-update-deviceids Preserve device ids in in-place alloc updates	2020-04-21 09:31:10 -04:00
Mahmood Ali	534275448b	add changelog [ci skip]	2020-04-21 09:27:40 -04:00
Mahmood Ali	9f005201e2	Ensure that alloc updates preserve device offers When an alloc is updated in-place, ensure that the allocated devices are preserved and carried over to the new alloc.	2020-04-21 08:57:15 -04:00
Mahmood Ali	2ff2745374	test for allocated devices on job in-update update When an alloc is updated in-place, test that the allocated devices are preserved in new alloc struct.	2020-04-21 08:56:05 -04:00
Buck Doyle	8cd5f798c4	Docs: correct search API (#7756) This closes #7718. It corrects some inaccuracies and adds an explanation of the truncations block.	2020-04-21 07:33:24 -05:00
Tim Gross	bd74b593d0	csi: nil-check allocs for VolumeDenormalize and claim methods (#7760)	2020-04-21 08:32:24 -04:00
Charlie Voiselle	c68c19f3cf	Use ExternalID in NodeStageVolume RPC (#7754)	2020-04-20 17:13:46 -04:00
Michael Dwan	ba70c54340	fix panic while deleting CSI plugins for missing job (#7758)	2020-04-20 17:13:33 -04:00
Seth Hoenig	dad4d58a1d	Merge pull request #7691 from hashicorp/docs-some-connect-bugs docs: add bugfix notes for #7690 #7397 #7684 #7683 to changelog	2020-04-20 10:27:18 -06:00
Seth Hoenig	cc59227a49	docs: add bugfix notes for #7690 #7397 #7684 #7683 to changelog	2020-04-20 10:25:57 -06:00
Seth Hoenig	40e0f8a346	Merge pull request #7690 from hashicorp/b-inspect-proxy-output two fixes for inspect on connect proxy	2020-04-20 10:17:54 -06:00
Seth Hoenig	3d16d56fbb	Merge pull request #7705 from hashicorp/docs-remove-connect-limitation fixup references in connect docs	2020-04-20 10:15:50 -06:00
Mahmood Ali	5b42796f1e	Merge pull request #7704 from hashicorp/b-agent-shutdown-order agent: shutdown agent http server last	2020-04-20 10:37:26 -04:00
Mahmood Ali	3e741a0caa	Merge pull request #7748 from hashicorp/b-noisy-http-logs agent: route http logs through hclog	2020-04-20 10:37:15 -04:00
Mahmood Ali	1c0e1cabc9	update changelog [ci skip]	2020-04-20 10:36:39 -04:00
Mahmood Ali	4e1366f285	agent: route http logs through hclog Pipe http server log to hclog, so that it uses the same logging format as the rest of nomad logs. Also supports emitting them as json logs, when json formatting is set. The http server logs are emitted at Trace level, as they typically represent HTTP client errors (e.g. failed tls handshakes, invalid headers, etc). However, Panic logs represent server errors and are relayed at Error level.	2020-04-20 10:33:40 -04:00
Mahmood Ali	86aa8105b2	Merge pull request #7749 from hashicorp/b-docker-panic driver/docker: protect against nil container	2020-04-20 10:31:46 -04:00
Mahmood Ali	6bfef2c945	add changelog [ci skip]	2020-04-20 10:31:09 -04:00
Jeffrey 'jf' Lim	35418efb60	demo/vagrant/Vagrantfile: Update Nomad version (0.11.0) (#7579)	2020-04-20 09:29:12 -04:00
Anthony Scalisi	9664c6b270	fix spelling errors (#6985)	2020-04-20 09:28:19 -04:00
Charles Z	e4a669598e	label csi as beta from 0.11 release notes (#7745)	2020-04-20 08:48:04 -04:00
Mahmood Ali	dff071c3b9	driver/docker: protect against nil container Protect against a panic when we attempt to start a container with a name that conflicts with an existing one. If the existing one is being deleted while nomad first attempts to create the container, the createContainer will fail with `container already exists`, but we get nil container reference from the `containerByName` lookup, and cause a crash. I'm not certain how we get into the state, except for being very unlucky. I suspect that this case may be the result of a concurrent restart or the docker engine API not being fully consistent (e.g. an earlier call purged the container, but docker didn't free up resources yet to create a new container with the same name immediately yet). If that's the case, then re-attempting creation will hopefully succeed, or we'd at least fail enough times for the alloc to be rescheduled to another node.	2020-04-19 15:34:45 -04:00
Jeffrey 'jf' Lim	eab600d3e1	Fix/improve "job plan" messaging (#7580)	2020-04-17 15:53:16 -04:00
Yishan Lin	164314f7fa	Merge pull request #7741 from hashicorp/yishan/docs-rebased-preemption-update docs: update preemption page	2020-04-17 11:03:27 -07:00
Yishan Lin	b95309dc4b	docs: update preemption page This page has not been updated (yet) to reflect support for all 3 job types (service, batch, system), which shipped in 0.9.2. The current page implies that preemption is only available for system jobs. This is early preparation for Nomad 0.12, where we plan to move Preemption from the Enterprise feature suite to OSS for all.	2020-04-17 09:34:07 -07:00
Michael Schurter	85999cbfab	docs: add #7730 to changelog	2020-04-15 15:13:30 -07:00
Michael Schurter	4c5a0cae35	core: fix node reservation scoring The BinPackIter accounted for node reservations twice when scoring nodes which could bias scores toward nodes with reservations. Pseudo-code for previous algorithm: ``` proposed = reservedResources + sum(allocsResources) available = nodeResources - reservedResources score = 1 - (proposed / available) ``` The node's reserved resources are added to the total resources used by allocations, and then the node's reserved resources are later subtracted from the node's overall resources. The new algorithm is: ``` proposed = sum(allocResources) available = nodeResources - reservedResources score = 1 - (proposed / available) ``` The node's reserved resources are no longer added to the total resources used by allocations. My guess as to how this bug happened is that the resource utilization variable (`util`) is calculated and returned by the `AllocsFit` function which needs to take reserved resources into account as a basic feasibility check. To avoid re-calculating alloc resource usage (because there may be a large number of allocs), we reused `util` in the `ScoreFit` function. `ScoreFit` properly accounts for reserved resources by subtracting them from the node's overall resources. However since `util` _also_ took reserved resources into account the score would be incorrect. Prior to the fix the added test output: ``` Node: reserved Score: 1.0000 Node: reserved2 Score: 1.0000 Node: no-reserved Score: 0.9741 ``` The scores being 1.0 for both nodes with reserved resources is a good hint something is wrong as they should receive different scores. Upon further inspection the double accounting of reserved resources caused their scores to be >1.0 and clamped. After the fix the added test outputs: ``` Node: no-reserved Score: 0.9741 Node: reserved Score: 0.9480 Node: reserved2 Score: 0.8717 ```	2020-04-15 15:13:30 -07:00
Brandon Romano	3b22f5aa72	Merge pull request #7717 from hashicorp/website-alert website: Adjust the website alert to point to the blog post	2020-04-14 11:36:43 -07:00
Brandon Romano	f520757617	Adjust the website alert to point to the blog post	2020-04-14 11:17:06 -07:00
Michael Schurter	165ddda744	Merge pull request #7682 from hashicorp/b-comment-fix core: fix comment on system stack	2020-04-13 15:13:23 -07:00
Seth Hoenig	d5ad580d5c	structs: fix compatibility between api and nomad/structs proxy definitions The field names within the structs representing the Connect proxy definition were not the same (nomad/structs/ vs api/), causing the values to be lost in translation for the 'nomad job inspect' command. Since the field names already shipped in v0.11.0 we cannot simply fix the names. Instead, use the json struct tag on the structs/ structs to remap the name to match the publicly exposed api/ package on json encoding. This means existing jobs from v0.11.0 will continue to work, and the JSON API for job submission will remain backwards compatible.	2020-04-13 15:59:45 -06:00
Seth Hoenig	7e3b16fa90	jobspec: correctly parse proxy fields from jobspec Before, the proxy stanza did not parse non-object fields `local_service_port` and `local_service_address` from the connect `proxy` stanza. This change fixes that.	2020-04-13 15:59:45 -06:00
Chris Baker	a37446acaa	documents the scaling block in the JSON Job docs (#7706) * documents the scaling block in the JSON Job docs resolves #7656 * add task-specific restart to JSON Job docs companion to #7603 * [docs] improved and corrected scaling docs * Update website/pages/api-docs/json-jobs.mdx Co-Authored-By: Michael Schurter <mschurter@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2020-04-13 16:33:49 -05:00
Chris Baker	a6e0d17433	update `restart` documentation (#7603) * update `restart` documentation #7288 added support for task-specific `restart` policy. this PR updates the docs to reflect that. * added an explicit example of task-specific restart policy * Update website/pages/docs/job-specification/restart.mdx	2020-04-13 16:29:43 -05:00
Drew Bailey	f3b168e369	Merge pull request #7663 from hashicorp/b-taskrunner-shutdown_delay Run task shutdown_delay regardless of service registration	2020-04-13 13:27:24 -04:00
Drew Bailey	da11c31e4c	Update CHANGELOG.md	2020-04-13 12:41:13 -04:00
Seth Hoenig	07ccebca71	docs: add a link to the Connect w/ACLs guide ... from the docs/integration/consul-connect page.	2020-04-13 10:05:20 -06:00
Seth Hoenig	a35a64b6bd	docs: update connect limitations (acls & checks now supported)	2020-04-13 09:51:17 -06:00
Mahmood Ali	b78680eee7	agent: shutdown agent http server last Shutdown http server last, after nomad client/server components terminate. Before this change, if the agent is taking an unexpectedly long time to shut down, the operator cannot query the http server directly: they cannot access agent specific http endpoints and need to query another agent about the troublesome agent. Unexpectedly long shutdown can happen in normal cases, e.g. a client might hang if one of the allocs it is running has a long shutdown_delay. Here, we switch to ensuring that the http server is shut down last. I believe this doesn't require extra care in the agent shutdown logic while operators may be able to submit write http requests. We already need to cope with operators submitting these http requests to another agent or with servers updating the client allocations.	2020-04-13 10:50:07 -04:00
Tim Gross	4e9bd1e1d1	refactor: consolidate private methods for CSI RPC (#7702) Follow-up for a method missed in the refactor for #7688. The `volAndPluginLookup` method is only ever called from the server's `CSI` RPC and never the `ClientCSI` RPC, so move it into that scope.	2020-04-13 10:46:43 -04:00
Tim Gross	ab3086a1f4	e2e: testing reliability (#7701) * pin CSI plugin versions * ensure failing CSI tests clean up * allow NOMAD_SHA env var to override makefile	2020-04-13 10:25:24 -04:00
Mahmood Ali	e6551455b9	Merge pull request #7693 from greut/bump-testify api: testify v1.5.1	2020-04-11 09:09:44 -04:00
Yoan Blanc	790df29996	api: testify v1.5.1 Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-04-11 13:55:10 +02:00
Tim Gross	f37e986b1b	refactor: make nodeForControllerPlugin private to ClientCSI (#7688) The current design of `ClientCSI` RPC requires that callers in the server know about the free-standing `nodeForControllerPlugin` function. This makes it difficult to send `ClientCSI` RPC messages from subpackages of `nomad` and adds a bunch of boilerplate to every server-side caller of a controller RPC. This changeset makes it so that the `ClientCSI` RPCs will populate and validate the controller's client node ID if it hasn't been passed by the caller, centralizing the logic of picking and validating controller targets into the `nomad.ClientCSI` struct.	2020-04-10 16:47:21 -04:00
Seth Hoenig	43804d8ca9	Merge pull request #7684 from hashicorp/b-connect-sidecar-name connect: enable configuring sidecar_task.name	2020-04-10 10:04:25 -06:00
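
The double accounting described in commit 4c5a0cae35 (core: fix node reservation scoring) can be illustrated with a small sketch. It follows only the simplified pseudo-code from that commit message, reduced to a single resource dimension. The function names and the numbers are hypothetical, and it is not Nomad's actual `ScoreFit` implementation, so it does not reproduce the test output quoted in the commit.

```python
# Sketch of the pseudo-code in commit 4c5a0cae35, for one resource
# dimension (e.g. CPU MHz). Names and values are illustrative assumptions,
# not Nomad's real scoring code.

def score_double_counted(node_resources, reserved, alloc_resources):
    """Previous pseudo-code: reserved resources enter both terms."""
    proposed = reserved + sum(alloc_resources)   # reserved counted once here...
    available = node_resources - reserved        # ...and a second time here
    return 1 - (proposed / available)


def score_fixed(node_resources, reserved, alloc_resources):
    """New pseudo-code: reserved resources only reduce what is available."""
    proposed = sum(alloc_resources)
    available = node_resources - reserved
    return 1 - (proposed / available)


if __name__ == "__main__":
    # Hypothetical node: 4000 units total, 1000 reserved, two allocations.
    node, reserved, allocs = 4000, 1000, [500, 1000]
    print("double counted:", score_double_counted(node, reserved, allocs))
    print("fixed:         ", score_fixed(node, reserved, allocs))
```

With the same inputs the two functions differ only by the `reserved / available` term, which is the bias the commit removes.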

18186 commits