Commit Graph

20462 Commits

Author SHA1 Message Date
James Rasell d6cab8aa14
Merge pull request #9767 from hashicorp/f-e2e-job-scaling-suite
e2e: add job scaling test suite.
2021-01-11 18:35:07 +01:00
Tim Gross d78b4fc1a1 safely handle existing net namespace in default network manager
When a client restarts, the network_hook's prerun will call
`CreateNetwork`. Drivers that don't implement their own network manager will
fall back to the default network manager, which doesn't handle the case where
the network namespace is being recreated safely. This results in an error and
the task being restarted for `exec` tasks with `network` blocks (this also
impacts the community `containerd` and probably other community task drivers).

If we get an error when attempting to create the namespace and that error is
because the file already exists and is locked by its process, then we'll
return a `nil` error with the `created` flag set to false, just as we do with
the `docker` driver.
2021-01-11 11:31:03 -05:00
Seth Hoenig 43880dadd5
Merge pull request #9765 from hashicorp/f-bump-connect-examples
command: bump connect examples to v3
2021-01-11 10:22:58 -06:00
Seth Hoenig 64a8b795f2
Merge pull request #9766 from hashicorp/f-bump-cni-plugins-version
cni: bump CNI plugins version to v0.9.0
2021-01-11 09:59:43 -06:00
Tim Gross f97505e384 e2e: remove deprecated terraform syntax
Also bumps patch versions of some TF modules
2021-01-11 08:25:22 -05:00
James Rasell 4374d99071
e2e: add job scaling test suite. 2021-01-11 11:34:19 +01:00
Seth Hoenig fc5f48d936 cni: bump CNI version to v0.9.0
https://github.com/containernetworking/plugins/releases/tag/v0.9.0

Also make the copy-paste install instructions work with arm64 for
a better OOTB experience (AWS Graviton, Pi 4's).
2021-01-10 18:03:27 -06:00
Seth Hoenig 207fe378ce docs: update countdash examples to v3 2021-01-10 17:19:39 -06:00
Seth Hoenig 36da162619 command: generate bindata assetfs 2021-01-10 17:09:08 -06:00
Seth Hoenig 456868c166 command: bump connect examples to v3
Nomad v1.0+ combined with Consul 1.9+ support launching Envoy v1.16+
which is the first version of envoy to support arm64 platforms out
of the box.

By rebuilding our example docker containers for connect to be multiplatform
between amd64 and arm64, Nomad can provide a nicer user experience for
those trying out Connect on arm64 machines (e.g. AWS Graviton instances
or Raspberry Pi 4's).

This has been done for the countdash examples at v3.

https://hub.docker.com/layers/hashicorpnomad/counter-dashboard/v3/images/sha256-94e323587bc372ba1b6ca5c58dc23e291e9d26787b50e71025f1c8967dfbcd07?context=repo
https://hub.docker.com/layers/hashicorpnomad/counter-api/v3/images/sha256-16a9e9e08082985a635c9edd0f258b084153c6c7831a9b41d34bde78c308b65c?context=repo

The connect-native examples are now also multiplatform at v5, but we
don't have them built into `job init`.
2021-01-10 16:54:31 -06:00
Chris Baker cdfe5a50ff
Merge pull request #9761 from hashicorp/b-9758-enforce-policy-on-scale
in Job.Scale, ensure that new count is within [min,max] configured in scaling policy
2021-01-08 15:49:38 -06:00
Chris Baker 3546469205 nicer error message 2021-01-08 21:13:29 +00:00
Jeff Escalante f4e68cedc1
update dependencies (#9760) 2021-01-08 15:46:31 -05:00
Buck Doyle 2589f7360c
Add documentation for exec websocket (#9679) 2021-01-08 14:01:06 -06:00
Chris Baker d43e0d10c0 appease the linter and fix an incorrect test 2021-01-08 19:38:25 +00:00
Chris Baker a53e54d7a6 changelog for 9761 2021-01-08 19:26:42 +00:00
Chris Baker 49effd5840 in Job.Scale, ensure that new count is within [min,max] configured in scaling policy
resolves #9758
2021-01-08 19:24:36 +00:00
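The bounds check described in 49effd5840 could look something like the sketch below. The `scalingPolicy` type and `validateCount` function are hypothetical stand-ins, not Nomad's real structs; the treatment of a missing policy as "no bounds" is also an assumption.

```go
package main

import "fmt"

// scalingPolicy is a hypothetical stand-in that holds only the bounds.
type scalingPolicy struct {
	Min int64
	Max int64
}

// validateCount rejects a new count that falls outside the inclusive
// [Min, Max] interval of the policy.
func validateCount(count int64, p *scalingPolicy) error {
	if p == nil {
		// Assumption: without a policy there is nothing to enforce.
		return nil
	}
	if count < p.Min {
		return fmt.Errorf("new count %d is below the scaling policy minimum %d", count, p.Min)
	}
	if count > p.Max {
		return fmt.Errorf("new count %d is above the scaling policy maximum %d", count, p.Max)
	}
	return nil
}

func main() {
	p := &scalingPolicy{Min: 1, Max: 5}
	for _, c := range []int64{0, 3, 6} {
		fmt.Println(c, validateCount(c, p))
	}
}
```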
Drew Bailey c87adfac62
persist shared ports during inplace updates (#9736)
AllocatedSharedResources were not being copied over to the new
allocation struct the scheduler makes during inplace updates. This
caused downstream issues after the plan was applied, namely the shared
ports were dropped causing issues with service
registration/deregistration.

test that shared ports are preserved

change log, also carry over shared network

copy networks
2021-01-08 09:00:41 -05:00
Tim Gross 5b9a98d25a docs: clarify default behavior of docker userns_mode 2021-01-08 08:22:39 -05:00
Chulki Lee b7b23e9955 Fix HCL2 link 2021-01-08 08:19:06 -05:00
James Rasell 108fa33393
Merge pull request #9747 from hashicorp/f-e2e-scaling-policy-suite
e2e: add ScalingPolicies test suite with initial test case.
2021-01-08 10:51:48 +01:00
Michael Lange 1fabd3240c
Merge pull request #9614 from hashicorp/dependabot/npm_and_yarn/ui/ini-1.3.7
build(deps): bump ini from 1.3.5 to 1.3.7 in /ui
2021-01-07 14:10:03 -08:00
Tim Gross cb0c4b1d0b changelog entry for #9532 2021-01-07 15:44:13 -05:00
Joel May 13faf0d79e Allow client.cpu_total_compute to override attr.cpu.totalcompute 2021-01-07 15:31:11 -05:00
Seth Hoenig 09c13b0066
Merge pull request #9751 from hashicorp/b-envoyv-segfault
consul/connect: fix panic during in-place upgrade with connect jobs
2021-01-07 14:22:27 -06:00
Tim Gross 4eafcb06ef changelog: add entry for GH-9050 2021-01-07 15:01:04 -05:00
Seth Hoenig 303856183c consul/connect: fix panic during in-place upgrade with connect jobs
When upgrading from Nomad v0.12.x to v1.0.x, Nomad client will panic on
startup if the node is running Connect enabled jobs. This is caused by
a missing piece of plumbing of the Consul Proxies API interface during the
client restore process.

Fixes #9738
2021-01-07 13:24:24 -06:00
Michael Lange 304378565c
Merge pull request #9690 from hashicorp/docs-wtdd-update-ui-api-docs
WTDD: Update UI api docs
2021-01-07 10:52:45 -08:00
Kent 'picat' Gruber f0d1c4092b
Update go-getter to v1.5.2 with support for vhost style S3 paths (#9349) 2021-01-07 13:34:28 -05:00
Jeff Escalante 8b4f6b40e4
Merge pull request #9748 from hashicorp/docs-zs.build-time-code-highlight-revised
Add build-time highlighting to code blocks
2021-01-07 13:33:49 -05:00
Michael Lange 674707e349 Update the page param default to 1 instead of 0 2021-01-07 09:59:09 -08:00
Michael Lange aa8e209c2e Typo fixes
Co-authored-by: Buck Doyle <buck@hashicorp.com>
2021-01-07 09:59:08 -08:00
Michael Lange 46a0435cf2 Update 'Node' to 'Client' which is used throughout the UI 2021-01-07 09:59:08 -08:00
Michael Lange 761b7a1cef Add missing faceted search query params 2021-01-07 09:59:08 -08:00
Michael Lange 2f05f06ecd Remove no longer true enterprise warning 2021-01-07 09:59:08 -08:00
Michael Lange 549f2f77ab Remove version introduction
0.7 is ancient at this point. Now it's as if the UI has always existed.
2021-01-07 09:59:07 -08:00
Michael Lange 75f304bbc2 Add missing routes to the UI API doc 2021-01-07 09:59:07 -08:00
Michael Lange d9b8f6d411
Merge pull request #9733 from hashicorp/b-ui/topo-viz-old-agent
UI: Guard against nodes running an old version of the Nomad agent
2021-01-07 09:27:14 -08:00
Zach Shilton 2a9f9aa8d3
Remove broken shell-session highlighting 2021-01-07 11:57:09 -05:00
Zach Shilton caa30ca097
Add build-time highlighting to code blocks 2021-01-07 11:48:02 -05:00
James Rasell 005e15afbc
Merge pull request #9744 from hashicorp/f-add-namespace-e2e-oss
e2e: move namespace tests into OSS.
2021-01-07 17:36:09 +01:00
Nick Ethier 6705f845f2
Merge pull request #9739 from hashicorp/b-alloc-netmode-ports
Use port's `to` value when building service address under 'alloc' addr_mode
2021-01-07 09:16:27 -05:00
Kdu Bonalume 425ad5892d Fix missing link for Consul integration
Add a link back to configuration/consul in the `service` parameter section of the `group` stanza.
2021-01-07 09:02:43 -05:00
Nick Ethier 7a6aab10bb
Apply suggestions from code review
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2021-01-07 08:53:54 -05:00
James Rasell b087d68736
e2e: add ScalingPolicies test suite with initial test case. 2021-01-07 14:39:55 +01:00
James Rasell 02b9d9da87
e2e: move namespace tests into OSS. 2021-01-07 09:15:43 +01:00
Jeff Escalante f791725736
Merge pull request #9743 from hashicorp/je.fix-edit-page-links
hotfix: fix 'edit this page' links
2021-01-06 19:09:46 -05:00
Jeff Escalante 8c04e22ce4
fix 'edit this page' links 2021-01-06 19:01:32 -05:00
Mahmood Ali 050ad6b6f4
tests: deflake test-api job (#9742)
Deflake test-api job, currently failing at around 7.6% (44 out of 578
workflows), by ensuring that the test nomad agent uses a small dedicated port
range that doesn't conflict with the kernel ephemeral range.

The failures are disproportionately related to port allocation, where a
nomad agent fails to start when the http port is already bound to
another process. The failures are intermittent and aren't specific to any
test in particular. The following is a representative failure:
https://app.circleci.com/pipelines/github/hashicorp/nomad/13995/workflows/6cf6eb38-f93c-46f8-8aa0-f61e62fe7694/jobs/128169
.

Upon investigation, the issue seems to be that the api freeport library
picks a port block within 10,000-14,500, but that overlaps with the
kernel ephemeral range 32,769-60,999! So, freeport may allocate a free
port to the nomad agent, just to be used by another process before the
nomad agent starts!

This happened for example in
https://app.circleci.com/pipelines/github/hashicorp/nomad/14111/workflows/e1fcd7ff-f0e0-4796-8719-f57f510b1ffa/jobs/129684
.  `freeport` allocated port 41662 to serf, but `google_accounts`
raced to use it to connect to the CircleCI VM metadata service.

We avoid such races by using a dedicated port range that's disjoint from
the kernel ephemeral port range.
2021-01-06 16:18:28 -05:00
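The disjointness requirement from 050ad6b6f4 can be checked with a small sketch like this one. The `portRange` type is hypothetical, and the dedicated range used here (20000-21000) is an arbitrary illustrative value, not the range the commit actually chose; the ephemeral range and port 41662 are taken from the commit message.

```go
package main

import "fmt"

// portRange is a hypothetical inclusive range of TCP/UDP ports.
type portRange struct {
	Low  int
	High int
}

// contains reports whether port lies inside the range.
func (r portRange) contains(port int) bool {
	return port >= r.Low && port <= r.High
}

// overlaps reports whether two inclusive ranges share at least one port.
func (r portRange) overlaps(o portRange) bool {
	return r.Low <= o.High && o.Low <= r.High
}

func main() {
	// Kernel ephemeral range as quoted in the commit message.
	ephemeral := portRange{Low: 32769, High: 60999}

	// The port from the failing run sits inside the ephemeral range,
	// so another process could grab it first.
	fmt.Println(ephemeral.contains(41662))

	// Illustrative dedicated range (an assumption, not Nomad's value).
	dedicated := portRange{Low: 20000, High: 21000}
	fmt.Println(dedicated.overlaps(ephemeral))
}
```

A test harness could assert `!dedicated.overlaps(ephemeral)` at startup so that a misconfigured range fails fast instead of producing intermittent bind errors.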
Mahmood Ali 00be4fc63c
tests: deflake TestTaskRunner_StatsHook_Periodic (#9734)
This PR deflakes TestTaskRunner_StatsHook_Periodic tests and adds backoff when the driver closes the channel.

TestTaskRunner_StatsHook_Periodic is currently the most flaky test - failing ~4% of the time (20 out of 486 workflows). A sample failure: https://app.circleci.com/pipelines/github/hashicorp/nomad/14028/workflows/957b674f-cbcc-4228-96d9-1094fdee5b9c/jobs/128563 .

This change has two components:

First, it updates the StatsHook so that it backs off when the stats channel is closed. In the context of the test, where the mock driver emits a single stats update and closes the channel, the test may make tens of thousands of updates during the period. In a real context, if a driver doesn't implement the stats handler properly or when a task finishes, we may generate way too many Stats queries in a tight loop. Here, the backoff reduces these queries. I've added a failing test that shows 154,458 stats updates within 500ms in https://app.circleci.com/pipelines/github/hashicorp/nomad/14092/workflows/50672445-392d-4661-b19e-e3561ed32746/jobs/129423 .

Second, the test ignores the first stats update after a task exit. Due to the asynchronicity of updates and channel/context use, it's possible that an update is enqueued while the test marks the task as exited, resulting in a spurious update.
2021-01-06 16:03:00 -05:00