Commit graph

21689 commits

Author SHA1 Message Date
Mahmood Ali 84a3522133
Consider all system jobs for a new node (#11054)
When a node becomes ready, create an eval for all system jobs across
namespaces.

The previous code uses `job.ID` to deduplicate evals, but that ignores
the job namespace. Thus if there are multiple jobs in different
namespaces sharing the same ID/Name, only one will be considered for
running in the new node. Thus, Nomad may skip running some system jobs
in that node.
2021-08-18 09:50:37 -04:00
Mahmood Ali 97966c7a71
e2e: Run system jobs on all datacenters (#11060)
Target all e2e datacenters for system and sysbatch e2e tests.  They
require that the system jobs run on all linux clients.

However, the jobs currenly only target `dc1` datacenter, but the nightly
e2e cluster has 4 clients spread in `dc1` and `dc2` datacenters, causing
the tests to fail.

I missed this problem in e2e dev cluster because it only used a single
dc1 datacenter.
2021-08-17 11:01:47 -04:00
Mahmood Ali c37339a8c8
Merge pull request #9160 from hashicorp/f-sysbatch
core: implement system batch scheduler
2021-08-16 09:30:24 -04:00
James Rasell 534368780b
Merge pull request #11051 from hashicorp/b-gh-11047
tlsutil: update testing certificates close to expiry.
2021-08-16 09:42:01 +02:00
James Rasell 54d6785bcc
tlsutil: update testing certificates close to expiry. 2021-08-13 11:09:40 +02:00
Mahmood Ali 5ae9df80bf
docs: Consul Connect tweaks (#11040)
Tweaks to the commands in Consul Connect page.

For multi-command scripts, having the leading `$` is a bit annoying, as it makes copying the text harder. Also, the `copy` button would only copy the first command and ignore the rest.

Also, the `echo 1 > ...` commands are required to run as root, unlike the rest! I made them use `| sudo tee` pattern to ease copy & paste as well.

Lastly, update the CNI plugin links to 1.0.0. It's fresh off the oven - just got released less than an hour ago: https://github.com/containernetworking/plugins/releases/tag/v1.0.0 .
2021-08-11 17:14:26 -04:00
Mahmood Ali 47d2dcd60a
Merge pull request #11034 from tgross/docs-cni-install
docs: note CNI requirement for bridge networking
2021-08-11 11:07:25 -04:00
Tim Gross de957b48ff docs: note CNI requirement for bridge networking
Using `bridge` networking requires that you have CNI plugins installed
on the client, but this isn't in the jobspec `network` docs which are
the first place someone will look when trying to configure task
networking.
2021-08-11 10:18:35 -04:00
Michael Schurter a7aae6fa0c
Merge pull request #10848 from ggriffiths/listsnapshot_secrets
CSI Listsnapshot secrets support
2021-08-10 15:59:33 -07:00
Mahmood Ali ea003188fa
system: re-evaluate node on feasibility changes (#11007)
Fix a bug where system jobs may fail to be placed on a node that
initially was not eligible for system job placement.

This changes causes the reschedule to re-evaluate the node if any
attribute used in feasibility checks changes.

Fixes https://github.com/hashicorp/nomad/issues/8448
2021-08-10 17:17:44 -04:00
Mahmood Ali bfc766357e
deployments: canary=0 is implicitly autopromote (#11013)
In a multi-task-group job, treat 0 canary groups as auto-promote.

This change fixes an edge case where Nomad requires a manual promotion,
if the job had any group with canary=0 and rest of groups having
auto_promote set.

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2021-08-10 17:06:40 -04:00
Mahmood Ali efcc8bf082
Speed up client startup and registration (#11005)
Speed up client startup, by retrying more until the servers are known.

Currently, if client fingerprinting is fast and finishes before the
client connect to a server, node registration may be delayed by 15
seconds or so!

Ideally, we'd wait until the client discovers the servers and then retry
immediately, but that requires significant code changes.

Here, we simply retry the node registration request every second. That's
basically the equivalent of check if the client discovered servers every
second. Should be a cheap operation.

When testing this change on my local computer and where both servers and
clients are co-located, the time from startup till node registration
dropped from 34 seconds to 8 seconds!
2021-08-10 17:06:18 -04:00
Luiz Aoqui c1d1906628
ui: add missing pipe separator in parameterized and periodic jobs (#11020) 2021-08-10 13:48:20 -04:00
Michael Schurter ec08bd6ac7
Merge pull request #10995 from miao1007/patch-1
docs: Add replication_token link with authoritative_region
2021-08-10 10:48:02 -07:00
Jai 29a7fe6efa
Merge pull request #10666 from hashicorp/b-ui/search-namespaces
ui: Fix fuzzy search namespace-handling
2021-08-10 13:13:20 -04:00
Mike Wickett 7e914d8f46
chore: update alert banner (#11022) 2021-08-10 12:56:13 -04:00
Jai Bhagat a9b9132f35 edit hierarchy to lead with namespace before job 2021-08-10 10:35:36 -04:00
Luiz Aoqui d283e90c35
ui: only dipslay "Dispatch Job" button on parameterized jobs (#11019) 2021-08-09 17:49:08 -04:00
Luiz Aoqui 5b50b385e8
make: embed the Nomad UI data by default (#11018) 2021-08-09 16:53:44 -04:00
Lir (Rookout) f720179ba0
Some Rookout docs tweaks (#10989) 2021-08-09 11:19:36 +02:00
Michael Schurter c39ca0773d
Merge pull request #10951 from hashicorp/b-cn-proxy
consul/connect: avoid warn messages on connect proxy errors
2021-08-06 15:25:40 -07:00
Michael Schurter a87f748d6d
Merge pull request #11010 from hashicorp/docs-10875
docs: add backward incompat note about #10875
2021-08-06 08:28:48 -07:00
Michael Schurter 6d14c181dd docs: add backward incompat note about #10875
Fixes #11002
2021-08-05 15:08:55 -07:00
James Rasell c2b975163f
Merge pull request #11006 from hashicorp/f-gh-10929-changelog
changelog: add entry for #10929
2021-08-05 17:32:19 +02:00
James Rasell a9a04141a3
consul/connect: avoid warn messages on connect proxy errors
When creating a TCP proxy bridge for Connect tasks, we are at the
mercy of either end for managing the connection state. For long
lived gRPC connections the proxy could reasonably expect to stay
open until the context was cancelled. For the HTTP connections used
by connect native tasks, we experience connection disconnects.
The proxy gets recreated as needed on follow up requests, however
we also emit a WARN log when the connection is broken. This PR
lowers the WARN to a TRACE, because these disconnects are to be
expected.

Ideally we would be able to proxy at the HTTP layer, however Consul
or the connect native task could be configured to expect mTLS, preventing
Nomad from MiTM the requests.

We also can't mange the proxy lifecycle more intelligently, because
we have no control over the HTTP client or server and how they wish
to manage connection state.

What we have now works, it's just noisy.

Fixes #10933
2021-08-05 11:27:35 +02:00
James Rasell c7449b4810
changelog: add entry for #10929 2021-08-05 10:48:36 +02:00
James Rasell 61436b437a
Merge pull request #10929 from AchilleAsh/fix-token-docker-auth-config
fix: load token in docker auth config
2021-08-05 10:44:39 +02:00
Luiz Aoqui 7341615fac
changelog: add entry for #10934 (#11001) 2021-08-04 11:33:18 -04:00
James Rasell 13d2b26e27
Merge pull request #10996 from hashicorp/b-fix-doublespace-general-cli-opts
cli: fix minor format error within `-ca-cert` help text.
2021-08-04 09:21:19 +02:00
Luiz Aoqui a81e6a427d
ui: fix job dispatch page when job doesn't have any meta fields (#10934) 2021-08-03 13:50:43 -04:00
Mahmood Ali 28bc234e84 e2e: fix tests
Use basic sleeps in busybox images. busybox are very light, and ping has
permissions complications, and it may fail for network related
issues.
2021-08-03 11:38:35 -04:00
Seth Hoenig 3371214431 core: implement system batch scheduler
This PR implements a new "System Batch" scheduler type. Jobs can
make use of this new scheduler by setting their type to 'sysbatch'.

Like the name implies, sysbatch can be thought of as a hybrid between
system and batch jobs - it is for running short lived jobs intended to
run on every compatible node in the cluster.

As with batch jobs, sysbatch jobs can also be periodic and/or parameterized
dispatch jobs. A sysbatch job is considered complete when it has been run
on all compatible nodes until reaching a terminal state (success or failed
on retries).

Feasibility and preemption are governed the same as with system jobs. In
this PR, the update stanza is not yet supported. The update stanza is sill
limited in functionality for the underlying system scheduler, and is
not useful yet for sysbatch jobs. Further work in #4740 will improve
support for the update stanza and deployments.

Closes #2527
2021-08-03 10:30:47 -04:00
James Rasell 78a489418d
cli: fix minor format error within -ca-cert help text. 2021-08-03 16:05:06 +02:00
みゃお 8d970d97d3
[doc]Add replication_token link with authoritative_region
replication_token always works together with authoritative_region, add a link for better doc.
2021-08-03 18:56:00 +08:00
Mahmood Ali 0bc12fba7c
Only initialize task.VolumeMounts when not-nil (#10990)
1.1.3 had a bug where task.VolumeMounts will be an empty slice instead of nil. Eventually, it gets canonicalized and is set to `nil`, but it seems to confuse dry-run planning.

The regression was introduced in https://github.com/hashicorp/nomad/pull/10855/files#diff-56b3c82fcbc857f8fb93a903f1610f6e6859b3610a4eddf92bad9ea27fdc85ecL1028-R1037 . Curiously, it's the only place where `len(apiTask.VolumeMounts)` check was dropped. I assume it was dropped accidentally.

Fixes #10981
2021-08-02 13:08:10 -04:00
Mike Wickett 85278ab3f7
website: update consent manager (#10977) 2021-08-02 12:56:20 -04:00
Derek Strickland 7210f855e8
Merge pull request #10976 from itorres/api-docs-allocation-restart-sample
API docs: Fix allocation restart example
2021-08-02 08:48:45 -04:00
James Rasell 15dc41d17f
Merge pull request #10987 from hashicorp/f-docs-order-external-drivers-alphabetically
docs: order external driver overview alphabetically.
2021-08-02 12:50:16 +02:00
James Rasell 167b6c50ff
docs: order external driver overview alphabetically. 2021-08-02 10:51:37 +02:00
Lir (Rookout) 216d0392a8
Rookout driver docs (#10950)
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2021-08-02 10:09:45 +02:00
Mahmood Ali 8585aa8991
Merge pull request #10969 from hashicorp/merge-release-1.1.3
Merge release 1.1.3
2021-07-30 11:23:25 -04:00
Ignacio Torres Masdeu 3f784f17f7
Fix allocation restart API docs example 2021-07-30 16:45:21 +02:00
Mike Nomitch 6a8158fd5a
Adds documentation for file mode to sink docs (#10972) 2021-07-29 16:09:18 -04:00
Mahmood Ali b590d43b0d
website: publish 1.1.3 release (#10970) 2021-07-29 12:35:18 -04:00
Mahmood Ali 327ad78ea5 prepare for next dev cycle 2021-07-29 12:32:09 -04:00
Mahmood Ali 70f541287b
e2e: wait for allocs and deployments (#10967)
As we moved to using `-detach` for registering jobs, we should wait
until allocs and deployments are created before asserting their
properties.

Fixing `TestNodeDrainIgnoreSystem` and `TestRescheduleProgressDeadlineFail` tests as they seem particularly flaky, failing 9 and 7 times (respectively) in the last two weeks.
2021-07-29 10:52:04 -04:00
Nomad Release Bot 423799b2dd remove generated files 2021-07-29 04:17:44 +00:00
Nomad Release Bot 6b52b26517
Release v1.1.3 2021-07-29 04:17:00 +00:00
Nomad Release bot b5dff8be42 Generate files for 1.1.3 release 2021-07-29 03:43:03 +00:00
Mahmood Ali 12d4a0b759 prepare release docker image for fetching 2021-07-28 23:33:59 -04:00