When a system or sysbatch job specify constraints that none of the
current nodes meet, report a warning to the user.
Also, for sysbatch job, mark the job as dead as a result.
A sample run would look like:
```
$ nomad job run ./example.nomad
==> 2021-08-31T16:57:35-04:00: Monitoring evaluation "b48e8882"
2021-08-31T16:57:35-04:00: Evaluation triggered by job "example"
==> 2021-08-31T16:57:36-04:00: Monitoring evaluation "b48e8882"
2021-08-31T16:57:36-04:00: Evaluation status changed: "pending" -> "complete"
==> 2021-08-31T16:57:36-04:00: Evaluation "b48e8882" finished with status "complete" but failed to place all allocations:
2021-08-31T16:57:36-04:00: Task Group "cache" (failed to place 1 allocation):
* Constraint "${meta.tag} = bar": 2 nodes excluded by filter
* Constraint "${attr.kernel.name} = linux": 1 nodes excluded by filter
$ nomad job status example
ID = example
Name = example
Submit Date = 2021-08-31T16:57:35-04:00
Type = sysbatch
Priority = 50
Datacenters = dc1
Namespace = default
Status = dead
Periodic = false
Parameterized = false
Summary
Task Group Queued Starting Running Failed Complete Lost
cache 0 0 0 0 0 0
Allocations
No allocations placed
```
This PR adds a sentence about configuring your firewall to allow required Nomad ports. This is being added to help search discoverability.
This closes issue #11076
This allows us to spin up e2e clusters with mTLS configured for all HashiCorp services, i.e. Nomad, Consul, and Vault. Used it for testing #11089 .
mTLS is disabled by default. I have not updated Windows provisioning scripts yet - Windows also lacks ACL support from before. I intend to follow up for them in another round.
* don't timestamp active log file
* website: update log_file default value
* changelog: add entry for #11070
* website: add upgrade instructions for log_file in v1.14 and v1.2.0
Attempt to deflake the test by avoiding shutting down the leaders, as leadership
recovery takes more time, and consequently longer to process raft configuration
changes and potentially failing the test.
When a node becomes ready, create an eval for all system jobs across
namespaces.
The previous code uses `job.ID` to deduplicate evals, but that ignores
the job namespace. Thus if there are multiple jobs in different
namespaces sharing the same ID/Name, only one will be considered for
running in the new node. Thus, Nomad may skip running some system jobs
in that node.
Target all e2e datacenters for system and sysbatch e2e tests. They
require that the system jobs run on all linux clients.
However, the jobs currenly only target `dc1` datacenter, but the nightly
e2e cluster has 4 clients spread in `dc1` and `dc2` datacenters, causing
the tests to fail.
I missed this problem in e2e dev cluster because it only used a single
dc1 datacenter.
Update the ingress gateway documentation to remove the note stating
that a port must be specified for values in the `hosts` field when
the ingress gateway is listening on a non-standard HTTP port.
Specifying a port was required in Consul 1.8.0, but that requirement
was removed in 1.8.1 with hashicorp/consul#8190 which made Consul
include the port number when constructing the Envoy configuration.
Related Consul docs PR: hashicorp/consul#10827
Tweaks to the commands in Consul Connect page.
For multi-command scripts, having the leading `$` is a bit annoying, as it makes copying the text harder. Also, the `copy` button would only copy the first command and ignore the rest.
Also, the `echo 1 > ...` commands are required to run as root, unlike the rest! I made them use `| sudo tee` pattern to ease copy & paste as well.
Lastly, update the CNI plugin links to 1.0.0. It's fresh off the oven - just got released less than an hour ago: https://github.com/containernetworking/plugins/releases/tag/v1.0.0 .
Using `bridge` networking requires that you have CNI plugins installed
on the client, but this isn't in the jobspec `network` docs which are
the first place someone will look when trying to configure task
networking.
Fix a bug where system jobs may fail to be placed on a node that
initially was not eligible for system job placement.
This changes causes the reschedule to re-evaluate the node if any
attribute used in feasibility checks changes.
Fixes https://github.com/hashicorp/nomad/issues/8448
In a multi-task-group job, treat 0 canary groups as auto-promote.
This change fixes an edge case where Nomad requires a manual promotion,
if the job had any group with canary=0 and rest of groups having
auto_promote set.
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Speed up client startup, by retrying more until the servers are known.
Currently, if client fingerprinting is fast and finishes before the
client connect to a server, node registration may be delayed by 15
seconds or so!
Ideally, we'd wait until the client discovers the servers and then retry
immediately, but that requires significant code changes.
Here, we simply retry the node registration request every second. That's
basically the equivalent of check if the client discovered servers every
second. Should be a cheap operation.
When testing this change on my local computer and where both servers and
clients are co-located, the time from startup till node registration
dropped from 34 seconds to 8 seconds!