Commit graph

70 commits

Author SHA1 Message Date
Luiz Aoqui bbae221c8c
deps: update go-memdb to 1.3.2 (#11185) 2021-09-14 20:26:45 -04:00
Michael Schurter 7035c94320
Merge pull request #11111 from hashicorp/b-system-no-match
scheduler: warn when system jobs cannot place an alloc
2021-09-13 16:06:04 -07:00
Michael Schurter d32d0326e8 docs: focus changelog entry for #11111 on the ux
While I don't think this fully encompasses the changes, other bits
like marking sysbatch as dead immediately are new so haven't changed
from a previous release.
2021-09-10 16:45:43 -07:00
James Rasell 686189aade
Merge pull request #11143 from hashicorp/b-gh-11026
deps: update go-plugin to v1.4.3 to fix Windows handle leak.
2021-09-09 09:39:22 +02:00
Luiz Aoqui 4dd8b6b571
cli: include all possible scores in alloc status metric table (#11128) 2021-09-08 17:30:11 -04:00
Luiz Aoqui 305f0b5702
ui: set the job namespace when redirecting after the job is dispatched (#11141) 2021-09-07 12:27:33 -04:00
James Rasell fa149744a9
changelog: add entry for #11143. 2021-09-07 09:51:17 +02:00
Isabel Suchanek ab51050ce8
events: fix wildcard namespace handling (#10935)
This fixes a bug in the event stream API where it currently interprets
namespace=* as an actual namespace, not a wildcard. When Nomad parses
incoming requests, it sets namespace to default if not specified, which
means the request namespace will never be an empty string, which is what
the event subscription was checking for. This changes the conditional
logic to check for a wildcard namespace instead of an empty one.

It also updates some event tests to include the default namespace in the
subscription to match current behavior.

Fixes #10903
2021-09-02 09:36:55 -07:00
Luiz Aoqui 12f5f3ae90
changelog: add entry for #11111 2021-09-02 12:13:42 -04:00
Luiz Aoqui eb0ed980a5
ui: set namespace when looking for and displaying children jobs (#11110) 2021-09-01 14:40:25 -04:00
Mahmood Ali 641afebeed
update golang to 1.16.7 (#11083) 2021-08-25 11:56:46 -04:00
Roopak Venkatakrishnan dcf5981bcd
Update x/sys to support go 1.17 (#11065)
Co-authored-by: James Rasell <jrasell@hashicorp.com>
2021-08-25 17:23:01 +02:00
Luiz Aoqui 104d29e808
Don't timestamp active log file (#11070)
* don't timestamp active log file

* website: update log_file default value

* changelog: add entry for #11070

* website: add upgrade instructions for log_file in v1.14 and v1.2.0
2021-08-23 11:27:34 -04:00
Mahmood Ali 84a3522133
Consider all system jobs for a new node (#11054)
When a node becomes ready, create an eval for all system jobs across
namespaces.

The previous code uses `job.ID` to deduplicate evals, but that ignores
the job namespace. Thus if there are multiple jobs in different
namespaces sharing the same ID/Name, only one will be considered for
running in the new node. Thus, Nomad may skip running some system jobs
in that node.
2021-08-18 09:50:37 -04:00
Michael Schurter a7aae6fa0c
Merge pull request #10848 from ggriffiths/listsnapshot_secrets
CSI Listsnapshot secrets support
2021-08-10 15:59:33 -07:00
Mahmood Ali ea003188fa
system: re-evaluate node on feasibility changes (#11007)
Fix a bug where system jobs may fail to be placed on a node that
initially was not eligible for system job placement.

This changes causes the reschedule to re-evaluate the node if any
attribute used in feasibility checks changes.

Fixes https://github.com/hashicorp/nomad/issues/8448
2021-08-10 17:17:44 -04:00
Mahmood Ali bfc766357e
deployments: canary=0 is implicitly autopromote (#11013)
In a multi-task-group job, treat 0 canary groups as auto-promote.

This change fixes an edge case where Nomad requires a manual promotion,
if the job had any group with canary=0 and rest of groups having
auto_promote set.

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2021-08-10 17:06:40 -04:00
Mahmood Ali efcc8bf082
Speed up client startup and registration (#11005)
Speed up client startup, by retrying more until the servers are known.

Currently, if client fingerprinting is fast and finishes before the
client connect to a server, node registration may be delayed by 15
seconds or so!

Ideally, we'd wait until the client discovers the servers and then retry
immediately, but that requires significant code changes.

Here, we simply retry the node registration request every second. That's
basically the equivalent of check if the client discovered servers every
second. Should be a cheap operation.

When testing this change on my local computer and where both servers and
clients are co-located, the time from startup till node registration
dropped from 34 seconds to 8 seconds!
2021-08-10 17:06:18 -04:00
Luiz Aoqui c1d1906628
ui: add missing pipe separator in parameterized and periodic jobs (#11020) 2021-08-10 13:48:20 -04:00
Jai 29a7fe6efa
Merge pull request #10666 from hashicorp/b-ui/search-namespaces
ui: Fix fuzzy search namespace-handling
2021-08-10 13:13:20 -04:00
Jai Bhagat a9b9132f35 edit hierarchy to lead with namespace before job 2021-08-10 10:35:36 -04:00
Luiz Aoqui d283e90c35
ui: only dipslay "Dispatch Job" button on parameterized jobs (#11019) 2021-08-09 17:49:08 -04:00
Michael Schurter c39ca0773d
Merge pull request #10951 from hashicorp/b-cn-proxy
consul/connect: avoid warn messages on connect proxy errors
2021-08-06 15:25:40 -07:00
James Rasell a9a04141a3
consul/connect: avoid warn messages on connect proxy errors
When creating a TCP proxy bridge for Connect tasks, we are at the
mercy of either end for managing the connection state. For long
lived gRPC connections the proxy could reasonably expect to stay
open until the context was cancelled. For the HTTP connections used
by connect native tasks, we experience connection disconnects.
The proxy gets recreated as needed on follow up requests, however
we also emit a WARN log when the connection is broken. This PR
lowers the WARN to a TRACE, because these disconnects are to be
expected.

Ideally we would be able to proxy at the HTTP layer, however Consul
or the connect native task could be configured to expect mTLS, preventing
Nomad from MiTM the requests.

We also can't mange the proxy lifecycle more intelligently, because
we have no control over the HTTP client or server and how they wish
to manage connection state.

What we have now works, it's just noisy.

Fixes #10933
2021-08-05 11:27:35 +02:00
James Rasell c7449b4810
changelog: add entry for #10929 2021-08-05 10:48:36 +02:00
Luiz Aoqui 7341615fac
changelog: add entry for #10934 (#11001) 2021-08-04 11:33:18 -04:00
Mahmood Ali 0bc12fba7c
Only initialize task.VolumeMounts when not-nil (#10990)
1.1.3 had a bug where task.VolumeMounts will be an empty slice instead of nil. Eventually, it gets canonicalized and is set to `nil`, but it seems to confuse dry-run planning.

The regression was introduced in https://github.com/hashicorp/nomad/pull/10855/files#diff-56b3c82fcbc857f8fb93a903f1610f6e6859b3610a4eddf92bad9ea27fdc85ecL1028-R1037 . Curiously, it's the only place where `len(apiTask.VolumeMounts)` check was dropped. I assume it was dropped accidentally.

Fixes #10981
2021-08-02 13:08:10 -04:00
Mahmood Ali 22a91f7003
update changelog (#10963) 2021-07-28 16:02:04 -04:00
Grant Griffiths fecbbaee22 CSI ListSnapshots secrets implementation
Signed-off-by: Grant Griffiths <ggriffiths@purestorage.com>
2021-07-28 11:30:29 -07:00
Mahmood Ali 62fe6f12f9
api: revert to defaulting to http/1 (#10958)
* api: revert to defaulting to http/1

PR #10778 incidentally changed the api http client to connect with
HTTP/2 first. However, the websocket libraries used in `alloc exec`
features don't handle http/2 well, and don't downgrade to http/1
gracefully.

Given that the switch is incidental, and not requested by users.
Furthermore, api consumers can opt-in to forcing http/2 by setting
custom http clients.

Fixes #10922
2021-07-28 11:21:53 -04:00
Michael Schurter ea996c321d
Merge pull request #10916 from hashicorp/f-audit-log-mode
Add audit log file mode config parameter
2021-07-27 12:16:37 -07:00
Michael Schurter d64d70607a docs: add changelog for #10916 2021-07-27 11:51:38 -07:00
Mahmood Ali ac3cf10849
nomad: only activate one-time auth tokens with 1.1.0 (#10952)
Fix a panic in handling one-time auth tokens, used to support `nomad ui
--authenticate`.

If the nomad leader is a 1.1.x with some servers running as 1.0.x, the
pre-1.1.0 servers risk crashing and the cluster may lose quorum. That
can happen when `nomad authenticate -ui` command is issued, or when the
leader scans for expired tokens every 10 minutes.

Fixed #10943 .
2021-07-27 13:17:55 -04:00
Mahmood Ali d97927ebcf
cli: Use glint to determine if os.Stdout is tty (#10926)
Use glint to determine if os.Stdout is a terminal.

glint Terminal renderer expects os.Stdout [not only to be a terminal, but also to have non-zero size](b492b545f6/renderer_term.go (L39-L46)). It's unclear how this condition arises, but this additional check causes Nomad to render deployments progress through glint when glint cannot support it.

By using golint to perform the check, we eliminate the risk of mis-judgement.
2021-07-23 11:27:47 -04:00
Jai 0ccf60444d
Merge pull request #10893 from hashicorp/f-ui/namespace-acl-bug
edit ember-can to add additional attribute for namespace
2021-07-22 12:57:34 -04:00
Jai Bhagat 5d33884cdc ui: fixes #10885 2021-07-22 11:44:25 -04:00
Seth Hoenig 54d9bad657
Merge pull request #10904 from hashicorp/b-no-affinity-intern
core: remove internalization of affinity strings
2021-07-22 09:09:07 -05:00
Luiz Aoqui 484037aff1
fix nomad alloc signal help message (#10917) 2021-07-21 11:02:44 -04:00
Luiz Aoqui a26874215a
changelog: add entry for #10675 (#10919) 2021-07-21 10:05:48 -04:00
Mahmood Ali 8df9b1fd0f
client: avoid acting on stale data after launch (#10907)
When the client launches, use a consistent read to fetch its own allocs,
but allow stale read afterwards as long as reads don't revert into older
state.

This change addresses an edge case affecting restarting client. When a
client restarts, it may fetch a stale data concerning its allocs: allocs
that have completed prior to the client shutdown may still have "run/running"
desired/client status, and have the client attempt to re-run again.

An alternative approach is to track the indices such that the client
set MinQueryIndex on the maximum index the client ever saw, or compare
received allocs against locally restored client state. Garbage
collection complicates this approach (local knowledge is not complete),
and the approach still risks starting "dead" allocations (e.g. the
allocation may have been placed when client just restarted and have
already been reschuled by the time the client started. This approach
here is effective against all kinds of stalness problems with small
overhead.
2021-07-20 15:13:28 -04:00
Michael Schurter efe8ea2c2c
Merge pull request #10849 from benbuzbee/benbuz/fix-destroy
Don't treat a failed recover + successful destroy as a successful recover
2021-07-19 10:49:31 -07:00
Michael Schurter 6aee3de420 docs: add changelog entry for #10849 2021-07-16 15:58:58 -07:00
Seth Hoenig ac5c83cafd core: remove internalization of affinity strings
Basically the same as #10896 but with the Affinity struct.
Since we use reflect.DeepEquals for job comparison, there is
risk of false positives for changes due to a job struct with
memoized vs non-memoized strings.

Closes #10897
2021-07-15 15:15:39 -05:00
Mahmood Ali 996ea1fa46
Merge pull request #10875 from hashicorp/b-namespace-flag-override
cli: `-namespace` should override job namespace
2021-07-14 17:28:36 -04:00
Mahmood Ali 26509f2299
Merge pull request #10864 from hashicorp/b-10746-plan-datacenter
scheduler: datacenter updates should be destructive
2021-07-14 17:25:13 -04:00
Seth Hoenig 3fce1d3f11
Merge pull request #10898 from hashicorp/f-rm-vendor
build: no longer use vendor directory
2021-07-14 13:00:41 -05:00
Seth Hoenig 1b5f902842 docs: update changelog 2021-07-14 11:21:00 -05:00
Seth Hoenig a4af3fcad0 docs: add changelog entry 2021-07-14 10:46:40 -05:00
James Rasell 66d3b98db5
Merge pull request #10892 from hashicorp/b-gh-10890
deps: update consul-template to v0.25.2.
2021-07-14 09:26:16 +02:00
Luiz Aoqui dd8213abc1
changelog: add entry for GH-10563 (#10894) 2021-07-13 16:12:41 -04:00