Commit graph

378 commits

Author SHA1 Message Date
Seth Hoenig f7c0e078a9 build: update golang version to 1.18.2
This PR updates to Go 1.18.2. It also updates the versions of hclfmt
and go-hclogfmt, which include newer dependencies necessary for dealing
with go1.18.

The hcl v2 branch is now 'nomad-v2.9.1+tweaks2', to include a fix for
newer macOS versions: 8927e75e82
2022-05-25 10:04:04 -05:00
Luiz Aoqui 769ff1dcc3
Merge pull request #13109 from hashicorp/merge-release-1.3.1-branch
Merge release 1.3.1 branch
2022-05-25 10:45:09 -04:00
Seth Hoenig 20b6bf3c22
Merge pull request #13104 from hashicorp/b-blocked-eval-math
core: fix blocked eval math
2022-05-24 16:23:06 -05:00
Michael Schurter 2965dc6a1a
artifact: fix numerous go-getter security issues
Fix numerous go-getter security issues:

- Add timeouts to http, git, and hg operations to prevent DoS
- Add size limit to http to prevent resource exhaustion
- Disable following symlinks in both artifacts and `job run`
- Stop performing initial HEAD request to avoid file corruption on
  retries and DoS opportunities.

**Approach**

Since Nomad has no ability to differentiate a DoS-via-large-artifact vs
a legitimate workload, all of the new limits are configurable at the
client agent level.

The max size of HTTP downloads is also exposed as a node attribute so
that if some workloads have large artifacts they can specify a high
limit in their jobspecs.

In the future all of this plumbing could be extended to enable/disable
specific getters or artifact downloading entirely on a per-node basis.
2022-05-24 16:29:39 -04:00
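The timeout and size-limit ideas in the go-getter commit above can be sketched with the standard library alone. This is a minimal illustration, not the go-getter or Nomad implementation: `fetchLimited`, `errTooLarge`, `demoServer`, and `demoTimeout` are hypothetical names, and the limits are arbitrary demo values.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// errTooLarge is a hypothetical sentinel for bodies over the size limit.
var errTooLarge = errors.New("artifact exceeds size limit")

// demoTimeout is an arbitrary value for this demo, not a Nomad default.
const demoTimeout = 2 * time.Second

// fetchLimited downloads url, gives up after timeout, and rejects
// bodies larger than maxBytes.
func fetchLimited(url string, timeout time.Duration, maxBytes int64) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	// Read one byte past the limit so an oversized body is detectable.
	data, err := io.ReadAll(io.LimitReader(resp.Body, maxBytes+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > maxBytes {
		return nil, errTooLarge
	}
	return data, nil
}

// demoServer serves a body of n bytes so the limits can be tried locally.
func demoServer(n int) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		io.WriteString(w, strings.Repeat("x", n))
	}))
}

func main() {
	srv := demoServer(64)
	defer srv.Close()

	_, err := fetchLimited(srv.URL, demoTimeout, 16)
	fmt.Println("16-byte limit:", err)

	data, err := fetchLimited(srv.URL, demoTimeout, 1024)
	fmt.Println("1024-byte limit:", len(data), err)
}
```

Because the sketch never issues a HEAD request first, the size check relies only on the bytes actually read, which is in the spirit of the last bullet in the commit message.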
Seth Hoenig 83bab8ed64
Merge pull request #13058 from hashicorp/b-cgroupsv1-docker-cgparent
drivers/docker: do not set cgroup parent in v1 mode
2022-05-24 14:07:40 -05:00
Seth Hoenig c6c3ae020d drivers/docker: do not set cgroup parent in v1 mode
This PR fixes a bug where the CgroupParent on the docker
HostConfig struct was accidentally being set when running in
cgroups v1 mode.
2022-05-24 11:22:50 -05:00
Seth Hoenig 27d0c0dc9f docs: add changelog 2022-05-24 09:13:15 -05:00
Will Jordan d515e5c3b0
Don't buffer json logs on agent startup (#13076)
There's no reason to buffer json logs on agent startup
since logs in this format already aren't reordered.
2022-05-19 15:40:30 -04:00
Seth Hoenig fc58f4972c cli: correctly use and validate job with vault token set
This PR fixes `job validate` to respect '-vault-token', '$VAULT_TOKEN',
'-vault-namespace' if set.
2022-05-19 12:13:34 -05:00
Tim Gross b72ff42ada
api: include Consul token in job revert API (#13065) 2022-05-19 11:30:29 -04:00
Seth Hoenig 29d3da6dfd cl: update changelog 2022-05-17 10:35:08 -05:00
Seth Hoenig 26b5c01431
Merge pull request #12817 from twunderlich-grapl/fix-network-interpolation
Fix network.dns interpolation
2022-05-17 09:31:32 -05:00
Seth Hoenig 08becb117c cl: add changelog note for network interpolation 2022-05-17 09:14:55 -05:00
Phil Renaud 45dc1cfd58
12986 UI fails to load job when there is an "@" in job name in nomad 130 (#13012)
* LastIndexOf and always append a namespace on job links

* Confirmed the volume equivalent and simplified idWIthNamespace logic

* Changelog added

* PR comments addressed

* Drop the redirect for the time being

* Tests updated to reflect namespace on links

* Task detail test default namespace link for test
2022-05-13 17:01:27 -04:00
Tim Gross faeb3fcd44
scheduler: volume updates should always be destructive (#13008) 2022-05-13 11:34:04 -04:00
James Rasell 636b647a30
agent: fix panic when logging about protocol version config use. (#12962)
The log line comes before the agent logger has been set up,
therefore we need to use the UI logging to avoid a panic.
2022-05-13 09:28:43 +02:00
Phil Renaud dd824ac3f8
Changelog for visual diff tests (#12909) 2022-05-06 11:29:10 -04:00
Phil Renaud 6a8f98723e
Chronological most-recent evals by default (#12847)
* Chronological most-recent evals by default

* Adding reverse: true to the list of expected queryparams in test

* changelog
2022-05-05 16:11:27 -04:00
Jai 316daf581e
fix broken link to task-group in Recent Allocation table in jobs.job.index (#12765)
* chore:  run prettier on hbs files

* ui:  ensure to pass a real job object to task-group link

* chore:  add changelog entry

* chore: prettify template

* ui:  template helper for formatting jobId in LinkTo component

* ui:  handle async relationship

* ui:  pass in job id to model arg instead of job model

* update test for serialized namespace

* ui:  defend against null  in tests

* ui:  prettified template added whitespace

* ui:  rollback ember-data to 3.24 because watcher return undefined on abort

* ui: use format-job-helper instead of job model via alloc

* ui: fix whitespace in template caused by prettier using template helper

* ui: update test for new namespace

* ui: revert prettier change

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-04-28 14:02:15 -04:00
Dave May 97cf204c00
debug: add version constraint to avoid pprof panic (#12807) 2022-04-28 13:18:55 -04:00
Tim Gross c763c4cb96
remove pre-0.9 driver code and related E2E test (#12791)
This test exercises upgrades between 0.8 and Nomad versions greater
than 0.9. We have not supported 0.8.x in a very long time and in any
case the test has been marked to skip because the downloader doesn't
work.
2022-04-27 09:53:37 -04:00
Michael Schurter e2544dd089
client: fix waiting on preempted alloc (#12779)
Fixes #10200

**The bug**

A user reported receiving the following error when an alloc was placed
that needed to preempt existing allocs:

```
[ERROR] client.alloc_watcher: error querying previous alloc:
alloc_id=28... previous_alloc=8e... error="rpc error: alloc lookup
failed: index error: UUID must be 36 characters"
```

The previous alloc (8e) was already complete on the client. This is
possible if an alloc stops *after* the scheduling decision was made to
preempt it, but *before* the node running both allocations was able to
pull and start the preemptor. While that is hopefully a narrow window of
time, you can expect it to occur in high-throughput, batch-scheduling-heavy
systems.

However the RPC error made no sense! `previous_alloc` in the logs was a
valid 36 character UUID!

**The fix**

The fix is:

```
-		prevAllocID:  c.Alloc.PreviousAllocation,
+		prevAllocID:  watchedAllocID,
```

The alloc watcher new func used for preemption improperly referenced
Alloc.PreviousAllocation instead of the passed in watchedAllocID. When
multiple allocs are preempted, a watcher is created for each with
watchedAllocID set properly by the caller. In this case
Alloc.PreviousAllocation="" -- which is where the `UUID must be 36 characters`
error was coming from! Sadly we were properly referencing
watchedAllocID in the log, so it made the error make no sense!

**The repro**

I was able to reproduce this with a dev agent with [preemption enabled](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-preempt-hcl)
and [lowered limits](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-limits-hcl)
for ease of repro.

First I started a [low priority count 3 job](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-preempt-lo-nomad),
then a [high priority job](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-preempt-hi-nomad)
that evicts 2 low priority jobs. Everything worked as expected.

However if I force it to use the [remotePrevAlloc implementation](https://github.com/hashicorp/nomad/blob/v1.3.0-beta.1/client/allocwatcher/alloc_watcher.go#L147),
it reproduces the bug because the watcher references PreviousAllocation
instead of watchedAllocID.
2022-04-26 13:14:43 -07:00
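The shape of the preemption bug above, a constructor that ignores the ID it was asked to watch, can be reproduced in isolation. This is a simplified sketch: `alloc`, `watcher`, and the constructor names are stand-ins, not the real types in `client/allocwatcher`.

```go
package main

import "fmt"

// alloc and watcher are simplified stand-ins, not Nomad's real types.
type alloc struct {
	ID                 string
	PreviousAllocation string // empty when the alloc preempts others
}

type watcher struct {
	allocID     string
	prevAllocID string
}

// newWatcherBuggy ignores watchedAllocID, mirroring the bug.
func newWatcherBuggy(a alloc, watchedAllocID string) watcher {
	return watcher{allocID: a.ID, prevAllocID: a.PreviousAllocation}
}

// newWatcherFixed uses the ID the caller asked it to watch.
func newWatcherFixed(a alloc, watchedAllocID string) watcher {
	return watcher{allocID: a.ID, prevAllocID: watchedAllocID}
}

// newPreemptionWatchers builds one watcher per preempted alloc.
func newPreemptionWatchers(a alloc, preempted []string, mk func(alloc, string) watcher) []watcher {
	ws := make([]watcher, 0, len(preempted))
	for _, id := range preempted {
		ws = append(ws, mk(a, id))
	}
	return ws
}

func main() {
	a := alloc{ID: "preemptor"}
	preempted := []string{"lo-1", "lo-2"}

	for _, w := range newPreemptionWatchers(a, preempted, newWatcherBuggy) {
		fmt.Printf("buggy watcher queries %q\n", w.prevAllocID)
	}
	for _, w := range newPreemptionWatchers(a, preempted, newWatcherFixed) {
		fmt.Printf("fixed watcher queries %q\n", w.prevAllocID)
	}
}
```

In the buggy variant every watcher ends up with an empty previous alloc ID, which is the kind of value a length check on UUIDs would reject.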
Michael Schurter 6449ba8d41
api: add ParseHCLOpts helper method (#12777)
The existing ParseHCL func didn't allow setting HCLv1=true.
2022-04-25 11:51:52 -07:00
Tim Gross b2e4841747
CSI: plugin config updates should always be destructive (#12774) 2022-04-25 12:59:25 -04:00
Tim Gross 766025cde7
CSI: plugin supervisor prestart should not mark itself done (#12752)
The task runner hook `Prestart` response object includes a `Done`
field that's intended to tell the client not to run the hook
again. The plugin supervisor creates mount points for the task during
prestart and saves these mounts in the hook resources. But if a client
restarts the hook resources will not be populated. If the plugin task
restarts at any time after the client restarts, it will fail to have
the correct mounts and crash loop until restart attempts run out.

Fix this by not returning `Done` in the response, just as we do for
the `volume_mount_hook`.
2022-04-22 13:07:47 -04:00
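The interaction between a persisted `Done` flag and in-memory hook resources can be modeled in a few lines. This is a toy model under stated assumptions: `runner`, `supervisorHook`, and `prestartResponse` are hypothetical simplifications of the task runner, and the mount path is made up.

```go
package main

import "fmt"

// prestartResponse is a simplified stand-in for the hook response.
type prestartResponse struct {
	Done bool
}

// supervisorHook keeps its mounts in memory only, so they are lost
// whenever the client process restarts.
type supervisorHook struct {
	mounts   []string
	markDone bool
}

func (h *supervisorHook) Prestart() prestartResponse {
	h.mounts = []string{"/csi"} // hypothetical mount point
	return prestartResponse{Done: h.markDone}
}

// runner models the task runner; done survives client restarts.
type runner struct {
	done bool
}

func (r *runner) runPrestart(h *supervisorHook) {
	if r.done {
		return // hook reported Done earlier, so it is skipped
	}
	r.done = h.Prestart().Done
}

// mountsAfterClientRestart simulates a first task start, a client
// restart (fresh hook, persisted runner state), then a task restart.
func mountsAfterClientRestart(markDone bool) []string {
	r := &runner{}
	r.runPrestart(&supervisorHook{markDone: markDone})

	restarted := &supervisorHook{markDone: markDone}
	r.runPrestart(restarted)
	return restarted.mounts
}

func main() {
	fmt.Println("Done=true  mounts:", mountsAfterClientRestart(true))
	fmt.Println("Done=false mounts:", mountsAfterClientRestart(false))
}
```

When the hook reports `Done`, the restarted hook is never run again and has no mounts; leaving `Done` unset makes the runner repopulate them.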
James Rasell 24b499791d
deps: update consul-template to v0.29.0 (#12747)
* deps: update consul-template to v0.29.0

* changelog: add entry for #12747
2022-04-22 09:58:54 -07:00
Phil Renaud ab557b15e0
Adding changelog note (#12753) 2022-04-22 12:38:49 -04:00
Luiz Aoqui a8cc633156
vault: revert support for entity aliases (#12723)
After a more detailed analysis of this feature, the approach taken in
PR #12449 was found to be not ideal due to poor UX (users are
responsible for setting the entity alias they would like to use) and
issues around jobs potentially masquerading as another Vault
entity.
2022-04-22 10:46:34 -04:00
Seth Hoenig 3fcac242c6 services: enable setting arbitrary address value in service registrations
This PR introduces the `address` field in the `service` block so that Nomad
or Consul services can be registered with a custom `.Address.` to advertise.

The address can be an IP address or domain name. If the `address` field is
set, the `service.address_mode` must be set to `auto`.
2022-04-22 09:14:29 -05:00
Michael Schurter 5db3a671db
cli: add -json flag to support job commands (#12591)
* cli: add -json flag to support job commands

While the CLI has always supported running JSON jobs, its support has
been via HCLv2's JSON parsing. I have no idea what format it expects the
job to be in, but it's absolutely not in the same format as the API
expects.

So I ignored that and added a new -json flag to explicitly support *API*
style JSON jobspecs.

The jobspecs can even have the wrapping {"Job": {...}} envelope or not!

* docs: fix example for `nomad job validate`

We haven't been able to validate inside driver config stanzas ever since
the move to task driver plugins. 😭
2022-04-21 13:20:36 -07:00
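Accepting a JSON jobspec with or without the `{"Job": {...}}` envelope can be done by trying the envelope first and falling back to the bare object. The sketch below shows one way to do that; it is not the CLI's actual code, and the `job` struct is a tiny stand-in that models only two fields.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// job is a tiny stand-in for the API job struct.
type job struct {
	ID   string
	Name string
}

// decodeJob accepts either {"Job": {...}} or a bare {...} job object.
func decodeJob(data []byte) (*job, error) {
	var envelope struct {
		Job *job
	}
	if err := json.Unmarshal(data, &envelope); err != nil {
		return nil, err
	}
	if envelope.Job != nil {
		return envelope.Job, nil
	}

	// No envelope found, so treat the input as the job itself.
	var j job
	if err := json.Unmarshal(data, &j); err != nil {
		return nil, err
	}
	if j.ID == "" {
		return nil, errors.New("no job found in input")
	}
	return &j, nil
}

func main() {
	wrapped := []byte(`{"Job": {"ID": "web", "Name": "web"}}`)
	bare := []byte(`{"ID": "web", "Name": "web"}`)

	for _, in := range [][]byte{wrapped, bare} {
		j, err := decodeJob(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("decoded job:", j.ID)
	}
}
```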
Phil Renaud a5bef3ce72
[ui, bugfix] Link fix for volumes where per_alloc=true (#12713)
* Allocation page linkfix

* fix added to task page and computed prop moved to allocation model

* Fallback query added to task group when specific volume isn't knowable

* Delog

* link text reflects alloc suffix

* Helper instead of in-template conditionals

* formatVolumeName unit test

* Removing unused helper import
2022-04-21 13:57:18 -04:00
James Rasell 716b8e658b
api: Add support for filtering and pagination to the node list endpoint (#12727) 2022-04-21 17:04:33 +02:00
Gowtham 1ff8b5f759
Add Concurrent Download Support for artifacts (#11531)
* add concurrent download support - resolves #11244

* format imports

* mark `wg.Done()` via `defer`

* added tests for successful and failure cases and resolved some goleak

* docs: add changelog for #11531

* test typo fixes and improvements

Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2022-04-20 10:15:56 -07:00
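The "mark `wg.Done()` via `defer`" bullet above refers to a common pattern for fanning out work. The following is a generic sketch of concurrent downloads with `sync.WaitGroup`, not the code from the PR; `fetchFunc`, `fetchAll`, and `failOn` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// fetchFunc is a hypothetical stand-in for downloading one artifact.
type fetchFunc func(source string) error

// fetchAll runs fetch for every source concurrently and returns one
// error slot per source, in the same order as sources.
func fetchAll(sources []string, fetch fetchFunc) []error {
	errs := make([]error, len(sources))
	var wg sync.WaitGroup
	for i, src := range sources {
		wg.Add(1)
		go func(i int, src string) {
			// Deferring Done covers every return path of the goroutine.
			defer wg.Done()
			errs[i] = fetch(src) // each goroutine writes only its own slot
		}(i, src)
	}
	wg.Wait()
	return errs
}

// failOn returns a fetchFunc that fails for a single source.
func failOn(bad string) fetchFunc {
	return func(source string) error {
		if source == bad {
			return errors.New("download failed: " + source)
		}
		return nil
	}
}

func main() {
	sources := []string{"a.tar.gz", "b.tar.gz", "c.tar.gz"}
	for i, err := range fetchAll(sources, failOn("b.tar.gz")) {
		fmt.Println(sources[i], "->", err)
	}
}
```

Writing results into a pre-sized slice by index avoids a shared map or channel and keeps the example free of data races.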
James Rasell 010acce59f
job_hooks: add implicit constraint when using Consul for services. (#12602) 2022-04-20 14:09:13 +02:00
James Rasell 42068f8823
client: add NOMAD_SHORT_ALLOC_ID allocation env var. (#12603) 2022-04-20 10:30:48 +02:00
Luiz Aoqui 8dccc48f17
changelog: fix entry for #11927 (#12577) 2022-04-19 10:46:25 -04:00
Luiz Aoqui 950a2109aa
changelog: add entry for #11944 (#12578) 2022-04-19 10:46:11 -04:00
Seth Hoenig 411158acff
Merge pull request #12586 from hashicorp/f-local-si-token
connect: create SI tokens in local scope
2022-04-19 07:53:01 -05:00
Seth Hoenig a7950e5624 cl: add missing prefix 2022-04-19 07:48:56 -05:00
Derek Strickland 7c6eb47b78
consul-template: revert function_denylist logic (#12071)
* consul-template: replace config rather than append
Co-authored-by: Seth Hoenig <seth.a.hoenig@gmail.com>
2022-04-18 13:57:56 -04:00
chavacava eb1c42e643
QueryOptions.SetTimeToBlock should take pointer receiver
Fixes a bug where blocking queries that are retried don't have their blocking 
timeout reset, resulting in them running longer than expected.
2022-04-18 10:41:27 -04:00
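The difference between a value receiver and a pointer receiver is the root of the `SetTimeToBlock` bug above. The sketch below uses a simplified stand-in type; the field name `timeToBlock` and both method names are assumptions for illustration, not the real `QueryOptions` definition.

```go
package main

import (
	"fmt"
	"time"
)

// queryOptions is a simplified stand-in for the API client's options.
type queryOptions struct {
	timeToBlock time.Duration
}

// setByValue has a value receiver: it mutates a copy, so the caller's
// struct is left unchanged.
func (q queryOptions) setByValue(d time.Duration) {
	q.timeToBlock = d
}

// setByPointer has a pointer receiver: the caller's struct is updated.
func (q *queryOptions) setByPointer(d time.Duration) {
	q.timeToBlock = d
}

func main() {
	q := &queryOptions{}

	q.setByValue(5 * time.Second)
	fmt.Println("after value receiver:  ", q.timeToBlock)

	q.setByPointer(5 * time.Second)
	fmt.Println("after pointer receiver:", q.timeToBlock)
}
```

With the value receiver the setter silently does nothing from the caller's point of view, which matches the symptom of a timeout that is never reset on retry.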
Seth Hoenig df587d8263 docs: update documentation with connect acls changes
This PR updates the changelog, adds notes to the 1.3 upgrade guide, and
updates the connect integration docs with documentation about the new
requirement on Consul ACL policies of Consul agent default anonymous ACL
tokens.
2022-04-18 08:22:33 -05:00
Tim Gross f5d8c636c7
CSI: handle per-alloc volumes in alloc status -verbose CLI (#12573)
The Nomad client's `csi_hook` interpolates the alloc suffix with the
volume request's name for CSI volumes with `per_alloc = true`, turning
`example` into `example[1]`. We need to apply the same behavior in the
`alloc status` output so that we show the correct volume.
2022-04-15 09:26:19 -04:00
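The per-alloc naming described above, `example` becoming `example[1]`, amounts to appending the alloc's index suffix to the volume name. This sketch assumes alloc names end in a `[N]` suffix such as `web.cache[1]`; `allocSuffix` and `volumeDisplayName` are hypothetical helpers, not the functions used by `csi_hook` or the CLI.

```go
package main

import (
	"fmt"
	"strings"
)

// allocSuffix returns the trailing "[N]" part of an alloc name, or ""
// if the name has no such suffix.
func allocSuffix(allocName string) string {
	i := strings.LastIndex(allocName, "[")
	if i < 0 {
		return ""
	}
	return allocName[i:]
}

// volumeDisplayName appends the alloc suffix only for per_alloc volumes.
func volumeDisplayName(source, allocName string, perAlloc bool) string {
	if !perAlloc {
		return source
	}
	return source + allocSuffix(allocName)
}

func main() {
	fmt.Println(volumeDisplayName("example", "web.cache[1]", true))
	fmt.Println(volumeDisplayName("example", "web.cache[1]", false))
}
```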
Seth Hoenig a1c4f16cf1 connect: prefix tag with nomad.; merge into envoy_stats_tags; update docs
This PR expands on the work done in #12543 to
- prefix the tag, so it is now "nomad.alloc_id" to be more consistent with Consul tags
- merge into pre-existing envoy_stats_tags fields
- update the upgrade guide docs
- update changelog
2022-04-14 12:52:52 -05:00
Tim Gross a135d9b260
CSI: fix data race in plugin manager (#12553)
The plugin manager for CSI hands out instances of a plugin for callers
that need to mount a volume. The `MounterForPlugin` method accesses
the internal instances map without a lock, and can be called
concurrently from outside the plugin manager's main run-loop.

The original commit for the instances map included a warning that it
needed to be accessed only from the main loop but that comment was
unfortunately ignored shortly thereafter, so this bug has existed in
the code for a couple years without being detected until we ran tests
with `-race` in #12098. Lesson learned here: comments make for lousy
enforcement of invariants!
2022-04-12 12:18:04 -04:00
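One common way to make a shared map safe for callers outside the owning loop is to guard it with a mutex, so the invariant is enforced by the type rather than by a comment. The commit message does not say how the fix was implemented, so the sketch below is only an illustration; `pluginManager`, `register`, and `mounterFor` are simplified, hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// pluginManager is a simplified stand-in; instances maps a plugin ID
// to an opaque instance name.
type pluginManager struct {
	mu        sync.RWMutex
	instances map[string]string
}

func newPluginManager() *pluginManager {
	return &pluginManager{instances: make(map[string]string)}
}

// register is the write path and takes the exclusive lock.
func (m *pluginManager) register(pluginID, instance string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.instances[pluginID] = instance
}

// mounterFor is the read path; it may be called from any goroutine.
func (m *pluginManager) mounterFor(pluginID string) (string, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	inst, ok := m.instances[pluginID]
	return inst, ok
}

func main() {
	m := newPluginManager()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(2)
		go func(i int) {
			defer wg.Done()
			m.register(fmt.Sprintf("plugin-%d", i), "instance")
		}(i)
		go func(i int) {
			defer wg.Done()
			m.mounterFor(fmt.Sprintf("plugin-%d", i))
		}(i)
	}
	wg.Wait()

	_, ok := m.mounterFor("plugin-0")
	fmt.Println("plugin-0 registered:", ok)
}
```

Running a program like this with `go run -race` is a quick way to check that concurrent readers and writers no longer trip the race detector.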
Luiz Aoqui 16e3a1028e
changelog: update #12476 entry to highlight the feature (#12528) 2022-04-08 13:28:23 -04:00
Seth Hoenig 9236fe3904 docs: update cl 2022-04-07 10:02:00 -05:00
Tim Gross 1724765096
api: use cleanhttp.DefaultPooledTransport for default API client (#12492)
We expect every Nomad API client to use a single connection to any
given agent, so take advantage of keep-alive by switching the default
transport to `DefaultPooledTransport`. Provide a facility to close idle
connections for testing purposes.

Restores the previously reverted #12409


Co-authored-by: Ben Buzbee <bbuzbee@cloudflare.com>
2022-04-06 16:14:53 -04:00
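The effect of a pooled, keep-alive transport can be observed with the standard library by counting how many connections a local test server accepts. This sketch approximates the idea with plain `net/http` transports rather than the `cleanhttp` package, and `connsUsed`, `pooledClient`, and `unpooledClient` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// connsUsed issues n sequential GETs with client against a local test
// server and reports how many TCP connections the server accepted.
func connsUsed(client *http.Client, n int) (int, error) {
	var opened int64
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		io.WriteString(w, "ok")
	}))
	srv.Config.ConnState = func(_ net.Conn, s http.ConnState) {
		if s == http.StateNew {
			atomic.AddInt64(&opened, 1)
		}
	}
	srv.Start()
	defer srv.Close()

	for i := 0; i < n; i++ {
		resp, err := client.Get(srv.URL)
		if err != nil {
			return 0, err
		}
		// Drain and close the body so the connection can be reused.
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
	// Idle connections can be dropped explicitly, e.g. between tests.
	client.CloseIdleConnections()
	return int(atomic.LoadInt64(&opened)), nil
}

// pooledClient keeps idle connections around for reuse.
func pooledClient() *http.Client {
	return &http.Client{Transport: &http.Transport{MaxIdleConnsPerHost: 2}}
}

// unpooledClient opens a fresh connection for every request.
func unpooledClient() *http.Client {
	return &http.Client{Transport: &http.Transport{DisableKeepAlives: true}}
}

func main() {
	pooled, err := connsUsed(pooledClient(), 3)
	fmt.Println("pooled connections:  ", pooled, err)

	unpooled, err := connsUsed(unpooledClient(), 3)
	fmt.Println("unpooled connections:", unpooled, err)
}
```

The call to `CloseIdleConnections` stands in for the "facility to close idle connections" the commit mentions; how the API client exposes it is not shown here.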
Luiz Aoqui 0b13ea6920
changelog: make breaking change note for raft v3 (#12493) 2022-04-06 16:00:38 -04:00
Luiz Aoqui 697e82a665
changelog: add entry for #12435 (#12491) 2022-04-06 14:22:09 -04:00