Commit Graph

22991 Commits

Author SHA1 Message Date
James Rasell e1bf9138a1
release: fix hcl linting error within CI file. (#12867) 2022-05-04 10:48:42 +02:00
Michele Degges 9c85ddcb7f
Add config key to the promote-staging event (#12857) 2022-05-03 20:33:14 -07:00
Michele Degges 417d3ca232 Add config key to the promote-staging event 2022-05-03 08:51:19 -07:00
Tim Gross 45b238ec82
CSI: node drain should end once only plugins remain (#12846)
In #12324 we made it so that plugins wait until the node drain is
complete, as we do for system jobs. But we neglected to mark the node
drain as complete once only plugins (or system jobs) remain, which
means that the node drain is left in a draining state until the
`deadline` time expires. This was incorrectly documented as expected
behavior in #12324.
2022-05-03 10:20:22 -04:00
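
The completion rule described in #12846 above can be illustrated with a minimal sketch. The `allocKind` type and `drainDone` function are hypothetical names invented for this example and are not taken from Nomad's drainer code; the sketch only shows the idea that a drain counts as finished once nothing but system jobs or plugins remain.

```go
package main

import "fmt"

// allocKind is a hypothetical classification of what is still running on a node.
type allocKind int

const (
	serviceAlloc allocKind = iota
	systemAlloc
	pluginAlloc
)

// drainDone reports whether the drain can be marked complete: it is done
// once every remaining alloc is a system job or a plugin.
func drainDone(remaining []allocKind) bool {
	for _, k := range remaining {
		if k != systemAlloc && k != pluginAlloc {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(drainDone([]allocKind{serviceAlloc, pluginAlloc})) // a service alloc is still running
	fmt.Println(drainDone([]allocKind{systemAlloc, pluginAlloc}))  // only system jobs and plugins remain
}
```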
Alex Carpenter d59b517ab2
[WIP] feat: homepage and use case pages redesign (#11873)
* feat: connect homepage and use case pages

* fix: internalLink usage

* fix: query name

* chore: add homepage patterns

* chore: remove offerings

* chore: add intro features

* chore: bump subnav

* chore: updating patterns

* chore: add use case to the subnav

* chore: cleanup unused import

* chore: remove subnav border
2022-05-03 09:06:00 -04:00
Luiz Aoqui 6cd9881d2d
Update CHANGELOG for 1.3.0-rc.1 (#12849) 2022-05-02 16:52:00 -04:00
Seth Hoenig 35728cbc58
Merge pull request #12740 from hashicorp/cleanup-makefile-help
build: add missing help descriptions to makefile
2022-05-02 10:33:22 -05:00
Seth Hoenig b8d807c320
Merge pull request #12840 from hashicorp/docs-nvidia-updates
docs: update nvidia driver documentation
2022-05-02 10:07:02 -05:00
Luiz Aoqui 758e85bc84
ui: fix an error when navigating to a task group (#12832)
Clicking on a task group row in the job details page would throw the
error:

Uncaught Error: You didn't provide enough string/numeric parameters to satisfy all of the dynamic segments for route jobs.job.task-group. Missing params: name
    createParamHandlerInfo http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4814
    applyToHandlers http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4804
    applyToState http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4801
    getTransitionByIntent http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4843
    transitionByIntent http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4836
    refresh http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4885
    refresh http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:2254
    queryParamsDidChange http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:2326
    k http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:2423
    triggerEvent http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:2349
    fireQueryParamDidChange http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4863
    getTransitionByIntent http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4848
    transitionByIntent http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4836
    doTransition http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4853
    transitionTo http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4882
    _doTransition http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:2392
    transitionTo http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:2177
    gotoTaskGroup http://localhost:4646/ui/assets/nomad-ui-4a2c1941e03e60e1feef715f23cf268c.js:623
...

This happened because the attribute passed to the transitionTo
function was the whole model rather than the task group name.
2022-05-02 11:01:19 -04:00
Seth Hoenig 684abb9e28 docs: update nvidia driver documentation
notably:
- name of the compiled binary is 'nomad-device-nvidia', not 'nvidia-gpu'
- link to Nvidia docs for installing the container runtime toolkit
- list docker v19.03 as minimum version, to track with nvidia's new container runtime toolkit
2022-05-02 09:11:05 -05:00
Matus Goljer a741cc76b5
nomad can also install autocomplete for fish shell (#12834) 2022-05-02 09:26:55 -04:00
Luiz Aoqui 59e2bcd809
ci: remove unused CircleCI Makefile (#12828)
This Makefile was used to generate the full config.yml from smaller
sub-files, but this is not done anymore.
2022-04-29 15:25:23 -04:00
Tim Gross d06ad50538
docs: clarify `capacity_min/max` for volumes (#12825)
The capacity fields for `create volume` set bounds on the resulting
size of the volume, but the ultimate size of the volume will be
determined by the storage provider (between the min and max). Clarify
this in the documentation and provide a suggestion for how to set an
exact size.
2022-04-29 13:38:30 -04:00
Derek Strickland 584bf0162f
docs: Add known limitations callouts to Max Client Disconnect section (#12801)
* docs: Add known limitations callouts to Max Client Disconnect section
2022-04-28 16:17:14 -04:00
Phil Renaud 067234792a
Moves the evaluations table toolbar outside of the table-container (#12799) 2022-04-28 16:08:46 -04:00
Luiz Aoqui 6c3473b778
ci: update the `hashicorp/actions-generate-metadata` action version (#12813) 2022-04-28 15:24:55 -04:00
Jai 316daf581e
fix broken link to `task-group` in `Recent Allocation` table in `jobs.job.index` (#12765)
* chore:  run prettier on hbs files

* ui:  ensure to pass a real job object to task-group link

* chore:  add changelog entry

* chore: prettify template

* ui:  template helper for formatting jobId in LinkTo component

* ui:  handle async relationship

* ui:  pass in job id to model arg instead of job model

* update test for serialized namespace

* ui:  defend against null  in tests

* ui:  prettified template added whitespace

* ui:  rollback ember-data to 3.24 because watchers return undefined on abort

* ui: use format-job-helper instead of job model via alloc

* ui: fix whitespace in template caused by prettier using template helper

* ui: update test for new namespace

* ui: revert prettier change

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-04-28 14:02:15 -04:00
Dave May 97cf204c00
debug: add version constraint to avoid pprof panic (#12807) 2022-04-28 13:18:55 -04:00
Luiz Aoqui 0830e3c787
ci: fix build workflow trigger on push (#12806) 2022-04-28 11:15:54 -04:00
Luiz Aoqui cca49a054f
ci: setup release process with CRT (#12781) 2022-04-27 20:14:23 -04:00
Derek Strickland 90daed7c1d
e2e: Wait for deployment to finish before disconnect (#12795)
* Wait for deployment to finish
* Don't reschedule disconnect or restart-node jobs
2022-04-27 12:27:03 -04:00
Phil Renaud 182bead357
[ui, mirage] Evaluation mocks (#12471)
* Linear and Branching mock evaluations

* De-comment

* test-trigger

* Making evaluation trees dynamic

* Reinstated job relationship on eval mock

* Dasherize job prefix back to normal

* Handle bug where UUIDKey is not present on job

* Appending node to eval

* Job ID as a passed property

* Remove unused import

* Branching evals set up as generatable
2022-04-27 12:11:24 -04:00
Tim Gross c763c4cb96
remove pre-0.9 driver code and related E2E test (#12791)
This test exercises upgrades between 0.8 and Nomad versions greater
than 0.9. We have not supported 0.8.x in a very long time and in any
case the test has been marked to skip because the downloader doesn't
work.
2022-04-27 09:53:37 -04:00
Michael Schurter e2544dd089
client: fix waiting on preempted alloc (#12779)
Fixes #10200

**The bug**

A user reported receiving the following error when an alloc was placed
that needed to preempt existing allocs:

```
[ERROR] client.alloc_watcher: error querying previous alloc:
alloc_id=28... previous_alloc=8e... error="rpc error: alloc lookup
failed: index error: UUID must be 36 characters"
```

The previous alloc (8e) was already complete on the client. This is
possible if an alloc stops *after* the scheduling decision was made to
preempt it, but *before* the node running both allocations was able to
pull and start the preemptor. While that is hopefully a narrow window of
time, you can expect it to occur in high-throughput,
batch-scheduling-heavy systems.

However the RPC error made no sense! `previous_alloc` in the logs was a
valid 36 character UUID!

**The fix**

The fix is:

```
-		prevAllocID:  c.Alloc.PreviousAllocation,
+		prevAllocID:  watchedAllocID,
```

The alloc watcher new func used for preemption improperly referenced
Alloc.PreviousAllocation instead of the passed in watchedAllocID. When
multiple allocs are preempted, a watcher is created for each with
watchedAllocID set properly by the caller. In this case
Alloc.PreviousAllocation="" -- which is where the `UUID must be 36 characters`
error was coming from! Sadly we were properly referencing
watchedAllocID in the log, so it made the error make no sense!

**The repro**

I was able to reproduce this with a dev agent with [preemption enabled](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-preempt-hcl)
and [lowered limits](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-limits-hcl)
for ease of repro.

First I started a [low priority count 3 job](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-preempt-lo-nomad),
then a [high priority job](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-preempt-hi-nomad)
that evicts 2 low priority jobs. Everything worked as expected.

However if I force it to use the [remotePrevAlloc implementation](https://github.com/hashicorp/nomad/blob/v1.3.0-beta.1/client/allocwatcher/alloc_watcher.go#L147),
it reproduces the bug because the watcher references PreviousAllocation
instead of watchedAllocID.
2022-04-26 13:14:43 -07:00
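
The one-line fix in #12779 above can be shown in isolation with a reduced sketch. All types here (`alloc`, `watcherConfig`, `prevAllocWatcher`) and the constructor name are simplified stand-ins made up for this example, not the real `allocwatcher` package; the point is only that the constructor must use the ID passed in by the caller, because the preemptor's own `PreviousAllocation` is empty.

```go
package main

import "fmt"

// alloc is a simplified, hypothetical stand-in for an allocation.
type alloc struct {
	ID                 string
	PreviousAllocation string // empty for an alloc that preempts others
}

// watcherConfig is a hypothetical, trimmed-down constructor config.
type watcherConfig struct {
	Alloc *alloc
}

// prevAllocWatcher records which earlier alloc it is waiting on.
type prevAllocWatcher struct {
	allocID     string
	prevAllocID string
}

// newWatcherForPreemption builds a watcher for one preempted alloc.
// It uses watchedAllocID from the caller, not c.Alloc.PreviousAllocation.
func newWatcherForPreemption(c watcherConfig, watchedAllocID string) *prevAllocWatcher {
	return &prevAllocWatcher{
		allocID:     c.Alloc.ID,
		prevAllocID: watchedAllocID,
	}
}

func main() {
	preemptor := &alloc{ID: "preemptor"}
	// One watcher per preempted alloc, each with its own watched ID.
	for _, id := range []string{"lo-1", "lo-2"} {
		w := newWatcherForPreemption(watcherConfig{Alloc: preemptor}, id)
		fmt.Printf("%s waits on %q\n", w.allocID, w.prevAllocID)
	}
}
```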
Tim Gross cfd353207f
E2E: move volume mounts test to use golang's stdlib test runner (#12788)
Part of ongoing work to remove the old E2E framework code.
2022-04-26 14:28:20 -04:00
Tim Gross 83eb879d61
E2E: remove old CLI for driving provisioning (#12787)
We moved off the old provisioning process for nightly E2E to one driven
entirely by Terraform quite a while back now. We're in the slow
process of removing the framework code for this test-by-test, but this
chunk of code no longer has any callers.
2022-04-26 13:43:25 -04:00
Tim Gross 3d630a3629
CSI: enforce one plugin supervisor loop via `sync.Once` (#12785)
We enforce exactly one plugin supervisor loop by checking whether
`running` is set and returning early. This works but is fairly
subtle. It can briefly result in two goroutines where one quickly
exits before doing any work. Clarify the intent by using
`sync.Once`. The goroutine we've spawned only exits when the entire
task runner is being torn down, and not when the task driver restarts
the workload, so it should never be re-run.
2022-04-26 10:38:50 -04:00
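
As a rough illustration of the `sync.Once` pattern that #12785 above describes, the sketch below uses a hypothetical `supervisor` type (its fields and method names are invented here and are not Nomad's). However many times the start path is hit, the loop goroutine is launched at most once for the lifetime of the value.

```go
package main

import (
	"fmt"
	"sync"
)

// supervisor is a hypothetical stand-in for the plugin supervisor hook.
type supervisor struct {
	runOnce sync.Once
	mu      sync.Mutex
	starts  int // number of times the loop goroutine was launched
	done    chan struct{}
}

func newSupervisor() *supervisor {
	return &supervisor{done: make(chan struct{})}
}

// ensureLoop may be called on every task (re)start; sync.Once guarantees
// the loop goroutine is spawned at most once.
func (s *supervisor) ensureLoop() {
	s.runOnce.Do(func() {
		s.mu.Lock()
		s.starts++
		s.mu.Unlock()
		go s.loop()
	})
}

// loop blocks until the supervisor is torn down.
func (s *supervisor) loop() {
	<-s.done
}

// stop tears the supervisor down and lets the loop exit.
func (s *supervisor) stop() { close(s.done) }

// startCount reports how many loop goroutines were launched.
func (s *supervisor) startCount() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.starts
}

func main() {
	s := newSupervisor()
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.ensureLoop()
		}()
	}
	wg.Wait()
	fmt.Println("loops started:", s.startCount())
	s.stop()
}
```

Compared with checking a `running` flag and returning early, this never spawns a second goroutine at all, which matches the intent stated in the commit message.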
Michael Schurter 6449ba8d41
api: add ParseHCLOpts helper method (#12777)
The existing ParseHCL func didn't allow setting HCLv1=true.
2022-04-25 11:51:52 -07:00
Tim Gross b2e4841747
CSI: plugin config updates should always be destructive (#12774) 2022-04-25 12:59:25 -04:00
Luiz Aoqui b8dd60f79c
update LAST_RELEASE comment to match new release branches structure (#12773) 2022-04-25 11:57:55 -04:00
Michael Schurter 1256c8ef66
docs: update json jobs docs (#12766)
* docs: update json jobs docs

Did you know that Nomad has not 1 but 2 JSON formats for jobs? 2½ if you
want to acknowledge that sometimes our JSON job representations have a
Job top-level wrapper and sometimes do not.

The 2½ formats are:
```
 1.   HCL JSON
 2.   Input API JSON (top-level Job field)
 2.5. Output API JSON (lacks top-level Job field)
```

`#2` is what our docs consider our API JSON. `#2.5` seems to be an
accident of history we can't fix without breaking API compatibility.

`#1` is an even more interesting accident of history: the `jobspec2`
package automatically detects if the input to Parse is JSON and switches
to a JSON parser. This behavior is undocumented, the format is
unspecified, and there is no official HashiCorp tooling to produce this
JSON from HCL. The plot thickens when you discover popular third party
tools like hcl2json.com and https://github.com/tmccombs/hcl2json seem to
produce JSON that `nomad run` accepts!

Since we have no telemetry around whether or not anyone passes HCL JSON
to `nomad run`, and people don't file bugs around features that Just
Work, I'm choosing to leave that code path in place and *acknowledged
but not suggested* in documentation.

See https://github.com/hashicorp/hcl/issues/498 for a more comprehensive
discussion of what officially supporting HCL JSON in Nomad would look
like.

(I also added some of the missing fields to the (Input API flavor) JSON
Job documentation, but it still needs a lot of work to be
comprehensive.)

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-04-22 15:57:27 -07:00
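
To make the difference between formats 2 and 2.5 in #12766 above concrete, here is a small sketch of a decoder that tolerates both shapes. The `job` struct and `decodeJob` function are hypothetical and carry only two fields for illustration; this is not how Nomad itself parses jobs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// job holds only the fields needed for this illustration.
type job struct {
	ID   string
	Name string
}

// decodeJob accepts JSON with a top-level "Job" wrapper (input API shape)
// or a bare job object (output API shape).
func decodeJob(data []byte) (*job, error) {
	var wrapped struct {
		Job *job
	}
	if err := json.Unmarshal(data, &wrapped); err != nil {
		return nil, err
	}
	if wrapped.Job != nil {
		return wrapped.Job, nil
	}
	// No wrapper found: treat the document itself as the job.
	var bare job
	if err := json.Unmarshal(data, &bare); err != nil {
		return nil, err
	}
	return &bare, nil
}

func main() {
	inputs := []string{
		`{"Job": {"ID": "example", "Name": "example"}}`,
		`{"ID": "example", "Name": "example"}`,
	}
	for _, in := range inputs {
		j, err := decodeJob([]byte(in))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(j.ID)
	}
}
```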
Jai b3985db31f
bug: fix filter and search (#12587)
* chore:  remove commented out code and skipped tests

* refact:  triggeredBy requires filter expression not qp

* refact:  use filter expression dsl instead of named params

* fix:  add  type

* docs:  add in-line reference to filter expression DSL

* fix:  update filter copy for non-matches

* fix:  correct conditional logic to render no match copy
2022-04-22 15:40:13 -04:00
Phil Renaud aed56e5732
Sets up a new z-modal z-index and assigns it to the sidebar (#12758) 2022-04-22 15:23:49 -04:00
Phil Renaud c0792b1092
Accidentally added while setting lint rules elsewhere (#12759) 2022-04-22 15:04:21 -04:00
Tim Gross 766025cde7
CSI: plugin supervisor prestart should not mark itself done (#12752)
The task runner hook `Prestart` response object includes a `Done`
field that's intended to tell the client not to run the hook
again. The plugin supervisor creates mount points for the task during
prestart and saves these mounts in the hook resources. But if a client
restarts, the hook resources will not be populated. If the plugin task
restarts at any time after the client restarts, it will fail to have
the correct mounts and crash loop until restart attempts run out.

Fix this by not returning `Done` in the response, just as we do for
the `volume_mount_hook`.
2022-04-22 13:07:47 -04:00
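
The failure mode in #12752 above can be modeled with a toy sketch. It assumes, for illustration only, that the `Done` flag outlives a client restart while the hook resources live in memory; the `runner` type and its methods are hypothetical and much simpler than Nomad's task runner.

```go
package main

import "fmt"

// prestartResponse is a hypothetical, trimmed-down hook response.
type prestartResponse struct {
	Done   bool
	Mounts []string
}

// runner models the two kinds of state involved (an assumption of this sketch).
type runner struct {
	persistedDone bool     // survives a client restart
	mounts        []string // in-memory hook resources, lost on client restart
}

// prestart runs the hook unless an earlier run marked it done.
func (r *runner) prestart(markDone bool) {
	if r.persistedDone {
		return // hook skipped, so mounts are not repopulated
	}
	resp := prestartResponse{Done: markDone, Mounts: []string{"/csi"}}
	r.mounts = resp.Mounts
	r.persistedDone = resp.Done
}

// clientRestart drops in-memory resources but keeps the persisted flag.
func (r *runner) clientRestart() { r.mounts = nil }

// mountsAfterRestart runs: task start, client restart, later task restart.
func mountsAfterRestart(markDone bool) int {
	r := &runner{}
	r.prestart(markDone)
	r.clientRestart()
	r.prestart(markDone)
	return len(r.mounts)
}

func main() {
	fmt.Println("Done=true mounts:", mountsAfterRestart(true))
	fmt.Println("Done=false mounts:", mountsAfterRestart(false))
}
```

With `Done` returned, the second prestart is skipped and the task restarts without mounts; without it, the hook reruns and the mounts are recreated.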
James Rasell 24b499791d
deps: update consul-template to v0.29.0 (#12747)
* deps: update consul-template to v0.29.0

* changelog: add entry for #12747
2022-04-22 09:58:54 -07:00
Phil Renaud ab557b15e0
Adding changelog note (#12753) 2022-04-22 12:38:49 -04:00
Phil Renaud 15872cc2d4
[ui] Disconnected Clients: "Unknown" allocations in the UI (#12544)
* Unknown status for allocations accounted for

* Canary string removed

* Test cleanup

* Generate unknown in mirage

* aacidentally oovervoowled

* Update ui/app/components/allocation-status-bar.js

Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>

* Disconnected state on job status in client

* Renaming Disconnected to Unknown in the job-status-in-client

* Unknown accounted for on job rows filtering and tests fix

* Adding lostAllocs as a computed dependency

* Unknown client status within acceptance test

* Swatches updated and PR comments addressed

* Unknown and disconnected added to test fixtures

Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>
2022-04-22 11:25:02 -04:00
Luiz Aoqui a8cc633156
vault: revert support for entity aliases (#12723)
After a more detailed analysis of this feature, the approach taken in
PR #12449 was found to be not ideal due to poor UX (users are
responsible for setting the entity alias they would like to use) and
issues around jobs potentially masquerading as another Vault
entity.
2022-04-22 10:46:34 -04:00
Seth Hoenig 86d90cd906
Merge pull request #12720 from hashicorp/f-arbitrary-addresses
services: enable setting arbitrary address value in service registrations
2022-04-22 09:34:02 -05:00
Seth Hoenig ed71d5db2b services: fix imports 2022-04-22 09:15:51 -05:00
Seth Hoenig c4aab10e53 services: cr followup 2022-04-22 09:14:29 -05:00
Seth Hoenig ae6048eafa services: format ipv6 in nomad service info output
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2022-04-22 09:14:29 -05:00
Seth Hoenig 3fcac242c6 services: enable setting arbitrary address value in service registrations
This PR introduces the `address` field in the `service` block so that Nomad
or Consul services can be registered with a custom `.Address.` to advertise.

The address can be an IP address or domain name. If the `address` field is
set, `service.address_mode` must be set to `auto`.
2022-04-22 09:14:29 -05:00
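
The constraint stated in the commit above (a custom `address` requires `address_mode` to be `auto`) could be checked along these lines. The `service` struct and `validate` function are hypothetical names for this sketch, and it assumes `AddressMode` has already been defaulted before validation; Nomad's real validation may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// service is a hypothetical, reduced view of a service block.
type service struct {
	Address     string // custom IP address or domain name to advertise
	AddressMode string // assumed to be defaulted before validation
}

// validate rejects a custom address unless address_mode is "auto".
func validate(s service) error {
	if s.Address != "" && s.AddressMode != "auto" {
		return errors.New("address_mode must be \"auto\" when address is set")
	}
	return nil
}

func main() {
	for _, s := range []service{
		{Address: "example.com", AddressMode: "auto"},
		{Address: "10.0.0.1", AddressMode: "host"},
	} {
		fmt.Printf("%+v -> %v\n", s, validate(s))
	}
}
```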
Tim Gross f7d6841dd2
E2E: remove platform specific realpath code from UI run script (#12750)
We don't need the absolute path for any of the commands in this script
so long as we `cd` into the source directory path. Doing this removes
the need for weird platform-specific tricks we have to do with
realpath vs GNU realpath.
2022-04-22 10:10:18 -04:00
James Rasell b5d10bcece
docs: add upgrade note for Consul implicit constraint. (#12749) 2022-04-22 15:53:27 +02:00
Tim Gross 3fc1b67396
CSI: handle nil topologies safely in command line (#12751) 2022-04-22 09:25:04 -04:00
Tim Gross 7dd3910e51
E2E: fix debug logging on disconnected clients test (#12621) 2022-04-22 09:07:05 -04:00
James Rasell 046831466c
cli: add pagination flags to service info command. (#12730) 2022-04-22 10:32:40 +02:00
Tim Gross d200a66509
E2E: make UIs runnable from any working directory (#12739)
The E2E test runner is running from the root of the Nomad
repository. Make this run independent of the working directory for
convenience of developers and the test runner.
2022-04-21 17:00:01 -04:00