Commit graph

24550 commits

Author SHA1 Message Date
James Rasell 4d2c1403c2
scale: do not allow scaling of jobs with type system. (#16969) 2023-04-25 15:47:44 +01:00
Seth Hoenig f221e99572
tools: update dependencies and use tree set (#16974)
* tools: bump go mod deps for tools module

* tools: use treeset in tools/missing
2023-04-25 07:47:19 -05:00
Phil Renaud 7dbebe9a93
[ui, feature] Job Page Redesign (#16932)
* [ui] Service job status panel (#16134)

* it begins

* Hacky demo enabled

* Still very hacky but seems deece

* Floor of at least 3 must be shown

* Width from on-high

* Other statuses considered

* More sensible allocTypes listing

* Beginnings of a legend

* Total number of allocs running now maps over job.groups

* Lintfix

* base the number of slots to hold open on actual tallies, which should never exceed totalAllocs

* Versions get yer versions here

* Versions lookin like versions

* Mirage fixup

* Adds Remaining as an alloc chart status and adds historical status option

* Get tests passing again by making job status static for a sec

* Historical status panel click actions moved into their own component class

* job detail tests plz chill

* Testing if percy is fickle

* Hyper-specfic on summary distribution bar identifier

* Perhaps the 2nd allocSummary item no longer exists with the more accurate afterCreate data

* UI Test eschewing the page pattern

* Bones of a new acceptance test

* Track width changes explicitly with window-resize

* testlintfix

* Alloc counting tests

* Alloc grouping test

* Alloc grouping with complex resizing

* Refined the list of showable statuses

* PR feedback addressed

* renamed allocation-row to allocation-status-row

* [ui, job status] Make panel status mode a queryParam (#16345)

* queryParam changing

* Test for QP in panel

* Adding @tracked to legacy controller

* Move the job of switching to Historical out to larger context

* integration test mock passed func

* [ui] Service job deployment status panel (#16383)

* A very fast and loose deployment panel

* Removing Unknown status from the panel

* Set up oldAllocs list in constructor, rather than as a getter/tracked var

* Small amount of template cleanup

* Refactored latest-deployment new logic back into panel.js

* Revert now-unused latest-deployment component

* margin bottom when ungrouped also

* Basic integration tests for job deployment status panel

* Updates complete alloc colour to green for new visualizations only (#16618)

* Updates complete alloc colour to green for new visualizations only

* Pale green instead of dark green for viz in general

* [ui] Job Deployment Status: History and Update Props (#16518)

* Deployment history wooooooo

* Styled deployment history

* Update Params

* lintfix

* Types and groups for updateParams

* Live-updating history

* Harden with types, error states, and pending states

* Refactor updateParams to use trigger component

* [ui] Deployment History search (#16608)

* Functioning searchbox

* Some nice animations for history items

* History search test

* Fixing up some old mirage conventions

* some a11y rule override to account for scss keyframes

* Split panel into deploying and steady components

* HandleError passed from job index

* gridified panel elements

* TotalAllocs added to deploying.js

* Width perc to px

* [ui] Splitting deployment allocs by status, health, and canary status (#16766)

* Initial attempt with lots of scratchpad work

* Style mods per UI discussion

* Fix canary overflow bug

* Dont show canary or health for steady/prev-alloc blocks

* Steady state

* Thanks Julie

* Fixes steady-state versions

* Legen, wait for it...

* Test fixes now that we have a minimum block size

* PR prep

* Shimmer effect on pending and unplaced allocs (#16801)

* Shimmer effect on pending and unplaced

* Dont show animation in the legend

* [ui, deployments] Linking allocblocks and legends to allocation / allocations index routes (#16821)

* Conditional link-to component and basic linking to allocations and allocation routes

* Job versions filter added to allocations index page

* Steady state legends link

* Legend links

* Badge count links for versions

* Fix: faded class on steady-state legend items

* version link now wont show completed ones

* Fix a11y violations with link labels

* Combining some template conditional logic

* [ui, deployments] Conversions on long nanosecond update params (#16882)

* Conversions on long nanosecond nums

* Early return in updateParamGroups comp prop

* [ui, deployments] Mirage Actively Deploying Job and Deployment Integration Tests (#16888)

* Start of deployment alloc test scaffolding

* Bit of test cleanup and canary for ungrouped allocs

* Flakey but more robust integrations for deployment panel

* De-flake acceptance tests and add an actively deploying job to mirage

* Jitter-less alloc status distribution removes my bad math

* bugfix caused by summary.desiredTotal non-null

* More interesting mirage active deployment alloc breakdown

* Further tests for previous-allocs row

* Previous alloc legend tests

* Percy snapshots added to integration test

* changelog
2023-04-24 22:45:39 -04:00
Daniel Bennett 2c63d34296
Demo: NFS CSI Plugins (#16875)
Demo (and easily reproduce, locally) a CSI setup
with separate controller and node plugins.

This runs NFS in a container backed by a host volume
and CSI controller and node plugins from rocketduck:
  gitlab.com/rocketduck/csi-plugin-nfs

Co-authored-by: Florian Apolloner <florian@apolloner.eu>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-04-24 15:08:48 -05:00
Seth Hoenig 753c17c9de
services: un-mark group services as deregistered if restart hook runs (#16905)
* services: un-mark group services as deregistered if restart hook runs

This PR may fix a bug where group services will never be deregistered if the
group undergoes a task restart.

* e2e: add test case for restart and deregister group service

* cl: add cl

* e2e: add wait for service list call
2023-04-24 14:24:51 -05:00
dependabot[bot] 1633cab363
build(deps): bump github.com/shoenig/test from 0.6.3 to 0.6.4 in /api (#16895)
Bumps [github.com/shoenig/test](https://github.com/shoenig/test) from 0.6.3 to 0.6.4.
- [Release notes](https://github.com/shoenig/test/releases)
- [Commits](https://github.com/shoenig/test/compare/v0.6.3...v0.6.4)

---
updated-dependencies:
- dependency-name: github.com/shoenig/test
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-04-24 11:39:37 -05:00
Seth Hoenig e27f95448c
copywrite: excempt example assets from copywrite headers (#16971) 2023-04-24 10:36:11 -05:00
dependabot[bot] 801e4f961c
build(deps): bump github.com/hashicorp/vault/api from 1.8.3 to 1.9.1 (#16966)
Bumps [github.com/hashicorp/vault/api](https://github.com/hashicorp/vault) from 1.8.3 to 1.9.1.
- [Release notes](https://github.com/hashicorp/vault/releases)
- [Changelog](https://github.com/hashicorp/vault/blob/main/CHANGELOG.md)
- [Commits](https://github.com/hashicorp/vault/compare/v1.8.3...v1.9.1)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/vault/api
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-04-24 09:00:54 -05:00
Tim Gross 72cbe53f19
logs: allow disabling log collection in jobspec (#16962)
Some Nomad users ship application logs out-of-band via syslog. For these users
having `logmon` (and `docker_logger`) running is unnecessary overhead. Allow
disabling the logmon and pointing the task's stdout/stderr to /dev/null.

This changeset is the first of several incremental improvements to log
collection short of full-on logging plugins. The next step will likely be to
extend the internal-only task driver configuration so that cluster
administrators can turn off log collection for the entire driver.

---

Fixes: #11175

Co-authored-by: Thomas Weber <towe75@googlemail.com>
2023-04-24 10:00:27 -04:00
valodzka 379497a484
fix host port handling for ipv6 (#16723) 2023-04-20 19:53:20 -07:00
Etienne Bruines 1e3531b978
cni: fix plugin fingerprinting versions (#16776)
CNI plugins v1.2.0 and above output a second line, containing supported protocol versions.
2023-04-20 18:44:39 -07:00
Luiz Aoqui a1ba068e1f
cli: fix panic on job plan when -diff=false (#16944)
PR #14492 introduced a new check to return 0 when the `nomad job plan`
command returns a diff of type `None`.

But the `-diff` CLI flag was also being used to control whether the plan
request should return the diff of not instead of just controlling if the
diff was printed.

This means that when `-diff=false` is set the response does not include
any diff information, and so the new check panics.

This commit fixes the problem by always requesting a diff and using the
`-diff` only for controlling output, as it's currently documented.
2023-04-20 17:33:29 -07:00
Tim Gross b5a54b3b5f
docs: fix keyring path in install docs (#16946) 2023-04-20 16:20:39 -04:00
astudentofblake 42c4c8d5ea
fix: added landlock access to /usr/libexec for getter (#16900) 2023-04-20 11:16:04 -05:00
Luiz Aoqui b0fe69fded
docs: add missing field Capabilities to Namespace API (#16931) 2023-04-19 08:14:36 -07:00
claire labry d2beea3435
changelog: add changelog update for vendor label for linux packaging (#16071) 2023-04-19 08:14:14 -07:00
Luiz Aoqui c7387dbd3a
docs: add missing API field JobACL and fix workload identity headers (#16930) 2023-04-19 08:12:58 -07:00
Chris van Meer d2f1766f3a
Updates to the UI block (#16328)
1. On the Consul address, following the recommendation for the HTTPS
   API on port 8501.
2. Add the hint to use HEX values for the colors.
2023-04-18 18:28:17 -07:00
Luiz Aoqui fb588fcbb8
allocrunner: prevent panic on network manager (#16921) 2023-04-18 13:39:13 -07:00
Süleyman Vurucu a5967b1bad
Update metric names (#16894)
Dashboard uses old metric names
2023-04-18 13:25:42 -07:00
James Rasell 367cfa6d93
rpc: use "+" concatination in hot path RPC rate limit metrics. (#16923) 2023-04-18 13:41:34 +01:00
Luiz Aoqui 37ddd2dd86
ui: fix notification service in token controller (#16918)
Remove unneeded service injection. This service is not being used in
this controller and currently only exists in `main`, causing
`release/1.5.x` to break.
2023-04-17 20:33:50 -04:00
Luiz Aoqui 8285be09e6
test: fix quota command autocomplete (#16917) 2023-04-17 20:08:55 -04:00
Charlie Voiselle 9e8f2a937c
[scheduler] Honor false for distinct hosts constraint (#16907)
* Honor value for distinct_hosts constraint
* Add test for feasibility checking for `false`
---------
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2023-04-17 17:43:56 -04:00
Ian Fijolek 619f49afcf
hashicorp/go-msgpack v2 (#16810)
* Upgrade from hashicorp/go-msgpack v1.1.5 to v2.1.0

Fixes #16808

* Update hashicorp/net-rpc-msgpackrpc to v2 to match go-msgpack

* deps: use go-msgpack v2.0.0

go-msgpack v2.1.0 includes some code changes that we will need to
investigate furthere to assess its impact on Nomad, so keeping this
dependency on v2.0.0 for now since it's no-op.

---------

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-04-17 17:02:05 -04:00
Seth Hoenig 92d4a05534
users: eliminate nobody user memoization (#16904)
This PR eliminates code specific to looking up and caching the uid/gid/user.User
object associated with the nobody user in an init block. This code existed before
adding the generic users cache and was meant to optimize the one search path we
knew would happen often. Now that we have the cache, seems reasonable to eliminate
this init block and use the cache instead like for any other user.

Also fixes a constraint on the podman (and other) drivers, where building without
CGO became problematic on some OS like Fedora IoT where the nobody user cannot
be found with the pure-Go standard library.

Fixes github.com/hashicorp/nomad-driver-podman/issues/228
2023-04-17 12:30:30 -05:00
Tim Gross ee071531de
docs: disable secret scanning for documentation content (#16903)
Examples in the documentation frequently include tokens, including Vault tokens
which end up triggering GitHub's secret scanner. Remove these from consideration
so that we don't get false positive reports.
2023-04-17 10:03:52 -04:00
Tim Gross 04e049caed
license: show Terminated field in license get command (#16892) 2023-04-17 09:01:43 -04:00
Tim Gross 62548616d4
client: allow drain_on_shutdown configuration (#16827)
Adds a new configuration to clients to optionally allow them to drain their
workloads on shutdown. The client sends the `Node.UpdateDrain` RPC targeting
itself and then monitors the drain state as seen by the server until the drain
is complete or the deadline expires. If it loses connection with the server, it
will monitor local client status instead to ensure allocations are stopped
before exiting.
2023-04-14 15:35:32 -04:00
Phil Renaud 99185e2d8f
[ui, compliance] Remove the newline after .hbs copyright headers (#16861)
* Remove the newline after .hbs copyright headers

* Trying with the whitespace control char
2023-04-14 13:08:13 -04:00
Tim Gross 5a9abdc469
drain: use client status to determine drain is complete (#14348)
If an allocation is slow to stop because of `kill_timeout` or `shutdown_delay`,
the node drain is marked as complete prematurely, even though drain monitoring
will continue to report allocation migrations. This impacts the UI or API
clients that monitor node draining to shut down nodes.

This changeset updates the behavior to wait until the client status of all
drained allocs are terminal before marking the node as done draining.
2023-04-13 08:55:28 -04:00
Michael Schurter 79c521e570
docs: add node meta command docs (#16828)
* docs: add node meta command docs

Fixes #16758

* it helps if you actually add the files to git

* fix typos and examples vs usage
2023-04-12 15:29:33 -07:00
Shawn 007b534020
fix: typo (#16873) 2023-04-12 16:18:13 -04:00
Tim Gross 4df2d9bda8
E2E: clarify drain -deadline and -force flag behaviors (#16868)
The `-deadline` and `-force` flag for the `nomad node drain` command only cause
the draining to ignore the `migrate` block's healthy deadline, max parallel,
etc. These flags don't have anything to do with the `kill_timeout` or
`shutdown_delay` options of the jobspec.

This changeset fixes the skipped E2E tests so that they validate the intended
behavior, and updates the docs for more clarity.
2023-04-12 15:27:24 -04:00
Seth Hoenig ec1a8ae12a
deps: update docker to 23.0.3 (#16862)
* [no ci] deps: update docker to 23.0.3

This PR brings our docker/docker dependency (which is hosted at github.com/moby/moby)
up to 23.0.3 (forward about 2 years). Refactored our use of docker/libnetwork to
reference the package in its new home, which is docker/docker/libnetwork (it is
no longer an independent repository). Some minor nearby test case cleanup as well.

* add cl
2023-04-12 14:13:36 -05:00
Seth Hoenig dbb6edd96d
e2e: add e2e tests for job submission api (#16841)
* e2e: add e2e tests for job submission api

* e2e: fixup callers of AllocLogs

* fix typo
2023-04-12 08:36:17 -05:00
James Rasell b7a41fe48d
core: ensure all Server receiver names are consistent. (#16859) 2023-04-12 14:03:07 +01:00
Juana De La Cuesta 8302085384
Deployment Status Command Does Not Respect -namespace Wildcard (#16792)
* func: add namespace support for list deployment

* func: add wildcard to namespace filter for deployments

* Update deployment_endpoint.go

* style: use must instead of require or asseert

* style: rename paginator to avoid clash with import

* style: add changelog entry

* fix: add missing parameter for upsert jobs
2023-04-12 11:02:14 +02:00
Seth Hoenig f615bd92e5
deps: bump build and linter tool versions (#16858)
This just helps keep up with the podman driver repo.
2023-04-11 17:01:14 -05:00
Tim Gross 657ae6f7d2
docs: document signal handling (#16835)
Expand documentation about Nomad's signal handling behaviors, including removing
incorrect information about graceful client shutdowns.
2023-04-11 16:26:39 -04:00
Tim Gross a9a350cfdb
drainer: fix codec race condition in integration test (#16845)
msgpackrpc codec handles are specific to a connection and cannot be shared
between goroutines; this can cause corrupted decoding. Fix the drainer
integration test so that we create separate codecs for the goroutines that the
test helper spins up to simulate client updates.

This changeset also refactors the drainer integration test to bring it up to
current idioms and library usages, make assertions more clear, and reduce
duplication.
2023-04-11 14:31:13 -04:00
James Rasell bc01d47071
consul/connect: fixed a bug where restarting proxy tasks failed. (#16815)
The first start of a Consul Connect proxy sidecar triggers a run
of the envoy_version hook which modifies the task config image
entry. The modification takes into account a number of factors to
correctly populate this. Importantly, once the hook has run, it
marks itself as done so the taskrunner will not execute it again.

When the client receives a non-destructive update for the
allocation which the proxy sidecar is a member of, it will update
and overwrite the task definition within the taskerunner. In doing
so it overwrite the modification performed by the hook. If the
allocation is restarted, the envoy_version hook will be skipped as
it previously marked itself as done, and therefore the sidecar
config image is incorrect and causes a driver error.

The fix removes the hook in marking itself as done to the view of
the taskrunner.
2023-04-11 15:56:03 +01:00
Seth Hoenig ba728f8f97
api: enable support for setting original job source (#16763)
* api: enable support for setting original source alongside job

This PR adds support for setting job source material along with
the registration of a job.

This includes a new HTTP endpoint and a new RPC endpoint for
making queries for the original source of a job. The
HTTP endpoint is /v1/job/<id>/submission?version=<version> and
the RPC method is Job.GetJobSubmission.

The job source (if submitted, and doing so is always optional), is
stored in the job_submission memdb table, separately from the
actual job. This way we do not incur overhead of reading the large
string field throughout normal job operations.

The server config now includes job_max_source_size for configuring
the maximum size the job source may be, before the server simply
drops the source material. This should help prevent Bad Things from
happening when huge jobs are submitted. If the value is set to 0,
all job source material will be dropped.

* api: avoid writing var content to disk for parsing

* api: move submission validation into RPC layer

* api: return an error if updating a job submission without namespace or job id

* api: be exact about the job index we associate a submission with (modify)

* api: reword api docs scheduling

* api: prune all but the last 6 job submissions

* api: protect against nil job submission in job validation

* api: set max job source size in test server

* api: fixups from pr
2023-04-11 08:45:08 -05:00
Piotr Kazmierczak dea8b1a093
acl: bump JWT auth gate to 1.5.4 (#16838) 2023-04-11 10:07:45 +02:00
Michael Schurter a407cbb626
Merge pull request #16836 from hashicorp/compliance/add-headers
[COMPLIANCE] Add Copyright and License Headers
2023-04-10 16:32:03 -07:00
Daniel Bennett fa33ee567a
gracefully recover tasks that use csi node plugins (#16809)
new WaitForPlugin() called during csiHook.Prerun,
so that on startup, clients can recover running
tasks that use CSI volumes, instead of them being
terminated and rescheduled because they need a
node plugin that is "not found" *yet*, only because
the plugin task has not yet been recovered.
2023-04-10 17:15:33 -05:00
hashicorp-copywrite[bot] 005636afa0 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
Tim Gross 1335543731
ephemeral disk: migrate should imply sticky (#16826)
The `ephemeral_disk` block's `migrate` field allows for best-effort migration of
the ephemeral disk data to new nodes. The documentation says the `migrate` field
is only respected if `sticky=true`, but in fact if client ACLs are not set the
data is migrated even if `sticky=false`.

The existing behavior when client ACLs are disabled has existed since the early
implementation, so "fixing" that case now would silently break backwards
compatibility. Additionally, having `migrate` not imply `sticky` seems
nonsensical: it suggests that if we place on a new node we migrate the data but
if we place on the same node, we throw the data away!

Update so that `migrate=true` implies `sticky=true` as follows:

* The failure mode when client ACLs are enabled comes from the server not passing
  along a migration token. Update the server so that the server provides a
  migration token whenever `migrate=true` and not just when `sticky=true` too.
* Update the scheduler so that `migrate` implies `sticky`.
* Update the client so that we check for `migrate || sticky` where appropriate.
* Refactor the E2E tests to move them off the old framework and make the intention
  of the test more clear.
2023-04-07 16:33:45 -04:00
Tim Gross e7eae66cf1
E2E: update subset of node drain tests off the old framework (#16823)
While working on several open drain issues, I'm fixing up the E2E tests. This
subset of tests being refactored are existing ones that already work. I'm
shipping these as their own PR to keep review sizes manageable when I push up
PRs in the next few days for #9902, #12314, and #12915.
2023-04-07 09:17:19 -04:00
Michael Schurter a8b379f962
docker: default device.container_path to host_path (#16811)
* docker: default device.container_path to host_path

Matches docker cli behavior.

Fixes #16754
2023-04-06 14:44:33 -07:00