Commit Graph

4424 Commits

Author SHA1 Message Date
hc-github-team-nomad-core 922512677e
backport of commit b55dcb39672e00686b70b1ec5f6b99c66c5397ce (#18841)
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-10-24 08:08:35 +01:00
hc-github-team-nomad-core 63c2013ec1
backport of commit ca9e08e6b5eee00d055b9429df5976a70cdcb2d6 (#18813)
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-10-20 08:35:54 +01:00
hc-github-team-nomad-core 897bcef932
backport: do not embed *Server (#18786) (#18789)
these structs embedding Server, then Server _also embedding them_,
confused my IDE, isn't necessary, and just feels wrong!
2023-10-17 15:55:56 -05:00
Daniel Bennett b6298dc073
Only generate default workload identity once per alloc task - 1.6.x (#18783)
this can save a bit of cpu when
running plans for tasks that already exist,
and prevents Nomad tokens from changing,
which can cause task template{}s to restart
unnecessarily.
2023-10-17 13:06:20 -05:00
hc-github-team-nomad-core c96ca6f81c
vault: use an importable const for Vault header string. (#18740) (#18750)
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-10-13 08:11:54 +01:00
hc-github-team-nomad-core f6900307e8
backport deps: remove Vault SDK into release/1.6.x (#18727)
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2023-10-11 12:02:48 -04:00
hc-github-team-nomad-core 7725931942
backport of commit 9c57ddd8383c2302884272d0b01b034e2509f194 (#18714)
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-10-10 10:15:23 +01:00
hc-github-team-nomad-core ce6c86a057
backport of commit e7136f80c5c1277ea2dea4eeeda84005224d7835 (#18648)
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2023-10-03 12:27:26 -05:00
Michael Schurter 547a95795a client: prevent using stale allocs (#18601)
Similar to #18269, it is possible that even if Node.GetClientAllocs
retrieves fresh allocs that the subsequent Alloc.GetAllocs call
retrieves stale allocs. While `diffAlloc(existing, updated)` properly
ignores stale alloc *updates*, alloc deletions have no such check.

So if a client retrieves an alloc created at index 123, and then a
subsequent Alloc.GetAllocs call hits a new server which returns results
at index 100, the client will stop the alloc created at 123 because it
will be missing from the stale response.

This change applies the same logic as #18269 and ensures only fresh
responses are used.

Glossary:
* fresh - modified at an index > the query index
* stale - modified at an index <= the query index
2023-09-29 14:34:04 -07:00
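The freshness rule from #18601 can be sketched as follows. This is a minimal illustration, not Nomad's client code: `allocsResponse` and `allocsToStop` are hypothetical names, and the comparison simply follows the glossary above (a response at an index <= the query index is treated as stale).

```go
package main

import (
	"fmt"
	"sort"
)

// allocsResponse is a hypothetical, simplified stand-in for an RPC reply.
type allocsResponse struct {
	Index  uint64              // index the answering server was at
	Allocs map[string]struct{} // IDs of the allocs present in the reply
}

// allocsToStop returns the IDs in existing that are absent from resp.
// When resp is stale (resp.Index <= queryIndex) it returns ok=false and
// stops nothing, so a lagging server cannot cause spurious deletions.
func allocsToStop(existing map[string]struct{}, resp allocsResponse, queryIndex uint64) (stop []string, ok bool) {
	if resp.Index <= queryIndex {
		return nil, false
	}
	for id := range existing {
		if _, present := resp.Allocs[id]; !present {
			stop = append(stop, id)
		}
	}
	sort.Strings(stop) // deterministic order for callers and tests
	return stop, true
}

func main() {
	existing := map[string]struct{}{"alloc-123": {}}
	// A server that is behind answers at index 100 and does not know the alloc.
	stale := allocsResponse{Index: 100, Allocs: map[string]struct{}{}}
	stop, ok := allocsToStop(existing, stale, 123)
	fmt.Println(stop, ok)
}
```

The caller is expected to retry when `ok` is false rather than act on the empty result.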
Phil Renaud bfba4f5e13
[ui] ACL Roles in the UI, plus Role, Policy and Token management (#17770) (#18599)
* Rename pages to include roles

* Models and adapters

* [ui] Any policy checks in the UI now check for roles' policies as well as token policies (#18346)

* combinedPolicies as a concept

* Classic decorator on role adapter

* We added a new request for roles, so the test based on a specific order of requests got fickle fast

* Mirage roles cluster scaffolded

* Acceptance test for roles and policies on the login page

* Update mirage mock for nodes fetch to account for role policies / empty token.policies

* Roles-derived policies checks

* [ui] Access Control with Roles and Tokens (#18413)

* top level policies routes moved into access control

* A few more routes and name cleanup

* Delog and test fixes to account for new url prefix and document titles

* Overview page

* Tokens and Roles routes

* Tokens helios table

* Add a role

* Hacky role page and deletion

* New policy keyboard shortcut and roles breadcrumb nav

* If you leave New Role but haven't made any changes, remove the newly-created record from store

* Roles index list and general role route crud

* Roles index actually links to roles now

* Helios button styles for new roles and policies

* Handle when you try to create a new role without having any policies

* Token editing generally

* Create Token functionality

* Can't delete self-token but management token editing and deleting is fine

* Upgrading helios caused codemirror to explode, shimmed

* Policies table fix

* without bang-element condition, modifier would refire over and over

* Token TTL or Time setting

* time will take you on

* Mirage hooks for create and list roles

* Ensure policy names only use allowed characters in mirage mocks

* Mirage mocked roles and policies in the default cluster

* log and lintfix

* chromedriver to 2.1.2

* unused unit tests removed

* Nice profile dropdown

* With the HDS accordion, rename our internal component scss ref

* design revisions after discussion

* Tooltip on deleted-policy tokens

* Two-step button peripheral isDeleting gcode removed

* Never to null on token save

* copywrite headers added and empty routefiles removed

* acceptance test fixes for policies endpoint

* Route for updating a token

* Policies testfixes

* Ember on-click-outside modifier upgraded with general ember-modifier upgrade

* Test adjustments to account for new profile header dropdown

* Test adjustments for tokens via policy pages

* Removed an unused route

* Access Control index page tests

* a11y tests

* Tokens index acceptance tests generally

* Lintfix

* Token edit page tests

* Token editing tests

* New token expiration tests

* Roles Index tests

* Role editing policies tests

* A complete set of Access Control Roles tests

* Policies test

* Be more specific about which row to check for expiration time

* Nil check on expirationTime equality

* Management tokens shouldn't show No Roles/Policies, give them their own designation

* Route guard on selftoken, conditional columns, and afterModel at parent to prevent orphaned policies on tokens/roles from stopping a new save

* Policy unloading on delete and other todos plus autofocus conditionally re-enabled

* Invalid policies non-links now a concept for Roles index

* HDS style links to make job.variables.alert links look like links again

* Mirage finding looks weird so making model async in hash even though redundant

* Drop rsvp

* RSVP wasn't the problem, cached lookups were

* remove old todo comments

* de-log
2023-09-27 17:02:48 -04:00
hc-github-team-nomad-core 3cc387749e
backport of commit 9b74e11f064ecc53a53f13e82419927b533a9e4a (#18589)
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2023-09-26 15:23:35 -05:00
hc-github-team-nomad-core a2f56797a0
backport of commit 4895d708b438b42e52fd54a128f9ec4cb6d72277 (#18531)
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2023-09-18 14:29:29 -05:00
hc-github-team-nomad-core c7b1966565
backport of commit 1339599185af9dbfcca6f0aa1001c6753b8c682b (#18517)
Co-authored-by: Gerard Nguyen <nguyenvanthao1991@gmail.com>
2023-09-15 09:16:38 -04:00
hc-github-team-nomad-core 46b4847885
backport of commit c6dbba7cde911bb08f1f8da445a44a0125cd2047 (#18505)
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2023-09-14 14:38:05 -05:00
hc-github-team-nomad-core ea59b59035
backport of commit 532911c3803b51ad093f682a5dc0ded13b636751 (#18472)
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-09-13 08:55:29 +01:00
hc-github-team-nomad-core 2ef7a280b0
backport of commit d923fc554d09ceb51b530467a354860b25114fd3 (#18450)
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-09-11 16:21:44 +01:00
hc-github-team-nomad-core 552dad53f5
test: fix name of state service registration test file. (#18406) (#18422)
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-09-07 11:00:13 +01:00
hc-github-team-nomad-core 286164288d
backport of commit b6f6541f5024fb847a230ecd612a15f4a4fc05a4 (#18416)
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-09-07 09:14:56 +01:00
hc-github-team-nomad-core a9441eb589
backport of commit c28cd59655b88d64f841bed5da7996ae8d88772f (#18411)
Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
2023-09-06 11:32:42 -05:00
hc-github-team-nomad-core 1b2237d6a8
backport of commit 776a26bce7cf3a320fc7e7f4a6bf9da2b30f3da7 (#18375)
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-09-01 10:25:08 +01:00
hc-github-team-nomad-core b93dc92ec2
backport of commit f187afab9f06b7489f7103c3e3c8eed72f210621 (#18350)
Co-authored-by: Gerard Nguyen <nguyenvanthao1991@gmail.com>
2023-08-28 19:14:45 -04:00
hc-github-team-nomad-core b0bece8a18
backport of commit da830b10463f1cc0a704ec4a4f66e35d4324d728 (#18337)
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-08-25 21:36:35 -04:00
James Rasell 3730b66d8c
test: use correct parallel test setup func (#18326) (#18330) 2023-08-25 14:48:06 +01:00
hc-github-team-nomad-core 4ebd0d251f
backport of commit f7a336d2ba95a362504d6094e581b8aeedbd554e (#18323)
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-08-25 09:37:13 +01:00
hc-github-team-nomad-core e4c7388608
backport of commit 3e61b3a37df9ff0836b52ba5440106ad0f607dd7 (#18294)
Co-authored-by: Андрей Неустроев <99169437+aneustroev@users.noreply.github.com>
2023-08-22 16:01:24 -04:00
James Rasell 1397ec4ad6
nomad: remove custom max func and use Go 1.21.0 builtin (#18237) (#18251) 2023-08-17 16:17:43 +01:00
Luiz Aoqui d910457159 csi: prevent panic on volume delete (#18234)
When a CSI volume is deleted while its plugin is not running, the
function `volAndPluginLookup` returns a `nil` plugin value resulting in a
panic in the request handler.
2023-08-17 10:09:45 -04:00
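A nil guard of the kind #18234 describes might look like the sketch below. All types and the `lookup` and `deleteVolume` methods are hypothetical stand-ins, and returning an error for the nil plugin is an assumption; the commit message only states that the nil value caused a panic.

```go
package main

import "fmt"

// csiPlugin and csiVolume are hypothetical, pared-down types for this sketch.
type csiPlugin struct{ ID string }

type csiVolume struct {
	ID       string
	PluginID string
}

// store is a hypothetical in-memory stand-in for server state.
type store struct {
	volumes map[string]*csiVolume
	plugins map[string]*csiPlugin // only plugins that are currently running
}

// lookup returns the volume and its plugin. The plugin is nil when it is
// not running, which is the case the handler has to guard against.
func (s *store) lookup(volID string) (*csiVolume, *csiPlugin, error) {
	vol, ok := s.volumes[volID]
	if !ok {
		return nil, nil, fmt.Errorf("volume %q not found", volID)
	}
	return vol, s.plugins[vol.PluginID], nil
}

// deleteVolume checks for a nil plugin before using it. Returning an error
// here is an assumption; the point is only that nil is never dereferenced.
func (s *store) deleteVolume(volID string) error {
	vol, plug, err := s.lookup(volID)
	if err != nil {
		return err
	}
	if plug == nil {
		return fmt.Errorf("plugin %q for volume %q is not running", vol.PluginID, vol.ID)
	}
	// A real handler would call the plugin's controller at this point.
	delete(s.volumes, volID)
	return nil
}

func main() {
	s := &store{
		volumes: map[string]*csiVolume{"vol-1": {ID: "vol-1", PluginID: "ebs"}},
		plugins: map[string]*csiPlugin{},
	}
	fmt.Println(s.deleteVolume("vol-1")) // an error value, not a panic
}
```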
Tim Gross 0a19fe3b60 fix multiple overflow errors in exponential backoff (#18200)
We use capped exponential backoff in several places in the code when handling
failures. The code we've copy-and-pasted all over has a check to see if the
backoff is greater than the limit, but this check happens after the bitshift and
we always increment the number of attempts. This causes an overflow with a
fairly small number of failures (ex. at one place I tested it occurs after only
24 iterations), resulting in a negative backoff which then never recovers. The
backoff becomes a tight loop consuming resources and/or DoS'ing a Nomad RPC
handler or an external API such as Vault. Note this doesn't occur in places
where we cap the number of iterations so the loop breaks (usually to return an
error), so long as the number of iterations is reasonable.

Introduce a helper with a check on the cap before the bitshift to avoid overflow in all 
places this can occur.

Fixes: #18199
Co-authored-by: stswidwinski <stan.swidwinski@gmail.com>
2023-08-15 14:39:09 -04:00
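The idea behind the helper in #18200, checking the cap before the bitshift, can be sketched like this. `cappedBackoff` is a hypothetical name and signature chosen for illustration; it is not claimed to match the helper Nomad actually added.

```go
package main

import (
	"fmt"
	"time"
)

// cappedBackoff returns base * 2^attempt, clamped to limit. The cap is
// checked before shifting, so the result can never overflow into a
// negative duration no matter how large attempt grows.
func cappedBackoff(base, limit time.Duration, attempt uint64) time.Duration {
	if base <= 0 || limit <= 0 {
		return 0
	}
	if base >= limit {
		return limit
	}
	// limit/base is the largest multiplier that stays within the cap.
	maxMultiplier := uint64(limit / base)
	// Shifting a uint64 by 64 or more yields 0 in Go, so large attempt
	// counts are rejected up front instead of being shifted.
	if attempt >= 63 || uint64(1)<<attempt > maxMultiplier {
		return limit
	}
	return base * time.Duration(uint64(1)<<attempt)
}

func main() {
	for _, attempt := range []uint64{0, 3, 9, 40, 1000} {
		fmt.Println(attempt, cappedBackoff(time.Second, 5*time.Minute, attempt))
	}
}
```

Because the multiplier is bounded by `limit/base` before the multiplication happens, the product is at most `limit`, which is what keeps the loop from degenerating into the tight retry described above.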
Seth Hoenig a45b689d8e update go1.21 (#18184)
* build: update to go1.21

* go: eliminate helpers in favor of min/max

* build: run go mod tidy

* build: swap depguard for semgrep

* command: fixup broken tls error check on go1.21
2023-08-15 14:40:33 +02:00
Tim Gross a3a86a849a test: deflake node drain integration test (#18171)
The `TestDrainer_AllTypes_NoDeadline` test has been flaky. It looks like this
might be because the final update of batch allocations to complete is improperly
updating the state store directly rather than by RPC. If the service jobs have
restarted in the meantime, the `allocClientStateSimulator` will have updated the
index on the allocations table and that will prevent the drainer from
unblocking (and being marked complete) when the batch jobs are written with an
earlier index.

This changeset attempts to fix that by making the update via RPC (as it normally
would be in real code).
2023-08-14 16:19:00 -04:00
Tim Gross 577d96034d test: deflake job endpoint registration test (#18170)
We've seen test flakiness in the `TestJobEndpoint_Register_NonOverlapping` test,
which asserts that we don't try to place allocations for blocked evals until
resources have been actually freed by setting the client status of the previous
alloc to complete.

The flaky assertion includes sorting the two allocations by CreateIndex and this
appears to be a non-stable sort in the context of the test run, which results in
failures that shouldn't exist. There's no reason to sort the allocations instead
of just examining them by ID. This changeset does so.
2023-08-14 16:18:53 -04:00
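Examining allocations by ID instead of by sorted position, as #18170 describes, could look like the sketch below. The `alloc` type and `allocsByID` helper are hypothetical; the example uses two allocations with equal `CreateIndex` values, a case where an unstable sort gives no guaranteed order.

```go
package main

import "fmt"

// alloc is a hypothetical, pared-down allocation for this sketch.
type alloc struct {
	ID           string
	CreateIndex  uint64
	ClientStatus string
}

// allocsByID indexes allocations by ID so assertions do not depend on
// the order in which the allocations were returned or sorted.
func allocsByID(allocs []alloc) map[string]alloc {
	byID := make(map[string]alloc, len(allocs))
	for _, a := range allocs {
		byID[a.ID] = a
	}
	return byID
}

func main() {
	allocs := []alloc{
		{ID: "new", CreateIndex: 10, ClientStatus: "pending"},
		{ID: "old", CreateIndex: 10, ClientStatus: "complete"},
	}
	byID := allocsByID(allocs)
	fmt.Println(byID["old"].ClientStatus, byID["new"].ClientStatus)
}
```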
Esteban Barrios 9f19d7c373 config: add configurable content security policy (#18085) 2023-08-14 14:25:21 -04:00
hc-github-team-nomad-core f812bccb4e
Backport of Tuning job versions retention. #17635 into release/1.6.x (#18169)
This pull request was automerged via backport-assistant
2023-08-07 13:48:09 -05:00
hc-github-team-nomad-core ebcdd4d82d
backport of commit 65501ff97aa2ec6fa3c4f53d3f8c6c80c6a0e8a3 (#18166)
This pull request was automerged via backport-assistant
2023-08-07 10:17:34 -05:00
Charlie Voiselle bac4d112d1 [dep] bump golang.org/x/exp (#18102)
There are some refactorings that have to be made in the getter and state
where the api changed in `slices`

* Bump golang.org/x/exp
* Bump golang.org/x/exp in api
* Update job_endpoint_test
* [feedback] unexport sort function
2023-08-03 15:14:39 -04:00
hc-github-team-nomad-core 2ed92e0c6c
Backport of feature: Add new field render_templates on restart block into release/1.6.x (#18094)
This pull request was automerged via backport-assistant
2023-07-28 13:54:00 -05:00
hc-github-team-nomad-core 4f087674f4
backport of commit 7fe432042eaa0a97c0aaa40d302055eb18e8a9b0 (#18040)
This pull request was automerged via backport-assistant
2023-07-24 02:28:28 -05:00
hc-github-team-nomad-core 30260f06e8
Backport of state: canonicalize namespace on restore into release/1.6.x (#18018)
This pull request was automerged via backport-assistant
2023-07-20 15:05:16 -05:00
hc-github-team-nomad-core e891026755
Backport of CSI: improve controller RPC reliability into release/1.6.x (#18015)
This pull request was automerged via backport-assistant
2023-07-20 13:52:27 -05:00
hc-github-team-nomad-core c67a225882 Prepare for next release 2023-07-18 18:51:15 +00:00
hc-github-team-nomad-core 609a97cfab Generate files for 1.6.0 release 2023-07-18 18:51:11 +00:00
Tim Gross e8bfef8148 search: fix ACL filtering for plugins and variables
ACL permissions for the search endpoints are done in three passes. The
first (the `sufficientSearchPerms` method) is for performance and coarsely
rejects requests based on the passed-in context parameter if the user has no
permissions to any object in that context. The second (the
`filteredSearchContexts` method) filters out contexts based on whether the user
has permissions either to the requested namespace or again by context (to catch
the "all" context). Finally, when iterating over the objects available, we do
the usual filtering in the iterator.

Internal testing found several bugs in this filtering:
* CSI plugins can be searched by any authenticated user.
* Variables can be searched if the user has `job:read` permissions to the
  variable's namespace instead of `variable:list`.
* Variables cannot be searched by wildcard namespace.

This is an information leak of the plugin names and variable paths, which we
don't consider to be privileged information but intended to protect anyway.

This changeset fixes these bugs by ensuring CSI plugins are filtered in the 1st
and 2nd pass ACL filters, and changes variables to check `variable:list` in the
2nd pass filter unless the wildcard namespace is passed (at which point we'll
fall back to filtering in the iterator).

Fixes: CVE-2023-3300
Fixes: #17906
2023-07-18 12:09:55 -04:00
hc-github-team-nomad-core 0951fe1c50
backport of commit 0a5e90120b18ff450457463d6bcee68ec6804bb0 (#17900)
This pull request was automerged via backport-assistant
2023-07-11 10:00:05 -05:00
Lance Haig 0455389534
Add the ability to customise the details of the CA (#17309)
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
2023-07-11 08:53:09 +01:00
Tim Gross ad7355e58b
CSI: persist previous mounts on client to restore during restart (#17840)
When claiming a CSI volume, we need to ensure the CSI node plugin is running
before we send any CSI RPCs. This extends even to the controller publish RPC
because it requires the storage provider's "external node ID" for the
client. This primarily impacts client restarts but also is a problem if the node
plugin exits (and fingerprints) while the allocation that needs a CSI volume
claim is being placed.

Unfortunately there's no mapping of volume to plugin ID available in the
jobspec, so we don't have enough information to wait on plugins until we either
get the volume from the server or retrieve the plugin ID from data we've
persisted on the client.

If we always require getting the volume from the server before making the claim,
a client restart for disconnected clients will cause all the allocations that
need CSI volumes to fail. Even while connected, checking in with the server to
verify the volume's plugin before trying to make a claim RPC is inherently racy,
so we'll leave that case as-is and it will fail the claim if the node plugin
needed to support a newly-placed allocation is flapping such that the node
fingerprint is changing.

This changeset persists a minimum subset of data about the volume and its plugin
in the client state DB, and retrieves that data during the CSI hook's prerun to
avoid re-claiming and remounting the volume unnecessarily.

This changeset also updates the RPC handler to use the external node ID from the
claim whenever it is available.

Fixes: #13028
2023-07-10 13:20:15 -04:00
Tim Gross 5025731ebe
consul: handle "not found" errors from Consul when deleting tokens (#17847)
In Consul 1.15.0, the Delete Token API was changed so as to return an error when
deleting a non-existent ACL token. This means that if Nomad successfully deletes
the token but fails to persist that fact, it will get stuck trying to delete a
non-existent token forever.

Update the token deletion function to ignore "not found" errors and treat them
as successful deletions.

Fixes: #17833
2023-07-07 16:22:13 -04:00
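Treating "not found" as a successful deletion, as #17847 describes, can be sketched as follows. The `tokenDeleter` interface, the `fakeConsul` test double, and the error strings are hypothetical, and matching on the message text is an assumption made for illustration; this is not the Consul API client.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// tokenDeleter is a hypothetical interface standing in for a Consul client.
type tokenDeleter interface {
	DeleteToken(accessorID string) error
}

// fakeConsul is a test double: a non-empty failWith makes every delete
// fail with that message.
type fakeConsul struct{ failWith string }

func (f fakeConsul) DeleteToken(string) error {
	if f.failWith == "" {
		return nil
	}
	return errors.New(f.failWith)
}

// isNotFound matches on the error text, which is an assumption about how
// the "not found" condition is reported.
func isNotFound(err error) bool {
	return err != nil && strings.Contains(strings.ToLower(err.Error()), "not found")
}

// deleteToken treats "not found" as success: the token is already gone,
// so retrying the delete forever would achieve nothing.
func deleteToken(c tokenDeleter, accessorID string) error {
	err := c.DeleteToken(accessorID)
	if isNotFound(err) {
		return nil
	}
	return err
}

func main() {
	fmt.Println(deleteToken(fakeConsul{failWith: "ACL not found"}, "abc"))
	fmt.Println(deleteToken(fakeConsul{failWith: "connection refused"}, "abc"))
}
```

Other errors are still returned, so genuine failures keep being retried.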
James Rasell 45073e8a05
job: ensure node pool is canonicalized for state restores. (#17765) 2023-06-30 07:37:22 +01:00
nicoche 649831c1d3
deploymentwatcher: fail early whenever possible (#17341)
Given a deployment that has a `progress_deadline`, if a task group runs
out of reschedule attempts, allow it to fail at this time instead of
waiting until the `progress_deadline` is reached.

Fixes: #17260
2023-06-26 14:01:03 -04:00
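The fail-early rule from #17341 reduces to a small decision, sketched below. `shouldFailDeployment` and its parameters are hypothetical; the real deployment watcher tracks considerably more state than this.

```go
package main

import (
	"fmt"
	"time"
)

// shouldFailDeployment reports whether a task group's deployment should be
// marked failed now. untilDeadline is the time remaining before the
// progress deadline (zero or negative once it has passed).
func shouldFailDeployment(unhealthy bool, attemptsLeft int, untilDeadline time.Duration) bool {
	if !unhealthy {
		return false
	}
	// Fail early: with no reschedule attempts left, waiting for the
	// progress deadline cannot change the outcome.
	if attemptsLeft <= 0 {
		return true
	}
	return untilDeadline <= 0
}

func main() {
	fmt.Println(shouldFailDeployment(true, 0, 10*time.Minute)) // out of attempts
	fmt.Println(shouldFailDeployment(true, 2, 10*time.Minute)) // still retrying
}
```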
James Rasell 74ab0badb4
test: add drain config tests. (#17724) 2023-06-26 16:23:13 +01:00
Luiz Aoqui 66962b2b28
np: fix list of jobs for node pool `all` (#17705)
Unlike nodes, jobs are allowed to be registered in the node pool `all`,
in which case all nodes are used for evaluating placements. When listing
jobs for the `all` node pool only those that are explicitly in this node
pool should be returned.
2023-06-23 15:47:53 -04:00
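The asymmetry described in #17705 can be sketched as two filters. The `node` and `job` types and both functions are hypothetical; the sketch assumes every node is implicitly a member of the `all` pool, while a job is listed under `all` only when it names that pool explicitly.

```go
package main

import "fmt"

const nodePoolAll = "all"

// node and job are hypothetical, pared-down types for this sketch.
type node struct{ ID, NodePool string }

type job struct{ ID, NodePool string }

// nodesInPool treats "all" as containing every node.
func nodesInPool(nodes []node, pool string) []node {
	var out []node
	for _, n := range nodes {
		if pool == nodePoolAll || n.NodePool == pool {
			out = append(out, n)
		}
	}
	return out
}

// jobsInPool matches the pool name exactly, including for "all".
func jobsInPool(jobs []job, pool string) []job {
	var out []job
	for _, j := range jobs {
		if j.NodePool == pool {
			out = append(out, j)
		}
	}
	return out
}

func main() {
	nodes := []node{{ID: "n1", NodePool: "default"}, {ID: "n2", NodePool: "gpu"}}
	jobs := []job{{ID: "web", NodePool: "default"}, {ID: "sweeper", NodePool: "all"}}
	fmt.Println(len(nodesInPool(nodes, nodePoolAll)), len(jobsInPool(jobs, nodePoolAll)))
}
```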