open-nomad

Commit Graph

Author	SHA1	Message	Date
hc-github-team-nomad-core	922512677e	backport of commit b55dcb39672e00686b70b1ec5f6b99c66c5397ce (#18841 ) Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-10-24 08:08:35 +01:00
hc-github-team-nomad-core	63c2013ec1	backport of commit ca9e08e6b5eee00d055b9429df5976a70cdcb2d6 (#18813 ) Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-10-20 08:35:54 +01:00
hc-github-team-nomad-core	897bcef932	backport: do not embed *Server (#18786 ) (#18789 ) these structs embedding Server, then Server _also embedding them_, confused my IDE, isn't necessary, and just feels wrong!	2023-10-17 15:55:56 -05:00
Daniel Bennett	b6298dc073	Only generate default workload identity once per alloc task - 1.6.x (#18783 ) this can save a bit of cpu when running plans for tasks that already exist, and prevents Nomad tokens from changing, which can cause task template{}s to restart unnecessarily.	2023-10-17 13:06:20 -05:00
hc-github-team-nomad-core	c96ca6f81c	vault: use an importable const for Vault header string. (#18740 ) (#18750 ) Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-10-13 08:11:54 +01:00
hc-github-team-nomad-core	f6900307e8	backport deps: remove Vault SDK into release/1.6.x (#18727 ) Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-10-11 12:02:48 -04:00
hc-github-team-nomad-core	7725931942	backport of commit 9c57ddd8383c2302884272d0b01b034e2509f194 (#18714 ) Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-10-10 10:15:23 +01:00
hc-github-team-nomad-core	ce6c86a057	backport of commit e7136f80c5c1277ea2dea4eeeda84005224d7835 (#18648 ) Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2023-10-03 12:27:26 -05:00
Michael Schurter	547a95795a	client: prevent using stale allocs (#18601 ) Similar to #18269, it is possible that even if Node.GetClientAllocs retrieves fresh allocs that the subsequent Alloc.GetAllocs call retrieves stale allocs. While `diffAlloc(existing, updated)` properly ignores stale alloc updates, alloc deletions have no such check. So if a client retrieves an alloc created at index 123, and then a subsequent Alloc.GetAllocs call hits a new server which returns results at index 100, the client will stop the alloc created at 123 because it will be missing from the stale response. This change applies the same logic as #18269 and ensures only fresh responses are used. Glossary: * fresh - modified at an index > the query index * stale - modified at an index <= the query index	2023-09-29 14:34:04 -07:00
Phil Renaud	bfba4f5e13	[ui] ACL Roles in the UI, plus Role, Policy and Token management (#17770 ) (#18599 ) * Rename pages to include roles * Models and adapters * [ui] Any policy checks in the UI now check for roles' policies as well as token policies (#18346) * combinedPolicies as a concept * Classic decorator on role adapter * We added a new request for roles, so the test based on a specific order of requests got fickle fast * Mirage roles cluster scaffolded * Acceptance test for roles and policies on the login page * Update mirage mock for nodes fetch to account for role policies / empty token.policies * Roles-derived policies checks * [ui] Access Control with Roles and Tokens (#18413) * top level policies routes moved into access control * A few more routes and name cleanup * Delog and test fixes to account for new url prefix and document titles * Overview page * Tokens and Roles routes * Tokens helios table * Add a role * Hacky role page and deletion * New policy keyboard shortcut and roles breadcrumb nav * If you leave New Role but havent made any changes, remove the newly-created record from store * Roles index list and general role route crud * Roles index actually links to roles now * Helios button styles for new roles and policies * Handle when you try to create a new role without having any policies * Token editing generally * Create Token functionality * Cant delete self-token but management token editing and deleting is fine * Upgrading helios caused codemirror to explode, shimmed * Policies table fix * without bang-element condition, modifier would refire over and over * Token TTL or Time setting * time will take you on * Mirage hooks for create and list roles * Ensure policy names only use allow characters in mirage mocks * Mirage mocked roles and policies in the default cluster * log and lintfix * chromedriver to 2.1.2 * unused unit tests removed * Nice profile dropdown * With the HDS accordion, rename our internal component scss ref * design revisions after discussion * Tooltip on deleted-policy tokens * Two-step button peripheral isDeleting gcode removed * Never to null on token save * copywrite headers added and empty routefiles removed * acceptance test fixes for policies endpoint * Route for updating a token * Policies testfixes * Ember on-click-outside modifier upgraded with general ember-modifier upgrade * Test adjustments to account for new profile header dropdown * Test adjustments for tokens via policy pages * Removed an unused route * Access Control index page tests * a11y tests * Tokens index acceptance tests generally * Lintfix * Token edit page tests * Token editing tests * New token expiration tests * Roles Index tests * Role editing policies tests * A complete set of Access Control Roles tests * Policies test * Be more specific about which row to check for expiration time * Nil check on expirationTime equality * Management tokens shouldnt show No Roles/Policies, give them their own designation * Route guard on selftoken, conditional columns, and afterModel at parent to prevent orphaned policies on tokens/roles from stopping a new save * Policy unloading on delete and other todos plus autofocus conditionally re-enabled * Invalid policies non-links now a concept for Roles index * HDS style links to make job.variables.alert links look like links again * Mirage finding looks weird so making model async in hash even though redundant * Drop rsvp * RSVP wasnt the problem, cached lookups were * remove old todo comments * de-log	2023-09-27 17:02:48 -04:00
hc-github-team-nomad-core	3cc387749e	backport of commit 9b74e11f064ecc53a53f13e82419927b533a9e4a (#18589 ) Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2023-09-26 15:23:35 -05:00
hc-github-team-nomad-core	a2f56797a0	backport of commit 4895d708b438b42e52fd54a128f9ec4cb6d72277 (#18531 ) Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2023-09-18 14:29:29 -05:00
hc-github-team-nomad-core	c7b1966565	backport of commit 1339599185af9dbfcca6f0aa1001c6753b8c682b (#18517 ) Co-authored-by: Gerard Nguyen <nguyenvanthao1991@gmail.com>	2023-09-15 09:16:38 -04:00
hc-github-team-nomad-core	46b4847885	backport of commit c6dbba7cde911bb08f1f8da445a44a0125cd2047 (#18505 ) Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2023-09-14 14:38:05 -05:00
hc-github-team-nomad-core	ea59b59035	backport of commit 532911c3803b51ad093f682a5dc0ded13b636751 (#18472 ) Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-09-13 08:55:29 +01:00
hc-github-team-nomad-core	2ef7a280b0	backport of commit d923fc554d09ceb51b530467a354860b25114fd3 (#18450 ) Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-09-11 16:21:44 +01:00
hc-github-team-nomad-core	552dad53f5	test: fix name of state service registration test file. (#18406 ) (#18422 ) Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-09-07 11:00:13 +01:00
hc-github-team-nomad-core	286164288d	backport of commit b6f6541f5024fb847a230ecd612a15f4a4fc05a4 (#18416 ) Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-09-07 09:14:56 +01:00
hc-github-team-nomad-core	a9441eb589	backport of commit c28cd59655b88d64f841bed5da7996ae8d88772f (#18411 ) Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2023-09-06 11:32:42 -05:00
hc-github-team-nomad-core	1b2237d6a8	backport of commit 776a26bce7cf3a320fc7e7f4a6bf9da2b30f3da7 (#18375 ) Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-09-01 10:25:08 +01:00
hc-github-team-nomad-core	b93dc92ec2	backport of commit f187afab9f06b7489f7103c3e3c8eed72f210621 (#18350 ) Co-authored-by: Gerard Nguyen <nguyenvanthao1991@gmail.com>	2023-08-28 19:14:45 -04:00
hc-github-team-nomad-core	b0bece8a18	backport of commit da830b10463f1cc0a704ec4a4f66e35d4324d728 (#18337 ) Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-08-25 21:36:35 -04:00
James Rasell	3730b66d8c	test: use correct parallel test setup func (#18326 ) (#18330 )	2023-08-25 14:48:06 +01:00
hc-github-team-nomad-core	4ebd0d251f	backport of commit f7a336d2ba95a362504d6094e581b8aeedbd554e (#18323 ) Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-08-25 09:37:13 +01:00
hc-github-team-nomad-core	e4c7388608	backport of commit 3e61b3a37df9ff0836b52ba5440106ad0f607dd7 (#18294 ) Co-authored-by: Андрей Неустроев <99169437+aneustroev@users.noreply.github.com>	2023-08-22 16:01:24 -04:00
James Rasell	1397ec4ad6	nomad: remove custom max func and use Go 1.21.0 builtin (#18237 ) (#18251 )	2023-08-17 16:17:43 +01:00
Luiz Aoqui	d910457159	csi: prevent panic on volume delete (#18234 ) When a CSI volume is deleted while its plugin is not running, the function `volAndPluginLookup` returns a `nil` plugin value resulting in a panic in the request handler.	2023-08-17 10:09:45 -04:00
Tim Gross	0a19fe3b60	fix multiple overflow errors in exponential backoff (#18200 ) We use capped exponential backoff in several places in the code when handling failures. The code we've copy-and-pasted all over has a check to see if the backoff is greater than the limit, but this check happens after the bitshift and we always increment the number of attempts. This causes an overflow with a fairly small number of failures (ex. at one place I tested it occurs after only 24 iterations), resulting in a negative backoff which then never recovers. The backoff becomes a tight loop consuming resources and/or DoS'ing a Nomad RPC handler or an external API such as Vault. Note this doesn't occur in places where we cap the number of iterations so the loop breaks (usually to return an error), so long as the number of iterations is reasonable. Introduce a helper with a check on the cap before the bitshift to avoid overflow in all places this can occur. Fixes: #18199 Co-authored-by: stswidwinski <stan.swidwinski@gmail.com>	2023-08-15 14:39:09 -04:00
Seth Hoenig	a45b689d8e	update go1.21 (#18184 ) * build: update to go1.21 * go: eliminate helpers in favor of min/max * build: run go mod tidy * build: swap depguard for semgrep * command: fixup broken tls error check on go1.21	2023-08-15 14:40:33 +02:00
Tim Gross	a3a86a849a	test: deflake node drain integration test (#18171 ) The `TestDrainer_AllTypes_NoDeadline` test has been flaky. It looks like this might be because the final update of batch allocations to complete is improperly updating the state store directly rather than by RPC. If the service jobs have restarted in the meantime, the `allocClientStateSimulator` will have updated the index on the allocations table and that will prevent the drainer from unblocking (and being marked complete) when the batch jobs are written with an earlier index. This changeset attempts to fix that by making the update via RPC (as it normally would be in real code).	2023-08-14 16:19:00 -04:00
Tim Gross	577d96034d	test: deflake job endpoint registration test (#18170 ) We've seen test flakiness in the `TestJobEndpoint_Register_NonOverlapping` test, which asserts that we don't try to place allocations for blocked evals until resources have been actually freed by setting the client status of the previous alloc to complete. The flaky assertion includes sorting the two allocations by CreateIndex and this appears to be a non-stable sort in the context of the test run, which results in failures that shouldn't exist. There's no reason to sort the allocations instead of just examining them by ID. This changeset does so.	2023-08-14 16:18:53 -04:00
Esteban Barrios	9f19d7c373	config: add configurable content security policy (#18085 )	2023-08-14 14:25:21 -04:00
hc-github-team-nomad-core	f812bccb4e	Backport of Tuning job versions retention. #17635 into release/1.6.x (#18169 ) This pull request was automerged via backport-assistant	2023-08-07 13:48:09 -05:00
hc-github-team-nomad-core	ebcdd4d82d	backport of commit 65501ff97aa2ec6fa3c4f53d3f8c6c80c6a0e8a3 (#18166 ) This pull request was automerged via backport-assistant	2023-08-07 10:17:34 -05:00
Charlie Voiselle	bac4d112d1	[dep] bump golang.org/x/exp (#18102 ) There are some refactorings that have to be made in the getter and state where the api changed in `slices` * Bump golang.org/x/exp * Bump golang.org/x/exp in api * Update job_endpoint_test * [feedback] unexport sort function	2023-08-03 15:14:39 -04:00
hc-github-team-nomad-core	2ed92e0c6c	Backport of feature: Add new field render_templates on restart block into release/1.6.x (#18094 ) This pull request was automerged via backport-assistant	2023-07-28 13:54:00 -05:00
hc-github-team-nomad-core	4f087674f4	backport of commit 7fe432042eaa0a97c0aaa40d302055eb18e8a9b0 (#18040 ) This pull request was automerged via backport-assistant	2023-07-24 02:28:28 -05:00
hc-github-team-nomad-core	30260f06e8	Backport of state: canonicalize namespace on restore into release/1.6.x (#18018 ) This pull request was automerged via backport-assistant	2023-07-20 15:05:16 -05:00
hc-github-team-nomad-core	e891026755	Backport of CSI: improve controller RPC reliability into release/1.6.x (#18015 ) This pull request was automerged via backport-assistant	2023-07-20 13:52:27 -05:00
hc-github-team-nomad-core	c67a225882	Prepare for next release	2023-07-18 18:51:15 +00:00
hc-github-team-nomad-core	609a97cfab	Generate files for 1.6.0 release	2023-07-18 18:51:11 +00:00
Tim Gross	e8bfef8148	search: fix ACL filtering for plugins and variables ACL permissions for the search endpoints are done in three passes. The first (the `sufficientSearchPerms` method) is for performance and coarsely rejects requests based on the passed-in context parameter if the user has no permissions to any object in that context. The second (the `filteredSearchContexts` method) filters out contexts based on whether the user has permissions either to the requested namespace or again by context (to catch the "all" context). Finally, when iterating over the objects available, we do the usual filtering in the iterator. Internal testing found several bugs in this filtering: * CSI plugins can be searched by any authenticated user. * Variables can be searched if the user has `job:read` permissions to the variable's namespace instead of `variable:list`. * Variables cannot be searched by wildcard namespace. This is an information leak of the plugin names and variable paths, which we don't consider to be privileged information but intended to protect anyways. This changeset fixes these bugs by ensuring CSI plugins are filtered in the 1st and 2nd pass ACL filters, and changes variables to check `variable:list` in the 2nd pass filter unless the wildcard namespace is passed (at which point we'll fallback to filtering in the iterator). Fixes: CVE-2023-3300 Fixes: #17906	2023-07-18 12:09:55 -04:00
hc-github-team-nomad-core	0951fe1c50	backport of commit 0a5e90120b18ff450457463d6bcee68ec6804bb0 (#17900 ) This pull request was automerged via backport-assistant	2023-07-11 10:00:05 -05:00
Lance Haig	0455389534	Add the ability to customise the details of the CA (#17309 ) Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-07-11 08:53:09 +01:00
Tim Gross	ad7355e58b	CSI: persist previous mounts on client to restore during restart (#17840 ) When claiming a CSI volume, we need to ensure the CSI node plugin is running before we send any CSI RPCs. This extends even to the controller publish RPC because it requires the storage provider's "external node ID" for the client. This primarily impacts client restarts but also is a problem if the node plugin exits (and fingerprints) while the allocation that needs a CSI volume claim is being placed. Unfortunately there's no mapping of volume to plugin ID available in the jobspec, so we don't have enough information to wait on plugins until we either get the volume from the server or retrieve the plugin ID from data we've persisted on the client. If we always require getting the volume from the server before making the claim, a client restart for disconnected clients will cause all the allocations that need CSI volumes to fail. Even while connected, checking in with the server to verify the volume's plugin before trying to make a claim RPC is inherently racy, so we'll leave that case as-is and it will fail the claim if the node plugin needed to support a newly-placed allocation is flapping such that the node fingerprint is changing. This changeset persists a minimum subset of data about the volume and its plugin in the client state DB, and retrieves that data during the CSI hook's prerun to avoid re-claiming and remounting the volume unnecessarily. This changeset also updates the RPC handler to use the external node ID from the claim whenever it is available. Fixes: #13028	2023-07-10 13:20:15 -04:00
Tim Gross	5025731ebe	consul: handle "not found" errors from Consul when deleting tokens (#17847 ) In Consul 1.15.0, the Delete Token API was changed so as to return an error when deleting a non-existent ACL token. This means that if Nomad successfully deletes the token but fails to persist that fact, it will get stuck trying to delete a non-existent token forever. Update the token deletion function to ignore "not found" errors and treat them as successful deletions. Fixes: #17833	2023-07-07 16:22:13 -04:00
James Rasell	45073e8a05	job: ensure node pool is canonicalized for state restores. (#17765 )	2023-06-30 07:37:22 +01:00
nicoche	649831c1d3	deploymentwatcher: fail early whenever possible (#17341 ) Given a deployment that has a `progress_deadline`, if a task group runs out of reschedule attempts, allow it to fail at this time instead of waiting until the `progress_deadline` is reached. Fixes: #17260	2023-06-26 14:01:03 -04:00
James Rasell	74ab0badb4	test: add drain config tests. (#17724 )	2023-06-26 16:23:13 +01:00
Luiz Aoqui	66962b2b28	np: fix list of jobs for node pool `all` (#17705 ) Unlike nodes, jobs are allowed to be registered in the node pool `all`, in which case all nodes are used for evaluating placements. When listing jobs for the `all` node pool only those that are explicitly in this node pool should be returned.	2023-06-23 15:47:53 -04:00


4424 Commits