open-nomad

Author	SHA1	Message	Date
Samantha	54f8c04c91	Fix health checking for ephemeral poststart tasks (#11945) Update the logic in the Nomad client's alloc health tracker which erroneously marks existing healthy allocations with dead poststart ephemeral tasks as unhealthy even if they were already successful during a previous deployment.	2022-02-02 16:29:49 -05:00
Seth Hoenig	db2347a86c	cleanup: prevent leaks from time.After This PR replaces use of time.After with a safe helper function that creates a time.Timer to use instead. The new function returns both a time.Timer and a Stop function that the caller must handle. Unlike time.NewTimer, the helper function does not panic if the duration set is <= 0.	2022-02-02 14:32:26 -06:00
Luiz Aoqui	c4cff5359f	Verify TLS certificate on endpoints that are used between agents only (#11956)	2022-02-02 15:03:18 -05:00
Seth Hoenig	f6217fe424	Merge pull request #11972 from hashicorp/b-disable-semgrep-structs build: disable semgrep on structs.go for now	2022-02-02 12:57:47 -06:00
James Rasell	ba735bc35f	Merge pull request #11976 from hashicorp/b-gh-11950-missed e2e: moved missed volume test stop command to util helper.	2022-02-02 09:58:11 +01:00
James Rasell	adc3c44e29	e2e: moved missed volume test stop command to util helper.	2022-02-02 08:42:58 +01:00
Tim Gross	0b1978736e	Merge pull request #11971 from hashicorp/merge-release-1.2.5-branch prepare for next release	2022-02-01 11:16:38 -05:00
Tim Gross	7a0d151fab	prepare for next release	2022-02-01 11:13:22 -05:00
Seth Hoenig	60ca29161f	build: disable semgrep on structs.go for now	2022-02-01 10:09:49 -06:00
Tim Gross	95f26b307d	update download to Nomad v1.2.5 (#11969)	2022-02-01 11:04:06 -05:00
James Rasell	fb7dbdf35d	Merge pull request #11968 from hashicorp/b-gh-11950 e2e: account for new job stop CLI exit behaviour.	2022-02-01 15:56:56 +01:00
Seth Hoenig	5f07ab5c80	Merge pull request #11966 from hashicorp/deps-no-special-vendor deps: import libtime the normal way	2022-02-01 08:46:30 -06:00
James Rasell	0a50d9fd2a	e2e: account for new job stop CLI exit behaviour. PR #11550 changed the job stop exit behaviour when monitoring the deployment. When stopping a job, the deployment becomes cancelled and therefore the CLI now exits with status code 1 as it sees this as an error. This change adds a new utility e2e function that accounts for this behaviour.	2022-02-01 14:16:37 +01:00
Michael Schurter	fd242ab7f8	Merge pull request #11878 from kainoaseto/fix/multi-task-group-canary-deploys Bugfix: auto-promote canary taskgroups when mixed with non-canary taskgroups	2022-01-31 16:22:51 -08:00
kainoaseto	d575b3f4ae	rename test variable names to something easier to identify	2022-01-31 14:59:52 -08:00
Michael Schurter	8973cc39a3	Merge pull request #11944 from hashicorp/b-validate-plan core: prevent malformed plans from crashing leader	2022-01-31 13:14:28 -08:00
Seth Hoenig	04f84bcdfe	deps: import libtime the normal way Previously we copied this library by hand to avoid vendor-ing a bunch of files related to minimock. Now that we no longer vendor, just import the library normally. Also we might use more of the library for handling `time.After` uses, for which this library provides a Context-based solution.	2022-01-31 14:49:05 -06:00
Michael Schurter	dcf15d5960	docs: add changelog for #11878	2022-01-31 12:21:31 -08:00
Michael Schurter	d87ed3fcd7	core: prevent malformed plans from crashing leader The Plan.Submit endpoint assumed PlanRequest.Plan was never nil. While there is no evidence it ever has been nil, we should not panic if a nil plan is ever submitted because that would crash the leader.	2022-01-31 12:15:15 -08:00
Nomad Release Bot	cf7f0977ff	Release v1.2.5	2022-01-31 15:36:54 +00:00
Nomad Release bot	8af121bfbe	Generate files for 1.2.5 release	2022-01-31 14:54:26 +00:00
Tim Gross	18c528313c	docs: add 1.2.5 to changelog	2022-01-28 15:08:48 -05:00
Tim Gross	6af1b359ed	docs: missing changelog for #11892 (#11959)	2022-01-28 15:08:48 -05:00
Tim Gross	ea69eda522	docs: missing changelog for #11892 (#11959)	2022-01-28 15:04:32 -05:00
Tim Gross	d8a74efb07	set LAST_RELEASE to 1.2.4 for the 1.2.5 release branch	2022-01-28 14:50:54 -05:00
Tim Gross	622ed093ae	CSI: node unmount from the client before unpublish RPC (#11892) When an allocation stops, the `csi_hook` makes an unpublish RPC to the servers to unpublish via the CSI RPCs: first to the node plugins and then the controller plugins. The controller RPCs must happen after the node RPCs so that the node has had a chance to unmount the volume before the controller tries to detach the associated device. But the client has local access to the node plugins and can independently determine if it's safe to send unpublish RPC to those plugins. This will allow the server to treat the node plugin as abandoned if a client is disconnected and `stop_on_client_disconnect` is set. This will let the server try to send unpublish RPCs to the controller plugins, under the assumption that the client will be trying to unmount the volume on its end first. Note that the CSI `NodeUnpublishVolume`/`NodeUnstageVolume` RPCs can return ignorable errors in the case where the volume has already been unmounted from the node. Handle all other errors by retrying until we get success so as to give operators the opportunity to reschedule a failed node plugin (ex. in the case where they accidentally drained a node without `-ignore-system`). Fan-out the work for each volume into its own goroutine so that we can release a subset of volumes if only one is stuck.	2022-01-28 14:43:58 -05:00
Tim Gross	5773fc93a2	CSI: move terminal alloc handling into denormalization (#11931) * The volume claim GC method and volumewatcher both have logic collecting terminal allocations that duplicates most of the logic that's now in the state store's `CSIVolumeDenormalize` method. Copy this logic into the state store so that all code paths have the same view of the past claims. * Remove logic in the volume claim GC that now lives in the state store's `CSIVolumeDenormalize` method. * Remove logic in the volumewatcher that now lives in the state store's `CSIVolumeDenormalize` method. * Remove logic in the node unpublish RPC that now lives in the state store's `CSIVolumeDenormalize` method.	2022-01-28 14:43:50 -05:00
Tim Gross	c67c31e543	csi: ensure that PastClaims are populated with correct mode (#11932) In the client's `(csiHook) Postrun()` method, we make an unpublish RPC that includes a claim in the `CSIVolumeClaimStateUnpublishing` state and using the mode from the client. But then in the `(CSIVolume) Unpublish` RPC handler, we query the volume from the state store (because we only get an ID from the client). And when we make the client RPC for the node unpublish step, we use the _current volume's_ view of the mode. If the volume's mode has been changed before the old allocations can have their claims released, then we end up making a CSI RPC that will never succeed. Why does this code path get the mode from the volume and not the claim? Because the claim written by the GC job in `(*CoreScheduler) csiVolumeClaimGC` doesn't have a mode. Instead it just writes a claim in the unpublishing state to ensure the volumewatcher detects a "past claim" change and reaps all the claims on the volumes. Fix this by ensuring that the `CSIVolumeDenormalize` creates past claims for all nil allocations with a correct access mode set.	2022-01-28 14:43:43 -05:00
Tim Gross	951661db04	CSI: resolve invalid claim states (#11890) * csi: resolve invalid claim states on read It's currently possible for CSI volumes to be claimed by allocations that no longer exist. This changeset asserts a reasonable state at the state store level by registering these nil allocations as "past claims" on any read. This will cause any pass through the periodic GC or volumewatcher to trigger the unpublishing workflow for those claims. * csi: make feasibility check errors more understandable When the feasibility checker finds we have no free write claims, it checks to see if any of those claims are for the job we're currently scheduling (so that earlier versions of a job can't block claims for new versions) and reports a conflict if the volume can't be scheduled so that the user can fix their claims. But when the checker hits a claim that has a GCd allocation, the state is recoverable by the server once claim reaping completes and no user intervention is required; the blocked eval should complete. Differentiate the scheduler error produced by these two conditions.	2022-01-28 14:43:35 -05:00
Tim Gross	4e559c6255	csi: update leader's ACL in volumewatcher (#11891) The volumewatcher that runs on the leader needs to make RPC calls rather than writing to raft (as we do in the deploymentwatcher) because the unpublish workflow needs to make RPC calls to the clients. This requires that the volumewatcher has access to the leader's ACL token. But when leadership transitions, the new leader creates a new leader ACL token. This ACL token needs to be passed into the volumewatcher when we enable it, otherwise the volumewatcher can find itself with a stale token.	2022-01-28 14:43:27 -05:00
Derek Strickland	460416e787	Update IsEmpty to check for pre-1.2.4 fields (#11930)	2022-01-28 14:41:49 -05:00
Noel Quiles	9dcb7306da	website: Add Demandbase tag to consent manager (#11941) * chore: Add Demandbase tag to consent manager * fix: Add services to manager options	2022-01-28 14:37:35 -05:00
Jai	9a3a440dcf	Merge pull request #11711 from hashicorp/f-ui/evaluations-table feat: add evaluations view with table	2022-01-28 11:15:42 -05:00
Jai Bhagat	8533abde2e	fix: update eval serializer to latest changes	2022-01-28 10:16:23 -05:00
Jai Bhagat	ead706037d	ui: add assert.expect to a11y test	2022-01-28 09:47:23 -05:00
Jai Bhagat	3cc798d967	chore: fix js linting	2022-01-28 09:37:32 -05:00
Jai Bhagat	3eb34a577e	style: add styling icons and padding to table footer buttons	2022-01-28 09:35:44 -05:00
Jai Bhagat	e5b154e295	feat: add pagination and filtering to evaluations view	2022-01-28 09:35:44 -05:00
Jai Bhagat	1f80081c9d	feat: add pagination to evaluations.index	2022-01-28 09:35:44 -05:00
Jai Bhagat	9086d4e2d4	feat: add meta evaluations To support pagination on evaluations queries.	2022-01-28 09:35:44 -05:00
Jai Bhagat	9128f13676	feat: extract status cell logic into component	2022-01-28 09:35:44 -05:00
Jai Bhagat	e3ca737f97	fix: move evaluations template to index and inside page layout	2022-01-28 09:35:43 -05:00
Jai Bhagat	aaa2dadf16	chore: run prettier on gutter-menu	2022-01-28 09:35:43 -05:00
Jai Bhagat	ab4c768340	feat: add evaluations view with table	2022-01-28 09:35:43 -05:00
Tim Gross	66b4b28b1a	CSI: node unmount from the client before unpublish RPC (#11892) When an allocation stops, the `csi_hook` makes an unpublish RPC to the servers to unpublish via the CSI RPCs: first to the node plugins and then the controller plugins. The controller RPCs must happen after the node RPCs so that the node has had a chance to unmount the volume before the controller tries to detach the associated device. But the client has local access to the node plugins and can independently determine if it's safe to send unpublish RPC to those plugins. This will allow the server to treat the node plugin as abandoned if a client is disconnected and `stop_on_client_disconnect` is set. This will let the server try to send unpublish RPCs to the controller plugins, under the assumption that the client will be trying to unmount the volume on its end first. Note that the CSI `NodeUnpublishVolume`/`NodeUnstageVolume` RPCs can return ignorable errors in the case where the volume has already been unmounted from the node. Handle all other errors by retrying until we get success so as to give operators the opportunity to reschedule a failed node plugin (ex. in the case where they accidentally drained a node without `-ignore-system`). Fan-out the work for each volume into its own goroutine so that we can release a subset of volumes if only one is stuck.	2022-01-28 08:30:31 -05:00
Charlie Voiselle	f522e08835	Update gopsutil to 3.21.12	2022-01-27 14:10:15 -05:00
Jai	ff9d39a6b3	Merge pull request #11942 from hashicorp/f-ui/test-tooling ui: test tooling	2022-01-27 11:21:23 -05:00
Seth Hoenig	56bf1e4e7b	Merge pull request #11951 from hashicorp/b-cgroups-broken-part1-oss client: change test to not poke cgroupv2 edge case	2022-01-27 10:06:03 -06:00
Tim Gross	b20a6c9ffb	CSI: move terminal alloc handling into denormalization (#11931) * The volume claim GC method and volumewatcher both have logic collecting terminal allocations that duplicates most of the logic that's now in the state store's `CSIVolumeDenormalize` method. Copy this logic into the state store so that all code paths have the same view of the past claims. * Remove logic in the volume claim GC that now lives in the state store's `CSIVolumeDenormalize` method. * Remove logic in the volumewatcher that now lives in the state store's `CSIVolumeDenormalize` method. * Remove logic in the node unpublish RPC that now lives in the state store's `CSIVolumeDenormalize` method.	2022-01-27 10:39:08 -05:00
Tim Gross	a40a20cff8	csi: ensure that PastClaims are populated with correct mode (#11932) In the client's `(csiHook) Postrun()` method, we make an unpublish RPC that includes a claim in the `CSIVolumeClaimStateUnpublishing` state and using the mode from the client. But then in the `(CSIVolume) Unpublish` RPC handler, we query the volume from the state store (because we only get an ID from the client). And when we make the client RPC for the node unpublish step, we use the _current volume's_ view of the mode. If the volume's mode has been changed before the old allocations can have their claims released, then we end up making a CSI RPC that will never succeed. Why does this code path get the mode from the volume and not the claim? Because the claim written by the GC job in `(*CoreScheduler) csiVolumeClaimGC` doesn't have a mode. Instead it just writes a claim in the unpublishing state to ensure the volumewatcher detects a "past claim" change and reaps all the claims on the volumes. Fix this by ensuring that the `CSIVolumeDenormalize` creates past claims for all nil allocations with a correct access mode set.	2022-01-27 10:05:41 -05:00

22626 commits