open-nomad

Commit Graph

Author	SHA1	Message	Date
Jai	e55edf58ff	chore: add percy tests (#17157 )	2023-05-12 09:57:22 -04:00
Jai	27f0d104e5	16664/upgrade (#17158 ) * chore: upgrade Upgrade @babel/helper-string-parserprop-types * chore: add resolution * chore: update component API for breaking changes * chore: update arguments * api: forgive user for pass wrong args * chore: update tests * chore: update yarn lock * chore: upgrade to Glimmer component * styling: add properties to component invocation * chore: add inset styles	2023-05-12 09:54:13 -04:00
Jai	ce29e55b7a	chore: write js doc (#17156 ) * chore: write jsdoc comments * chore: update comments	2023-05-11 15:30:54 -04:00
Seth Hoenig	81e36b3650	core: eliminate second index on job_submissions table (#17146 ) * core: eliminate second index on job_submissions table This PR refactors the job_submissions state store code to eliminate the use of a second index formerly used for purging all versions of a given job. In practice we ended up with duplicate entries on the table. Instead, use index prefix scanning on the primary index and tidy up any potential for creating (or removing) duplicates. * core: pr comments followup	2023-05-11 09:51:08 -05:00
Phil Renaud	a910e4be1c	[ui, deployments] Denominator based on completed allocations for batch jobs (#17147 ) * Denominator based on completed allocations for batch jobs * Test for denominatored batch job change	2023-05-11 10:23:23 -04:00
Tim Gross	9ed75e1f72	client: de-duplicate alloc updates and gate during restore (#17074 ) When client nodes are restarted, all allocations that have been scheduled on the node have their modify index updated, including terminal allocations. There are several contributing factors: * The `allocSync` method that updates the servers isn't gated on first contact with the servers. This means that if a server updates the desired state while the client is down, the `allocSync` races with the `Node.ClientGetAlloc` RPC. This will typically result in the client updating the server with "running" and then immediately thereafter "complete". * The `allocSync` method unconditionally sends the `Node.UpdateAlloc` RPC even if it's possible to assert that the server has definitely seen the client state. The allocrunner may queue-up updates even if we gate sending them. So then we end up with a race between the allocrunner updating its internal state to overwrite the previous update and `allocSync` sending the bogus or duplicate update. This changeset adds tracking of server-acknowledged state to the allocrunner. This state gets checked in the `allocSync` before adding the update to the batch, and updated when `Node.UpdateAlloc` returns successfully. To implement this we need to be able to equality-check the updates against the last acknowledged state. We also need to add the last acknowledged state to the client state DB, otherwise we'd drop unacknowledged updates across restarts. The client restart test has been expanded to cover a variety of allocation states, including allocs stopped before shutdown, allocs stopped by the server while the client is down, and allocs that have been completely GC'd on the server while the client is down. I've also bench tested scenarios where the task workload is killed while the client is down, resulting in a failed restore. Fixes #16381	2023-05-11 09:05:24 -04:00
Seth Hoenig	4abb3e03ca	cli: upload var file(s) content on job submission (#17128 ) This PR makes it so that the content of any -var-file files is uploaded to Nomad on job run.	2023-05-11 08:04:33 -05:00
Jai	24afd86cc5	ui: add sign-in link on err page (#17140 )	2023-05-11 08:24:58 -04:00
Luiz Aoqui	d800dc3367	deps: update go-metrics to prevent panic (#17133 ) nomad#15861 describes intermittent panics caused by go-metrics Prometheus client. We have not been able to further debug this problem due to the lack of information when the panic happens. go-metrics#146 prevents the panic from happening and also logs additional information that can help us understand the root cause of the problem. This commit pins the go-metrics dependency to this branch until we can better debug the issue.	2023-05-10 21:33:15 -04:00
Phil Renaud	25b66e249d	[ui] Batch jobs, aside from child jobs, get the new status panel (#17118 ) * Batch jobs, aside from child jobs, get the new status panel * Clean up the imported jobAllocStatuses * Note for mirage that batch jobs now have a historical status panel * Batch job test for complete status * Parameterized and periodic child jobs get the panel treatment * Undo parameterized and periodic child test situations	2023-05-10 16:59:33 -04:00
Phil Renaud	681ea73913	Versions returned as an array rather than an object now, pending changed to unknown, and sorted (#17150 )	2023-05-10 15:40:54 -04:00
James Rasell	c60c5ace60	api: update Go mod go version to 1.20 to match main mod. (#17137 )	2023-05-10 16:29:06 +01:00
James Rasell	48357c609b	test: fix flakey broker notifier test. (#16994 )	2023-05-10 13:40:25 +01:00
Jai	08d97a19ca	feat: visualize HCL Job Specification in the Nomad UI `jobs.job.definition` view (#16669 ) * ui: Toggle for `read-only` view (#16279) * ui: model update for specification * style: add styling for select * style: add styling for select * refact: add spec to view * refact: update component API * test: refactor for new UI state * refact: clean conditional * refact: update component API for prop * chore: correct naming * chore: remove `fn` helper Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com> * update `default` Mirage scenario (#16496) * chore: update mirage scenario: * ui: conditionally render toggle button (#16497) * chore: update css variable name (#16498) --------- Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com> * ui: Display JSON view of variables associated to job specification (#16570) * chore: move fixture to util * chore: update tests: * ui: display variables table * chore: add mirage fixture (#16572) * ui: regex for job spec parse (#16668) * ui: remove variable table (#16670) * ui: notify user if specification has variables (#16671) * ui: regex for job spec parse * chore: deprecate variable references * chore: update mirage * ui: add notification * test: add test coverage for parse method (#16590) * refact: `JobEditor` reactive query parameters (#16710) * refact: add query parameter * refact: move toggle action to controller * ui: remove toggle behavior in `JobEditor` (#16711) * refact: rename logic for select * chore: instantiate qp in route * refact: uniform alerts (#16715) * style: buffer between alert and header * refact: extract alerts into a component * chore: update tests for qp * chore: defensive logic for app controller * refact: move `edit` state to controller (#16725) * refact: move edit state to controller * refact: handle edit state (#16731) * refact: handle edit state * ui: warning message (#16732) * ui: warning message * ui: enable editing of HCL vars in the UI (#16734) * enable editing of HCL vars * refact: 
default qp logic * refact: alert condition * refact: Pass `variables` as string (#16849) * ui: Toggle for `read-only` view (#16279) * ui: model update for specification * style: add styling for select * style: add styling for select * refact: add spec to view * refact: update component API * test: refactor for new UI state * refact: clean conditional * refact: update component API for prop * chore: correct naming * chore: remove `fn` helper Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com> * update `default` Mirage scenario (#16496) * chore: update mirage scenario: * ui: conditionally render toggle button (#16497) * chore: update css variable name (#16498) --------- Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com> * refact: `JobEditor` reactive query parameters (#16710) * refact: add query parameter * refact: move toggle action to controller * ui: remove toggle behavior in `JobEditor` (#16711) * refact: rename logic for select * chore: instantiate qp in route * refact: uniform alerts (#16715) * style: buffer between alert and header * refact: extract alerts into a component * chore: update tests for qp * chore: defensive logic for app controller * refact: move `edit` state to controller (#16725) * refact: move edit state to controller * refact: handle edit state (#16731) * refact: handle edit state * ui: warning message (#16732) * ui: warning message * ui: enable editing of HCL vars in the UI (#16734) * enable editing of HCL vars * refact: default qp logic * refact: alert condition * refact: variables as string * style: revert styling change --------- Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com> * bug: correctly edit variables (#16989) * ui: visualize variables (#16987) * ui: fetchRawSpecification * refact: integrate new model method * test: fetchRaw unit * styling: enable height on cm * chore: update copy * feat: visual variables * chore: conditional render info txt * refact: add mirage endpoint * refact: update test for new schema * 
refact: job submit flow (#17015) * refact: job update logic * chore: remove dead code * bug: update `job.run` and `job.update` adapter methods (#17055) * refact: update adapter * chore: update api usage * styling: UX requests (#17064) * refact: update adapter * chore: update api usage * styling: disable toggle w text * styling: stick button * style: space out alerts * chore: autofocus on first editor * bug: dismiss alert * chore: add jsdoc and assertion check * chore: update mirage for Vercel (#17054) * chore: mirage logic for vercel deploy * chore: update test for mirage change * refact: API refactoring (#17083) * refact: udpate for req schema * refact: update for variable flags and literal * bug: visualize job model not derived state * chore: update copy * chore: fix incorrect copy * chore: deprecate variables derived state * chore: update copy * feat: enable toggle on edit * chore: prettify * refact: move conditional --------- Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>	2023-05-09 11:03:52 -04:00
Seth Hoenig	6f4992ef29	client: unveil /etc/ssh/ssh_known_hosts for artifact downloads (#17122 ) This PR fixes a bug where nodes configured with populated /etc/ssh/ssh_known_hosts files would be unable to read them during artifact downloading. Fixes #17086	2023-05-09 09:43:52 -05:00
Seth Hoenig	74714272cc	api: set the job submission during job reversion (#17097 ) * api: set the job submission during job reversion This PR fixes a bug where the job submission would always be nil when a job goes through a reversion to a previous version. Basically we need to detect when this happens, lookup the submission of the job version being reverted to, and set that as the submission of the new job being created. * e2e: add e2e test for job submissions during reversion This e2e test ensures a reverted job inherits the job submission associated with the version of the job being reverted to.	2023-05-08 14:18:34 -05:00
Daniel Bennett	a7ed6f5c53	full task cleanup when alloc prerun hook fails (#17104 ) to avoid leaking task resources (e.g. containers, iptables) if allocRunner prerun fails during restore on client restart. now if prerun fails, TaskRunner.MarkFailedKill() will only emit an event, mark the task as failed, and cancel the tr's killCtx, so then ar.runTasks() -> tr.Run() can take care of the actual cleanup. removed from (formerly) tr.MarkFailedDead(), now handled by tr.Run(): * set task state as dead * save task runner local state * task stop hooks also done in tr.Run() now that it's not skipped: * handleKill() to kill tasks while respecting their shutdown delay, and retrying as needed * also includes task preKill hooks * clearDriverHandle() to destroy the task and associated resources * task exited hooks	2023-05-08 13:17:10 -05:00
Luiz Aoqui	53020c0941	Revert "ci: use `BACKPORT_MERGE_COMMIT` option (#16730 )" (#17116 ) This reverts commit 1721e687c0832bea3d9b7eec5dcd3c4e7a924d71. The change was expected to solve the sporadic problems we were having with Backport Assistant, but it ended up creating even more failures.	2023-05-08 13:30:43 -04:00
Tim Gross	ba269eaf3f	docs: add note to upgrade guide about yanked version (#17115 ) Nomad 1.5.4 shipped with a logmon bug that we rolled out a fix for in Nomad 1.5.5. Unfortunately we can't yank the release but we should leave a note in the upgrade guide telling users to avoid it.	2023-05-08 13:28:45 -04:00
stswidwinski	9c1c2cb5d2	Correct the status description and modify time of canceled evals. (#17071 ) Fix for #17070. Corrected the status description and modify time of evals which are canceled due to another eval having completed in the meantime.	2023-05-08 08:50:36 -04:00
Phil Renaud	2fbbac5dd8	[ui, deployments] Job status for System Jobs (#17046 ) * System jobs get a panel and lost status reinstated * Leveraging nodes and not worrying about rescheds for system jobs * Consistency w restarted as well * Text shadow removed and early return where possible * System jobs added to the Historical Click list * System alloc and client summary panels removed * Bones of some new system jobs tests * [ui, deployments] handle node read permissions for system job panel (#17073) * Do the next-best-thing when we cant read nodes for system jobs * Whitespace control handlebars expr * Simplifies system jobs to not attempt to show a desired count, since it is a particularly complex number depending on constraints, number of nodes, etc. * [ui, deployments] Fix order in which allocations are ascribed to the status chart (#17063) * Discovery of alloc.isOld * Correct sorting and better types * A more honest walk-back that prioritizes running and pending allocs first * Test scenario for descending-order allocs to show * isOld mandates that we set a job version for our created job. Could also do this in the factory but maybe side-effecty * Type simplification * Fixed up a test that needed system job summary to be updated * Tests for modifications to the job summary * Explicitly mark the service jobs in test as not-deploying	2023-05-05 16:25:21 -04:00
Shantanu Gadgil	2cf27389ad	minor typo; 1.3.x not 1.13.x (#17101 )	2023-05-05 13:51:05 -04:00
Tim Gross	5f3ff346ea	post release 1.5.5 (#17098 ) * changelog entries for 1.5.5 and missing merge of changelog for 1.5.4, 1.4.9, and 1.3.14 * note on deprecation of `logs.enabled` field	2023-05-05 11:46:08 -04:00
Seth Hoenig	fff2eec625	connect: use heuristic to detect sidecar task driver (#17065 ) * connect: use heuristic to detect sidecar task driver This PR adds a heuristic to detect whether to use the podman task driver for the connect sidecar proxy. The podman driver will be selected if there is at least one task in the task group configured to use podman, and there are zero tasks in the group configured to use docker. In all other cases the task driver defaults to docker. After this change, we should be able to run typical Connect jobspecs (e.g. nomad job init [-short] -connect) on Clusters configured with the podman task driver, without modification to the job files. Closes #17042 * golf: cleanup driver detection logic	2023-05-05 10:19:30 -05:00
James Rasell	6ec4a69f47	scale: fixed a bug where evals could be created with wrong type. (#17092 ) The job scale RPC endpoint hard-coded the eval creation to use the type of service. This meant scaling events triggered on jobs of type batch would create evaluations with the wrong type, which does not seem to cause any problems, just confusion when correlating the two.	2023-05-05 14:46:10 +01:00
Tim Gross	17bd930ca9	logs: fix missing allocation logs after update to Nomad 1.5.4 (#17087 ) When the server restarts for the upgrade, it loads the `structs.Job` from the Raft snapshot/logs. The jobspec has long since been parsed, so none of the guards around the default value are in play. The empty field value for `Enabled` is the zero value, which is false. This doesn't impact any running allocation because we don't replace running allocations when either the client or server restart. But as soon as any allocation gets rescheduled (ex. you drain all your clients during upgrades), it'll be using the `structs.Job` that the server has, which has `Enabled = false`, and logs will not be collected. This changeset fixes the bug by adding a new field `Disabled` which defaults to false (so that the zero value works), and deprecates the old field. Fixes #17076	2023-05-04 16:01:18 -04:00
Seth Hoenig	b4c9f3bbc2	client: fix job_max_source_size client config name (#17067 ) Intended to be job_max_source_size, rather than max_job_source_size. This way it fits better with existing client config options related to jobs.	2023-05-04 13:54:51 -05:00
Seth Hoenig	4347c1d705	docs: move CNI reference plugins installation to CNI overview page (#17068 ) * docs: move CNI reference plugins installation to CNI overview page This PR moves the instruction steps for installing the CNI reference plugins from the Consul Mesh integration page to the general Networking CNI page. These plugins are required for bridge networking, not just Consul Mesh, so it makes sense to have them on the general CNI page. Closes #17038 * docs: fix a link to post install steps	2023-05-04 11:32:06 -05:00
James Rasell	50414bba12	docs: update artifact jobspec sshkey example path. (#17077 )	2023-05-04 14:29:36 +01:00
Michael Schurter	3b3b02b741	dep: update from jwt/v4 to jwt/v5 (#17062 ) Their release notes are here: https://github.com/golang-jwt/jwt/releases Seemed wise to upgrade before we do even more with JWTs. For example this upgrade would have mattered if we already implemented common JWT claims such as expiration. Since we didn't rely on any claim verification this upgrade is a noop... ...except for 1 test that called `Claims.Valid()`! Removing that assertion seems scary, but it didn't actually do anything because we didn't implement any of the standard claims it validated: https://github.com/golang-jwt/jwt/blob/v4.5.0/map_claims.go#L120-L151 So functionally this major upgrade is a noop.	2023-05-03 11:17:38 -07:00
Charlie Voiselle	8f6fa14e9e	[deps] Update consul-template to v0.31.0 (#16908 ) * Update consul-template to v0.31.0 * Add changelog	2023-05-03 09:15:17 -04:00
Michael Schurter	f8f9e91b8a	build: upgrade from go 1.20.3 to 1.20.4 (#17056 ) Includes CVE fixes that do not impact Nomad: https://groups.google.com/g/golang-announce/c/MEb0UyuSMsU	2023-05-02 13:09:11 -07:00
Charlie Voiselle	61f997d806	Add WriterUI (#17051 ) This special purpose UI provides commands that can benefit from direct access to the io.Reader and io.Writers of the base cli.Ui. It can traverse a chain of ColoredUis to find the base. Currently, it can retrieve writers from a cli.BasicUi (or cli.MockUi for testing). Renames ui.go and ui_test.go to log_ui.go and log_ui_test.go	2023-05-02 13:40:44 -04:00
Seth Hoenig	e9fec4ebc8	connect: remove unusable path for fallback envoy image names (#17044 ) This PR does some cleanup of an old code path for versions of Consul that did not support reporting the supported versions of Envoy in its API. Those versions have not been supported for years at this point, and the fallback version of envoy hasn't been supported by any version of Consul for almost as long. Remove this code path that is no longer useful.	2023-05-02 09:48:44 -05:00
Seth Hoenig	e8d53ea30b	connect: use explicit docker.io prefix in default envoy image names (#17045 ) This PR modifies references to the envoyproxy/envoy docker image to explicitly include the docker.io prefix. This does not affect existing users, but makes things easier for Podman users, who otherwise need to specify the full name because Podman does not default to docker.io	2023-05-02 09:27:48 -05:00
Luiz Aoqui	7b5a8f1fb0	Revert "hashicorp/go-msgpack v2 (#16810 )" (#17047 ) This reverts commit 8a98520d56eed3848096734487d8bd3eb9162a65.	2023-05-01 17:18:34 -04:00
Seth Hoenig	86f6a38867	connect: do not restrict auto envoy version to docker task driver (#17041 ) This PR updates the envoy_bootstrap_hook to no longer disable itself if the task driver in use is not docker. In other words, make it work for podman and other image based task drivers. The hook now only checks that 1. the task is a connect sidecar 2. the task.config block contains an "image" field	2023-05-01 15:07:35 -05:00
Phil Renaud	922c593203	[ui, deployments] Restarted and Rescheduled panel cells (#16972 ) * Status panel shows failed and lost, but probably don't have the condition quite right * Rescheduled and Replaced cells instead of a general failed/lost one * Tests moving to acceptance * Fixed desiredTotal and added acceptance test for restarted * moved integration test into acceptance test generally * Now that we represent Lost in the graph, have to make our unplaced testcase as Unknown * No need to declare new vars for immediately returned getters * Literal restart and resched add to the tallies, rather than 'would have but ran out of attempts' like before * Testfixes now that we've redefined what restarts and reschedules are indicated by	2023-05-01 15:24:21 -04:00
Tim Gross	5503eb97f5	add copywrite headers commit to ignore-revs config file (#17037 )	2023-05-01 10:57:43 -04:00
Phil Renaud	a637354ae0	[ui, deployments] Don't separate allocation groups based on their deployment health unless they're "running" (#17016 ) * Group up non-running allocs regardless of deploymenthealth * Better asynchrony in test	2023-04-28 14:52:42 -04:00
Phil Renaud	0805271f8f	percy-specific css to hide table cells in the job status panel acceptance test (#17021 )	2023-04-28 14:51:53 -04:00
Phil Renaud	5ca59aef56	Move the token JWT console log out of an iterator (#17010 )	2023-04-28 13:46:10 -04:00
Seth Hoenig	5744b2cd4f	docs: add more notes about artifact breaking changes in 1.5.0 (#17005 ) * changelog: note artifact breaking changes for 1.5.0 * docs: add note about environment variables to artifact job spec docs * Update website/content/docs/job-specification/artifact.mdx Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> --------- Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-04-27 11:41:18 -05:00
Michael Schurter	d3b0bbc088	deps: update go-bexpr from 0.1.11 to 0.1.12 (#16991 ) Pulls in https://github.com/hashicorp/go-bexpr/pull/38 Fixes #16758	2023-04-27 09:01:42 -07:00
Tim Gross	87f416943c	testing: improve fidelity of mock driver task restore (#16990 ) While working on client status update improvements, I encountered problems getting tests with the mock driver to correctly restore. Unlike typical drivers the mock driver doesn't have an external source of truth for whether the task is running (ex. making API calls to `dockerd` or looking for a running PID), and so in order to make up that information, it re-parses the original task config. But the taskrunner doesn't call the encoding step for `RecoverTask`, only `StartTask`, so the task config the mock driver gets is missing data. Update the mock driver to stash the "external" state in the task state that we'll get from the task runner, so that we don't have to try to recover from the original `TaskConfig` anymore. This should bring the mock driver closer to the behavior of the other drivers.	2023-04-27 11:54:10 -04:00
James Rasell	fddef4c6e1	docs: use appropriate file extension for autoscaler agent config. (#16993 )	2023-04-27 15:00:28 +01:00
Phil Renaud	7f7f764c5a	[ui] Fixed: Evaluations sidebar/response not scrollable (#16960 ) * Sets up a CSS grid for Evaluations sidebar * Flex seems more sensible for this actually * Tighten up the header margin * Percy found a diff; the expand button wasnt showing for view logs sidebar	2023-04-27 09:49:18 -04:00
James Rasell	ac98c2ed40	vars: ensure struct receiver names are consistent. (#16995 )	2023-04-27 13:51:11 +01:00
James Rasell	4d2c1403c2	scale: do not allow scaling of jobs with type system. (#16969 )	2023-04-25 15:47:44 +01:00
Seth Hoenig	f221e99572	tools: update dependencies and use tree set (#16974 ) * tools: bump go mod deps for tools module * tools: use treeset in tools/missing	2023-04-25 07:47:19 -05:00
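
The message for commit fff2eec625 above describes a heuristic for choosing the task driver of the Connect sidecar proxy: podman is selected only if at least one task in the group uses podman and no task uses docker; in all other cases the driver defaults to docker. A minimal sketch of that rule follows. The `Task` type and the `sidecarDriver` function are hypothetical names made up for illustration; they are not taken from the Nomad source and only restate what the commit message says.

```go
package main

import "fmt"

// Task is a hypothetical, simplified stand-in for a task in a task group.
// It is an assumption for this sketch, not Nomad's actual struct.
type Task struct {
	Name   string
	Driver string
}

// sidecarDriver restates the heuristic from the commit message:
// pick "podman" when at least one task uses podman and none use docker,
// otherwise fall back to "docker".
func sidecarDriver(tasks []Task) string {
	podman, docker := 0, 0
	for _, t := range tasks {
		switch t.Driver {
		case "podman":
			podman++
		case "docker":
			docker++
		}
	}
	if podman > 0 && docker == 0 {
		return "podman"
	}
	return "docker"
}

func main() {
	// One podman task, no docker tasks.
	fmt.Println(sidecarDriver([]Task{{"web", "podman"}, {"db", "exec"}}))
	// Mixed podman and docker tasks.
	fmt.Println(sidecarDriver([]Task{{"web", "podman"}, {"cache", "docker"}}))
	// Empty group.
	fmt.Println(sidecarDriver(nil))
}
```

Counting both drivers in a single pass keeps the two conditions from the commit message (at least one podman, zero docker) visible as one boolean expression.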

24598 Commits