open-nomad

Commit Graph

Author	SHA1	Message	Date
Phil Renaud	dd824ac3f8	Changelog for visual diff tests (#12909 )	2022-05-06 11:29:10 -04:00
Luiz Aoqui	eb50273b32	ci: update backport assistant workflow (#12899 ) Remove the step to automatically backport `backport/website` PRs to the latest release. This will be done manually by adding the proper tags. Also use squash backports to match the pattern we use for `main`.	2022-05-06 10:15:59 -04:00
James Rasell	9ea1a6faf6	fsm: add service registration snapshot persistence. (#12896 )	2022-05-06 15:53:27 +02:00
Luiz Aoqui	c502a249b9	ci: revert file changes and add some checks (#12873 ) During the release there are several files that need to be modified: - .release/ci.hcl: the notification channel needs to be updated to a channel with greater team visibility during the release. - version/version.go: the Version and VersionPrerelease variables need to be set so they match the release version. After the release these files need to be reverted. For GA releases the following additional changes also need to happen: - version/version.go: the Version variable needs to be bumped to the next version number. - GNUMakefile: the LAST_RELEASE variable needs to be set to the version that was just released. Since the release process will commit file changes to the branch being used for the release, it should _never_ run on main, so the first step is now to protect against that. It also adds a validation to make sure the user input version is correct. After looking at the different release options and steps I noticed that automatic CHANGELOG generation is actually the exception, so it would be better to have the default to be false.	2022-05-05 18:07:51 -04:00
Phil Renaud	6a8f98723e	Chronological most-recent evals by default (#12847 ) * Chronological most-recent evals by default * Adding reverse: true to the list of expected queryparams in test * changelog	2022-05-05 16:11:27 -04:00
Phil Renaud	b67bd4c377	Percy snapshot tests (#12872 ) * Sample percy test added * Node engine up to 14.x for UI prep * Force ui test rerun * Updated config.yml * Node v upgraded to 14 for docker image * Expect length in test * Running ember tests under percy exec * Percy exec format * Percy cli added * Noop to rerun tests with updated percy_token * Evals full list and details open snapshots * Pretty legit use of assert so disable the warning * Jobs list tests * Snapshots for top-level clients, servers, ACL, topology, and storage lists * Expect caveat for Topology test * Stabilizing tests with faker seeded to 1 * Seed-stabilizing any tests with percySnapshots * Faker import * Drop unused param * Assets and test audit using an older node version * New strategy: avoid seeding, just use percyCSS to hide certain things	2022-05-05 16:05:13 -04:00
Seth Hoenig	90ff784dcf	Merge pull request #12875 from hashicorp/b-cgroupsv2-task-restarts cgroups: make sure cgroup still exists after task restart	2022-05-05 10:54:29 -05:00
Tim Gross	26b9f88ef3	docs: add missing `set_contains_any` constraint docs (#12886 ) This constraint and affinity was added in 0.9.x but was only documented for affinities. Close that documentation gap.	2022-05-05 11:11:05 -04:00
Bryce Kalow	e9319abc78	website: remove source code and related CI jobs (#12596 ) * remove website source code and related circle jobs * remove data files * updates platform-cli * update local instructions * updates package-lock	2022-05-05 09:53:22 -05:00
Seth Hoenig	96ec19788d	cgroups: make sure cgroup still exists after task restart This PR modifies raw_exec and exec to ensure the cgroup for a task they are driving still exists during a task restart. These drivers have the same bug but with different root cause. For raw_exec, we were removing the cgroup in 2 places - the cpuset manager, and in the unix containment implementation (the thing that uses freezer cgroup to clean house). During a task restart, the containment would remove the cgroup, and when the task runner hooks went to start again would block on waiting for the cgroup to exist, which will never happen, because it gets created by the cpuset manager which only runs as an alloc pre-start hook. The fix here is to simply not delete the cgroup in the containment implementation; killing the PIDs is enough. The removal happens in the cpuset manager later anyway. For exec, it's the same idea, except DestroyTask is called on task failure, which in turn calls into libcontainer, which in turn deletes the cgroup. In this case we do not have control over the deletion of the cgroup, so instead we hack the cgroup back into life after the call to DestroyTask. All of this only applies to cgroups v2.	2022-05-05 09:51:03 -05:00
James Rasell	a05114fdac	core: add namespace to plan for node rejected log line. (#12868 )	2022-05-05 10:56:40 +02:00
James Rasell	e1bf9138a1	release: fix hcl linting error within CI file. (#12867 )	2022-05-04 10:48:42 +02:00
Michele Degges	9c85ddcb7f	Add config key to the promote-staging event (#12857 )	2022-05-03 20:33:14 -07:00
Michele Degges	417d3ca232	Add config key to the promote-staging event	2022-05-03 08:51:19 -07:00
Tim Gross	45b238ec82	CSI: node drain should end once only plugins remain (#12846 ) In #12324 we made it so that plugins wait until the node drain is complete, as we do for system jobs. But we neglected to mark the node drain as complete once only plugins (or system jobs) remain, which means that the node drain is left in a draining state until the `deadline` time expires. This was incorrectly documented as expected behavior in #12324.	2022-05-03 10:20:22 -04:00
Alex Carpenter	d59b517ab2	[WIP] feat: homepage and use case pages redesign (#11873 ) * feat: connect homepage and use case pages * fix: internalLink usage * fix: query name * chore: add homepage patterns * chore: remove offerings * chore: add intro features * chore: bump subnav * chore: updating patterns * chore: add use case to the subnav * chore: cleanup unused import * chore: remove subnav border	2022-05-03 09:06:00 -04:00
Luiz Aoqui	6cd9881d2d	Update CHANGELOG for 1.3.0-rc.1 (#12849 )	2022-05-02 16:52:00 -04:00
Seth Hoenig	35728cbc58	Merge pull request #12740 from hashicorp/cleanup-makefile-help build: add missing help descriptions to makefile	2022-05-02 10:33:22 -05:00
Seth Hoenig	b8d807c320	Merge pull request #12840 from hashicorp/docs-nvidia-updates docs: update nvidia driver documentation	2022-05-02 10:07:02 -05:00
Luiz Aoqui	758e85bc84	ui: fix an error when navigating to a task group (#12832 ) Clicking in a task group row in the job details page would throw the error: Uncaught Error: You didn't provide enough string/numeric parameters to satisfy all of the dynamic segments for route jobs.job.task-group. Missing params: name createParamHandlerInfo http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4814 applyToHandlers http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4804 applyToState http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4801 getTransitionByIntent http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4843 transitionByIntent http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4836 refresh http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4885 refresh http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:2254 queryParamsDidChange http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:2326 k http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:2423 triggerEvent http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:2349 fireQueryParamDidChange http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4863 getTransitionByIntent http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4848 transitionByIntent http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4836 doTransition http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4853 transitionTo http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4882 _doTransition http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:2392 transitionTo http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:2177 gotoTaskGroup 
http://localhost:4646/ui/assets/nomad-ui-4a2c1941e03e60e1feef715f23cf268c.js:623 ... This was caused because the attribute being passed to the transitionTo function was not the task group name, but the whole model.	2022-05-02 11:01:19 -04:00
Seth Hoenig	684abb9e28	docs: update nvidia driver documentation notably: - name of the compiled binary is 'nomad-device-nvidia', not 'nvidia-gpu' - link to Nvidia docs for installing the container runtime toolkit - list docker v19.03 as minimum version, to track with nvidia's new container runtime toolkit	2022-05-02 09:11:05 -05:00
Matus Goljer	a741cc76b5	nomad can also install autocomplete for fish shell (#12834 )	2022-05-02 09:26:55 -04:00
Luiz Aoqui	59e2bcd809	ci: remove unused CircleCI Makefile (#12828 ) This Makefile was used to generate the full config.yml from smaller sub-files, but this is not done anymore.	2022-04-29 15:25:23 -04:00
Tim Gross	d06ad50538	docs: clarify `capacity_min/max` for volumes (#12825 ) The capacity fields for `create volume` set bounds on the resulting size of the volume, but the ultimate size of the volume will be determined by the storage provider (between the min and max). Clarify this in the documentation and provide a suggestion for how to set an exact size.	2022-04-29 13:38:30 -04:00
Thomas Wunderlich	245d2a463b	Fix formatting	2022-04-29 10:02:20 -04:00
Thomas Wunderlich	c86e287de9	Remove debug log lines	2022-04-28 19:14:31 -04:00
Thomas Wunderlich	960e192359	Quick and dirty hack to get interpolated dns values working	2022-04-28 17:09:53 -04:00
Derek Strickland	584bf0162f	docs: Add known limitations callouts to Max Client Disconnect section (#12801 ) * docs: Add known limitations callouts to Max Client Disconnect section	2022-04-28 16:17:14 -04:00
Phil Renaud	067234792a	Moves the evaluations table toolbar outside of the table-container (#12799 )	2022-04-28 16:08:46 -04:00
Luiz Aoqui	6c3473b778	ci: update the `hashicorp/actions-generate-metadata` action version (#12813 )	2022-04-28 15:24:55 -04:00
Jai	316daf581e	fix broken link to `task-group` in `Recent Allocation` table in `jobs.job.index` (#12765 ) * chore: run prettier on hbs files * ui: ensure to pass a real job object to task-group link * chore: add changelog entry * chore: prettify template * ui: template helper for formatting jobId in LinkTo component * ui: handle async relationship * ui: pass in job id to model arg instead of job model * update test for serialized namespace * ui: defend against null in tests * ui: prettified template added whitespace * ui: rollback ember-data to 3.24 because watcher return undefined on abort * ui: use format-job-helper instead of job model via alloc * ui: fix whitespace in template caused by prettier using template helper * ui: update test for new namespace * ui: revert prettier change Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-04-28 14:02:15 -04:00
Dave May	97cf204c00	debug: add version constraint to avoid pprof panic (#12807 )	2022-04-28 13:18:55 -04:00
Luiz Aoqui	0830e3c787	ci: fix build workflow trigger on push (#12806 )	2022-04-28 11:15:54 -04:00
Luiz Aoqui	cca49a054f	ci: setup release process with CRT (#12781 )	2022-04-27 20:14:23 -04:00
Derek Strickland	90daed7c1d	e2e: Wait for deployment to finish before disconnect (#12795 ) * Wait for deployment to finish * Don't reschedule disconnect or restart-node jobs	2022-04-27 12:27:03 -04:00
Phil Renaud	182bead357	[ui, mirage] Evaluation mocks (#12471 ) * Linear and Branching mock evaluations * De-comment * test-trigger * Making evaluation trees dynamic * Reinstated job relationship on eval mock * Dasherize job prefix back to normal * Handle bug where UUIDKey is not present on job * Appending node to eval * Job ID as a passed property * Remove unused import * Branching evals set up as generatable	2022-04-27 12:11:24 -04:00
Tim Gross	c763c4cb96	remove pre-0.9 driver code and related E2E test (#12791 ) This test exercises upgrades between 0.8 and Nomad versions greater than 0.9. We have not supported 0.8.x in a very long time and in any case the test has been marked to skip because the downloader doesn't work.	2022-04-27 09:53:37 -04:00
Michael Schurter	e2544dd089	client: fix waiting on preempted alloc (#12779 ) Fixes #10200 The bug A user reported receiving the following error when an alloc was placed that needed to preempt existing allocs: ``` [ERROR] client.alloc_watcher: error querying previous alloc: alloc_id=28... previous_alloc=8e... error="rpc error: alloc lookup failed: index error: UUID must be 36 characters" ``` The previous alloc (8e) was already complete on the client. This is possible if an alloc stops after the scheduling decision was made to preempt it, but before the node running both allocations was able to pull and start the preemptor. While that is hopefully a narrow window of time, you can expect it to occur in high throughput batch scheduling heavy systems. However the RPC error made no sense! `previous_alloc` in the logs was a valid 36 character UUID! The fix The fix is: ``` - prevAllocID: c.Alloc.PreviousAllocation, + prevAllocID: watchedAllocID, ``` The alloc watcher new func used for preemption improperly referenced Alloc.PreviousAllocation instead of the passed in watchedAllocID. When multiple allocs are preempted, a watcher is created for each with watchedAllocID set properly by the caller. In this case Alloc.PreviousAllocation="" -- which is where the `UUID must be 36 characters` error was coming from! Sadly we were properly referencing watchedAllocID in the log, so it made the error make no sense! The repro I was able to reproduce this with a dev agent with [preemption enabled](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-preempt-hcl) and [lowered limits](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-limits-hcl) for ease of repro. First I started a [low priority count 3 job](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-preempt-lo-nomad), then a [high priority job](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-preempt-hi-nomad) that evicts 2 low priority jobs. 
Everything worked as expected. However if I force it to use the [remotePrevAlloc implementation](https://github.com/hashicorp/nomad/blob/v1.3.0-beta.1/client/allocwatcher/alloc_watcher.go#L147), it reproduces the bug because the watcher references PreviousAllocation instead of watchedAllocID.	2022-04-26 13:14:43 -07:00
Tim Gross	cfd353207f	E2E: move volume mounts test to use golang's stdlib test runner (#12788 ) Part of ongoing work to remove the old E2E framework code.	2022-04-26 14:28:20 -04:00
Tim Gross	83eb879d61	E2E: remove old CLI for driving provisioning (#12787 ) We moved off the old provisioning process for nightly E2E to one driven entirely by Terraform quite a while back now. We're in the slow process of removing the framework code for this test-by-test, but this chunk of code no longer has any callers.	2022-04-26 13:43:25 -04:00
Tim Gross	3d630a3629	CSI: enforce one plugin supervisor loop via `sync.Once` (#12785 ) We enforce exactly one plugin supervisor loop by checking whether `running` is set and returning early. This works but is fairly subtle. It can briefly result in two goroutines where one quickly exits before doing any work. Clarify the intent by using `sync.Once`. The goroutine we've spawned only exits when the entire task runner is being torn down, and not when the task driver restarts the workload, so it should never be re-run.	2022-04-26 10:38:50 -04:00
Michael Schurter	6449ba8d41	api: add ParseHCLOpts helper method (#12777 ) The existing ParseHCL func didn't allow setting HCLv1=true.	2022-04-25 11:51:52 -07:00
Tim Gross	b2e4841747	CSI: plugin config updates should always be destructive (#12774 )	2022-04-25 12:59:25 -04:00
Luiz Aoqui	b8dd60f79c	update LAST_RELEASE comment to match new release branches structure (#12773 )	2022-04-25 11:57:55 -04:00
Michael Schurter	1256c8ef66	docs: update json jobs docs (#12766 ) * docs: update json jobs docs Did you know that Nomad has not 1 but 2 JSON formats for jobs? 2½ if you want to acknowledge that sometimes our JSON job representations have a Job top-level wrapper and sometimes do not. The 2½ formats are: ``` 1. HCL JSON 2. Input API JSON (top-level Job field) 2.5. Output API JSON (lacks top-level Job field) ``` `#2` is what our docs consider our API JSON. `#2.5` seems to be an accident of history we can't fix with breaking API compatibility. `#1` is an even more interesting accident of history: the `jobspec2` package automatically detects if the input to Parse is JSON and switches to a JSON parser. This behavior is undocumented, the format is unspecified, and there is no official HashiCorp tooling to produce this JSON from HCL. The plot thickens when you discover popular third party tools like hcl2json.com and https://github.com/tmccombs/hcl2json seem to produce JSON that `nomad run` accepts! Since we have no telemetry around whether or not anyone passes HCL JSON to `nomad run`, and people don't file bugs around features that Just Work, I'm choosing to leave that code path in place and acknowledged but not suggested in documentation. See https://github.com/hashicorp/hcl/issues/498 for a more comprehensive discussion of what officially supporting HCL JSON in Nomad would look like. (I also added some of the missing fields to the (Input API flavor) JSON Job documentation, but it still needs a lot of work to be comprehensive.) Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-04-22 15:57:27 -07:00
Jai	b3985db31f	bug: fix filter and search (#12587 ) * chore: remove commented out code and skipped tests * refact: triggeredBy requires filter expression not qp * refact: use filter expression dsl instead of named params * fix: add type * docs: add in-line reference to filter expression DSL * fix: update filter copy for non-matches * fix: correct conditional logic to render no match copy	2022-04-22 15:40:13 -04:00
Phil Renaud	aed56e5732	Sets up a new z-modal z-index and assigns it to the sidebar (#12758 )	2022-04-22 15:23:49 -04:00
Phil Renaud	c0792b1092	Accidentally added while setting lint rules elsewhere (#12759 )	2022-04-22 15:04:21 -04:00
Tim Gross	766025cde7	CSI: plugin supervisor prestart should not mark itself done (#12752 ) The task runner hook `Prestart` response object includes a `Done` field that's intended to tell the client not to run the hook again. The plugin supervisor creates mount points for the task during prestart and saves these mounts in the hook resources. But if a client restarts the hook resources will not be populated. If the plugin task restarts at any time after the client restarts, it will fail to have the correct mounts and crash loop until restart attempts run out. Fix this by not returning `Done` in the response, just as we do for the `volume_mount_hook`.	2022-04-22 13:07:47 -04:00
James Rasell	24b499791d	deps: update consul-template to v0.29.0 (#12747 ) * deps: update consul-template to v0.29.0 * changelog: add entry for #12747	2022-04-22 09:58:54 -07:00
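
Commit 1256c8ef66 in the log above distinguishes the input API JSON, which has a top-level `Job` field, from the output API JSON, which lacks it. As a rough sketch of the difference, the hypothetical helper `wrapJob` below wraps an unwrapped job document in that top-level field. The two-field job body is a deliberately simplified assumption, not a complete job specification.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// wrapJob takes job JSON without the wrapper and returns it nested under
// a top-level "Job" key. This helper is illustrative only.
func wrapJob(bare []byte) ([]byte, error) {
	if !json.Valid(bare) {
		return nil, fmt.Errorf("input is not valid JSON")
	}
	return json.Marshal(map[string]json.RawMessage{"Job": bare})
}

func main() {
	// Simplified job body; a real job has many more fields.
	out, err := wrapJob([]byte(`{"ID":"example","Name":"example"}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out)) // prints {"Job":{"ID":"example","Name":"example"}}
}
```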
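
Commit e2544dd089 in the log above fixes a watcher that read `Alloc.PreviousAllocation` instead of the `watchedAllocID` passed in by the caller. The sketch below reproduces the shape of that mistake with simplified, hypothetical types (`alloc`, `watcherIDs`); it is not the alloc watcher code itself. For a preemptor, `PreviousAllocation` is empty, so the buggy path queries an empty ID for every preempted alloc.

```go
package main

import "fmt"

// alloc is a simplified, hypothetical stand-in for an allocation.
type alloc struct {
	ID                 string
	PreviousAllocation string // empty when the alloc preempts others
}

// watcherIDs returns the alloc ID each watcher would query.
// useWatchedID=false mimics the bug; useWatchedID=true mimics the fix.
func watcherIDs(a alloc, preempted []string, useWatchedID bool) []string {
	ids := make([]string, 0, len(preempted))
	for _, watchedAllocID := range preempted {
		if useWatchedID {
			ids = append(ids, watchedAllocID)
		} else {
			ids = append(ids, a.PreviousAllocation)
		}
	}
	return ids
}

func main() {
	a := alloc{ID: "preemptor"}
	preempted := []string{"low-1", "low-2"}
	fmt.Printf("%q\n", watcherIDs(a, preempted, false)) // prints ["" ""]
	fmt.Printf("%q\n", watcherIDs(a, preempted, true))  // prints ["low-1" "low-2"]
}
```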

23205 Commits

All Branches