open-nomad

Author	SHA1	Message	Date
Derek Strickland	7d6a3df197	csi_hook: valid if any driver supports csi (#13446 ) * csi_hook: valid if any driver supports csi volumes	2022-06-22 10:43:43 -04:00
Derek Strickland	9de4d7367c	cli: fix detach handling (#13405 ) Fix detach handling for: - `deployment fail` - `deployment promote` - `deployment resume` - `deployment unblock` - `job promote`	2022-06-21 06:01:23 -04:00
Jeffrey Clark	a97699221c	cni: add loopback to linux bridge (#13428 ) CNI changed how to bring up the interface in v0.2.0. Support was moved to a new loopback plugin. https://github.com/containernetworking/cni/pull/121 Fixes #10014	2022-06-20 11:22:53 -04:00
James Rasell	f1f7c5040b	api: added sysbatch job type constant to match other schedulers. (#13359 )	2022-06-16 11:53:04 +02:00
Joseph Martin	4aa96d5bfc	Return evalID if `-detach` flag is passed to job revert (#13364 ) * Return evalID if `-detach` flag is passed to job revert	2022-06-15 14:20:29 -04:00
Tim Gross	12d87c040c	fixup changelog entry for backported regression fix (#13370 ) The changelog entry for #13340 indicated it was an improvement. But on discussion, it was determined that this was a workaround for a regression. Update the changelog to make this clear.	2022-06-14 14:33:39 -04:00
Grant Griffiths	99896da443	CSI: make plugin health_timeout configurable in csi_plugin stanza (#13340 ) Signed-off-by: Grant Griffiths <ggriffiths@purestorage.com>	2022-06-14 10:04:16 -04:00
Daniel Rossbach	8c52c03c8c	qemu driver: Add option to configure drive_interface (#11864 )	2022-06-10 10:03:51 -04:00
Luiz Aoqui	e8b788b372	changelog: add entry for #12961 (#13318 )	2022-06-10 09:04:00 -04:00
Tim Gross	9d5523a72d	CSI: skip node unpublish on GC'd or down nodes (#13301 ) If the node has been GC'd or is down, we can't send it a node unpublish. The CSI spec requires that we don't send the controller unpublish before the node unpublish, but in the case where a node is gone we can't know the final fate of the node unpublish step. The `csi_hook` on the client will unpublish if the allocation has stopped and if the host is terminated there's no mount for the volume anyways. So we'll now assume that the node has unpublished at its end. If it hasn't, any controller unpublish will potentially hang or error and need to be retried.	2022-06-09 11:33:22 -04:00
phreakocious	94a78597d2	Add `guest_agent` config option for QEMU driver (#12800 ) Add boolean 'guest_agent' config option for QEMU driver, which will create the socket file for the QEMU Guest Agent in the task dir when enabled.	2022-06-09 09:21:38 -04:00
Derek Strickland	13ea5ae87a	consul-template: Add fault tolerant defaults (#13041 ) Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-06-08 14:08:25 -04:00
Luiz Aoqui	2e0bffba90	changelog: add entry for #12925 (#13250 )	2022-06-08 10:14:33 -04:00
Tim Gross	8ff5ea1bee	CSI: no early return when feasibility check fails on eligible nodes (#13274 ) As a performance optimization in the scheduler, feasibility checks that apply to an entire class are only checked once for all nodes of that class. Other feasibility checks are "available" checks because they rely on more ephemeral characteristics and don't contribute to the hash for the node class. This currently includes only CSI. We have a separate fast path for "available" checks when the node has already been marked eligible on the basis of class. This fast path has a bug where it returns early rather than continuing the loop. This causes the entire task group to be rejected. Fix the bug by not returning early in the fast path and instead jumping to the top of the loop like all the other code paths in this method. Includes a new test exercising topology at whole-scheduler level and a fix for an existing test that should've caught this previously.	2022-06-07 13:31:10 -04:00
Derek Strickland	12f3ee46ea	alloc_runner: stop sidecar tasks last (#13055 )	2022-06-07 11:35:19 -04:00
Tim Gross	81c70f4973	changelog entry for #12534 (#13260 )	2022-06-06 16:19:17 -04:00
Conor Evans	86116a7607	add filebase64 function (#11791 ) Signed-off-by: Conor Evans <coevans@tcd.ie>	2022-06-06 11:58:17 -04:00
Lance Haig	4bf27d743d	Allow Operator Generated bootstrap token (#12520 )	2022-06-03 07:37:24 -04:00
Huan Wang	7d15157635	adding support for customized ingress tls (#13184 )	2022-06-02 18:43:58 -04:00
Seth Hoenig	45e8748658	Merge pull request #13205 from hashicorp/b-batch-preempt2 core: reschedule evicted batch job when resources become available	2022-06-02 16:32:01 -05:00
Shantanu Gadgil	6cb8c95534	fingerprint kernel architecture name (#13182 )	2022-06-02 15:51:00 -04:00
Seth Hoenig	0692190e12	core: reschedule evicted batch job when resources become available This PR fixes a bug where an evicted batch job would not be rescheduled once resources become available. Closes #9890	2022-06-02 14:04:13 -05:00
Seth Hoenig	54efec5dfe	docs: add docs and tests for tagged_addresses	2022-05-31 13:02:48 -05:00
Seth Hoenig	4631045d83	connect: enable setting connect upstream destination namespace	2022-05-26 09:39:36 -05:00
Seth Hoenig	f7c0e078a9	build: update golang version to 1.18.2 This PR updates to Go 1.18.2. Also update the versions of hclfmt and go-hclogfmt, which include newer dependencies necessary for dealing with go1.18. The hcl v2 branch is now 'nomad-v2.9.1+tweaks2', to include a fix for newer macOS versions: `8927e75e82`	2022-05-25 10:04:04 -05:00
Luiz Aoqui	769ff1dcc3	Merge pull request #13109 from hashicorp/merge-release-1.3.1-branch Merge release 1.3.1 branch	2022-05-25 10:45:09 -04:00
Seth Hoenig	20b6bf3c22	Merge pull request #13104 from hashicorp/b-blocked-eval-math core: fix blocked eval math	2022-05-24 16:23:06 -05:00
Michael Schurter	2965dc6a1a	artifact: fix numerous go-getter security issues Fix numerous go-getter security issues: - Add timeouts to http, git, and hg operations to prevent DoS - Add size limit to http to prevent resource exhaustion - Disable following symlinks in both artifacts and `job run` - Stop performing initial HEAD request to avoid file corruption on retries and DoS opportunities. Approach: Since Nomad has no ability to differentiate a DoS-via-large-artifact vs a legitimate workload, all of the new limits are configurable at the client agent level. The max size of HTTP downloads is also exposed as a node attribute so that if some workloads have large artifacts they can specify a high limit in their jobspecs. In the future all of this plumbing could be extended to enable/disable specific getters or artifact downloading entirely on a per-node basis.	2022-05-24 16:29:39 -04:00
Seth Hoenig	83bab8ed64	Merge pull request #13058 from hashicorp/b-cgroupsv1-docker-cgparent drivers/docker: do not set cgroup parent in v1 mode	2022-05-24 14:07:40 -05:00
Seth Hoenig	c6c3ae020d	drivers/docker: do not set cgroup parent in v1 mode This PR fixes a bug where the CgroupParent on the docker HostConfig struct was accidentally being set when running in cgroups v1 mode.	2022-05-24 11:22:50 -05:00
Seth Hoenig	27d0c0dc9f	docs: add changelog	2022-05-24 09:13:15 -05:00
Will Jordan	d515e5c3b0	Don't buffer json logs on agent startup (#13076 ) There's no reason to buffer json logs on agent startup since logs in this format already aren't reordered.	2022-05-19 15:40:30 -04:00
Seth Hoenig	fc58f4972c	cli: correctly use and validate job with vault token set This PR fixes `job validate` to respect '-vault-token', '$VAULT_TOKEN', '-vault-namespace' if set.	2022-05-19 12:13:34 -05:00
Tim Gross	b72ff42ada	api: include Consul token in job revert API (#13065 )	2022-05-19 11:30:29 -04:00
Seth Hoenig	29d3da6dfd	cl: update changelog	2022-05-17 10:35:08 -05:00
Seth Hoenig	26b5c01431	Merge pull request #12817 from twunderlich-grapl/fix-network-interpolation Fix network.dns interpolation	2022-05-17 09:31:32 -05:00
Seth Hoenig	08becb117c	cl: add changelog note for network interpolation	2022-05-17 09:14:55 -05:00
Phil Renaud	45dc1cfd58	12986 UI fails to load job when there is an "@" in job name in nomad 130 (#13012 ) * LastIndexOf and always append a namespace on job links * Confirmed the volume equivalent and simplified idWIthNamespace logic * Changelog added * PR comments addressed * Drop the redirect for the time being * Tests updated to reflect namespace on links * Task detail test default namespace link for test	2022-05-13 17:01:27 -04:00
Tim Gross	faeb3fcd44	scheduler: volume updates should always be destructive (#13008 )	2022-05-13 11:34:04 -04:00
James Rasell	636b647a30	agent: fix panic when logging about protocol version config use. (#12962 ) The log line comes before the agent logger has been set up, therefore we need to use the UI logging to avoid a panic.	2022-05-13 09:28:43 +02:00
Phil Renaud	dd824ac3f8	Changelog for visual diff tests (#12909 )	2022-05-06 11:29:10 -04:00
Phil Renaud	6a8f98723e	Chronological most-recent evals by default (#12847 ) * Chronological most-recent evals by default * Adding reverse: true to the list of expected queryparams in test * changelog	2022-05-05 16:11:27 -04:00
Jai	316daf581e	fix broken link to `task-group` in `Recent Allocation` table in `jobs.job.index` (#12765 ) * chore: run prettier on hbs files * ui: ensure to pass a real job object to task-group link * chore: add changelog entry * chore: prettify template * ui: template helper for formatting jobId in LinkTo component * ui: handle async relationship * ui: pass in job id to model arg instead of job model * update test for serialized namespace * ui: defend against null in tests * ui: prettified template added whitespace * ui: rollback ember-data to 3.24 because watcher return undefined on abort * ui: use format-job-helper instead of job model via alloc * ui: fix whitespace in template caused by prettier using template helper * ui: update test for new namespace * ui: revert prettier change Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-04-28 14:02:15 -04:00
Dave May	97cf204c00	debug: add version constraint to avoid pprof panic (#12807 )	2022-04-28 13:18:55 -04:00
Tim Gross	c763c4cb96	remove pre-0.9 driver code and related E2E test (#12791 ) This test exercises upgrades between 0.8 and Nomad versions greater than 0.9. We have not supported 0.8.x in a very long time and in any case the test has been marked to skip because the downloader doesn't work.	2022-04-27 09:53:37 -04:00
Michael Schurter	e2544dd089	client: fix waiting on preempted alloc (#12779 ) Fixes #10200. The bug: A user reported receiving the following error when an alloc was placed that needed to preempt existing allocs: ``` [ERROR] client.alloc_watcher: error querying previous alloc: alloc_id=28... previous_alloc=8e... error="rpc error: alloc lookup failed: index error: UUID must be 36 characters" ``` The previous alloc (8e) was already complete on the client. This is possible if an alloc stops after the scheduling decision was made to preempt it, but before the node running both allocations was able to pull and start the preemptor. While that is hopefully a narrow window of time, you can expect it to occur in high throughput batch scheduling heavy systems. However the RPC error made no sense! `previous_alloc` in the logs was a valid 36 character UUID! The fix: ``` - prevAllocID: c.Alloc.PreviousAllocation, + prevAllocID: watchedAllocID, ``` The alloc watcher new func used for preemption improperly referenced Alloc.PreviousAllocation instead of the passed in watchedAllocID. When multiple allocs are preempted, a watcher is created for each with watchedAllocID set properly by the caller. In this case Alloc.PreviousAllocation="" -- which is where the `UUID must be 36 characters` error was coming from! Sadly we were properly referencing watchedAllocID in the log, so it made the error make no sense! The repro: I was able to reproduce this with a dev agent with [preemption enabled](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-preempt-hcl) and [lowered limits](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-limits-hcl) for ease of repro. First I started a [low priority count 3 job](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-preempt-lo-nomad), then a [high priority job](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-preempt-hi-nomad) that evicts 2 low priority jobs. Everything worked as expected. However if I force it to use the [remotePrevAlloc implementation](https://github.com/hashicorp/nomad/blob/v1.3.0-beta.1/client/allocwatcher/alloc_watcher.go#L147), it reproduces the bug because the watcher references PreviousAllocation instead of watchedAllocID.	2022-04-26 13:14:43 -07:00
Michael Schurter	6449ba8d41	api: add ParseHCLOpts helper method (#12777 ) The existing ParseHCL func didn't allow setting HCLv1=true.	2022-04-25 11:51:52 -07:00
Tim Gross	b2e4841747	CSI: plugin config updates should always be destructive (#12774 )	2022-04-25 12:59:25 -04:00
Tim Gross	766025cde7	CSI: plugin supervisor prestart should not mark itself done (#12752 ) The task runner hook `Prestart` response object includes a `Done` field that's intended to tell the client not to run the hook again. The plugin supervisor creates mount points for the task during prestart and saves these mounts in the hook resources. But if a client restarts the hook resources will not be populated. If the plugin task restarts at any time after the client restarts, it will fail to have the correct mounts and crash loop until restart attempts run out. Fix this by not returning `Done` in the response, just as we do for the `volume_mount_hook`.	2022-04-22 13:07:47 -04:00
James Rasell	24b499791d	deps: update consul-template to v0.29.0 (#12747 ) * deps: update consul-template to v0.29.0 * changelog: add entry for #12747	2022-04-22 09:58:54 -07:00
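
The fast-path bug described in commit 8ff5ea1bee (#13274) comes down to a `return` where a `continue` was needed. The following is a minimal Go sketch of that pattern only, not the scheduler's actual code: `Node`, `HasVolume`, `selectBuggy`, and `selectFixed` are hypothetical names invented for illustration, and `HasVolume` merely stands in for an ephemeral "available" check such as CSI.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a scheduler node (assumption, not Nomad's type).
type Node struct {
	ID        string
	Class     string
	HasVolume bool // stands in for an ephemeral "available" check such as CSI
}

// selectBuggy mimics the described bug: when the available check fails on a
// class-eligible node, it returns early and so rejects every remaining node.
func selectBuggy(nodes []Node, eligibleClass map[string]bool) *Node {
	for i := range nodes {
		n := &nodes[i]
		if !eligibleClass[n.Class] {
			continue
		}
		if !n.HasVolume {
			return nil // bug: early return instead of trying the next node
		}
		return n
	}
	return nil
}

// selectFixed jumps back to the top of the loop instead, like the other paths.
func selectFixed(nodes []Node, eligibleClass map[string]bool) *Node {
	for i := range nodes {
		n := &nodes[i]
		if !eligibleClass[n.Class] {
			continue
		}
		if !n.HasVolume {
			continue // fix: keep looking at the remaining nodes
		}
		return n
	}
	return nil
}

func main() {
	nodes := []Node{
		{ID: "n1", Class: "c1", HasVolume: false},
		{ID: "n2", Class: "c1", HasVolume: true},
	}
	eligible := map[string]bool{"c1": true}

	fmt.Println("buggy found a node:", selectBuggy(nodes, eligible) != nil)
	fmt.Println("fixed found a node:", selectFixed(nodes, eligible) != nil)
}
```

With two nodes of the same eligible class where only the second passes the available check, the buggy variant finds nothing while the fixed variant selects the second node, which matches the commit's description of the whole task group being rejected.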


402 commits