Adds a new client configuration that optionally allows clients to drain their
workloads on shutdown. The client sends the `Node.UpdateDrain` RPC targeting
itself and then monitors the drain state as seen by the server until the drain
is complete or the deadline expires. If it loses connection with the server, it
will monitor local client status instead to ensure allocations are stopped
before exiting.
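A minimal sketch of that flow using the Go `api` package (the node ID and
error handling are assumptions; the real client does this internally):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/hashicorp/nomad/api"
)

func main() {
	client, _ := api.NewClient(api.DefaultConfig())
	nodeID := "0b62d797-6f2e-4dab-9fd3-30b4fa4cbb99" // assumed: the client's own node ID

	// Ask the servers to drain this node, mirroring `Node.UpdateDrain`.
	spec := &api.DrainSpec{Deadline: 5 * time.Minute}
	resp, _ := client.Nodes().UpdateDrain(nodeID, spec, false, nil)

	// Monitor the server-side view of the drain until it completes or the
	// deadline expires; on connection loss a client would fall back to
	// watching local alloc status instead.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
	defer cancel()
	for msg := range client.Nodes().MonitorDrain(ctx, nodeID, resp.LastIndex, false) {
		fmt.Println(msg)
	}
}
```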
If an allocation is slow to stop because of `kill_timeout` or `shutdown_delay`,
the node drain is marked as complete prematurely, even though drain monitoring
will continue to report allocation migrations. This impacts the UI or API
clients that monitor node draining to shut down nodes.
This changeset updates the behavior to wait until all drained allocs have a
terminal client status before marking the node as done draining.
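A hedged sketch of the new completion check (the helper name is an
assumption; `ClientTerminalStatus` is the `structs` helper that is true for
complete, failed, or lost):

```go
package drainer

import "github.com/hashicorp/nomad/nomad/structs"

// allAllocsTerminal reports whether every drained alloc has reached a
// terminal client status, so the node can be marked done draining.
func allAllocsTerminal(drained []*structs.Allocation) bool {
	for _, alloc := range drained {
		if !alloc.ClientTerminalStatus() {
			return false
		}
	}
	return true
}
```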
The `-deadline` and `-force` flags for the `nomad node drain` command only cause
the draining to ignore the `migrate` block's healthy deadline, max parallel,
etc. These flags don't have anything to do with the `kill_timeout` or
`shutdown_delay` options of the jobspec.
This changeset fixes the skipped E2E tests so that they validate the intended
behavior, and updates the docs for more clarity.
* [no ci] deps: update docker to 23.0.3
This PR brings our docker/docker dependency (which is hosted at github.com/moby/moby)
up to 23.0.3 (forward about 2 years). Refactored our use of docker/libnetwork to
reference the package in its new home, which is docker/docker/libnetwork (it is
no longer an independent repository). Some minor nearby test case cleanup as well.
* add changelog entry
* func: add namespace support for list deployment
* func: add wildcard to namespace filter for deployments (see the sketch after this list)
* Update deployment_endpoint.go
* style: use must instead of require or assert
* style: rename paginator to avoid clash with import
* style: add changelog entry
* fix: add missing parameter for upsert jobs
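A quick hedged sketch of the wildcard namespace filter from the Go `api`
client (setup and output are assumptions):

```go
package main

import (
	"fmt"

	"github.com/hashicorp/nomad/api"
)

func main() {
	client, _ := api.NewClient(api.DefaultConfig())

	// "*" matches all namespaces the token is allowed to read.
	deployments, _, err := client.Deployments().List(&api.QueryOptions{Namespace: "*"})
	if err != nil {
		panic(err)
	}
	for _, d := range deployments {
		fmt.Println(d.Namespace, d.JobID, d.Status)
	}
}
```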
msgpackrpc codec handles are specific to a connection and cannot be shared
between goroutines; sharing one can corrupt decoding. Fix the drainer
integration test so that we create separate codecs for the goroutines that the
test helper spins up to simulate client updates.
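A hedged sketch of the fix's shape: each goroutine dials its own connection
and builds its own codec instead of sharing one (the RPC address is an
assumption):

```go
package main

import (
	"net"
	"net/rpc"
	"sync"

	msgpackrpc "github.com/hashicorp/net-rpc-msgpackrpc"
)

func main() {
	serverAddr := "127.0.0.1:4647" // assumed RPC address
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			conn, err := net.Dial("tcp", serverAddr)
			if err != nil {
				return
			}
			// One codec per goroutine; sharing it across goroutines is
			// exactly the bug being fixed.
			codec := msgpackrpc.NewClientCodec(conn)
			client := rpc.NewClientWithCodec(codec)
			defer client.Close()
			// ... issue this goroutine's RPCs via client.Call(...) ...
		}()
	}
	wg.Wait()
}
```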
This changeset also refactors the drainer integration test to bring it up to
current idioms and library usages, make assertions more clear, and reduce
duplication.
The first start of a Consul Connect proxy sidecar triggers a run
of the envoy_version hook which modifies the task config image
entry. The modification takes into account a number of factors to
correctly populate this. Importantly, once the hook has run, it
marks itself as done so the taskrunner will not execute it again.
When the client receives a non-destructive update for the
allocation which the proxy sidecar is a member of, it updates
and overwrites the task definition within the taskrunner. In doing
so it overwrites the modification performed by the hook. If the
allocation is restarted, the envoy_version hook is skipped because
it previously marked itself as done, so the sidecar
config image is incorrect and causes a driver error.
The fix stops the hook from marking itself as done in the view of
the taskrunner, so the image modification is reapplied on restart.
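A standalone, illustrative sketch of the "done" semantics involved; the
types here are stand-ins, not the real taskrunner interfaces, and the
resolved image is just an example value:

```go
package main

import "fmt"

type prestartResponse struct {
	Done bool
}

type task struct {
	Config map[string]interface{}
}

func envoyVersionPrestart(t *task, resp *prestartResponse) {
	// Resolve the sidecar image placeholder into a concrete tag.
	t.Config["image"] = "envoyproxy/envoy:v1.25.1" // example value
	// Before the fix the hook set resp.Done = true here, so a later
	// non-destructive alloc update could overwrite Config["image"] and the
	// now-skipped hook would never repair it. The fix leaves Done false.
}

func main() {
	t := &task{Config: map[string]interface{}{"image": "envoyproxy/envoy:v${NOMAD_envoy_version}"}}
	envoyVersionPrestart(t, &prestartResponse{})
	fmt.Println(t.Config["image"])
}
```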
* api: enable support for setting original source alongside job
This PR adds support for setting job source material along with
the registration of a job.
This includes a new HTTP endpoint and a new RPC endpoint for
making queries for the original source of a job. The
HTTP endpoint is /v1/job/<id>/submission?version=<version> and
the RPC method is Job.GetJobSubmission.
The job source, if submitted (doing so is always optional), is
stored in the job_submission memdb table, separately from the
actual job. This way we do not incur the overhead of reading the
large string field during normal job operations.
The server config now includes job_max_source_size for configuring
the maximum size the job source may be, before the server simply
drops the source material. This should help prevent Bad Things from
happening when huge jobs are submitted. If the value is set to 0,
all job source material will be dropped.
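A minimal sketch of fetching a job's original source via the new HTTP
endpoint; the address, job ID, and version are assumptions:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// GET /v1/job/<id>/submission?version=<version>
	url := "http://127.0.0.1:4646/v1/job/example/submission?version=0"
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // JSON-encoded job submission
}
```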
* api: avoid writing var content to disk for parsing
* api: move submission validation into RPC layer
* api: return an error if updating a job submission without namespace or job id
* api: be exact about the job index we associate a submission with (modify)
* api: reword api docs scheduling
* api: prune all but the last 6 job submissions
* api: protect against nil job submission in job validation
* api: set max job source size in test server
* api: fixups from pr
A new WaitForPlugin() is called during csiHook.Prerun,
so that on startup, clients can recover running
tasks that use CSI volumes, instead of terminating and
rescheduling them because they need a node plugin that
is "not found" *yet*, only because the plugin task has
not yet been recovered.
The `ephemeral_disk` block's `migrate` field allows for best-effort migration of
the ephemeral disk data to new nodes. The documentation says the `migrate` field
is only respected if `sticky=true`, but in fact if client ACLs are not set the
data is migrated even if `sticky=false`.
The existing behavior when client ACLs are disabled has existed since the early
implementation, so "fixing" that case now would silently break backwards
compatibility. Additionally, having `migrate` not imply `sticky` seems
nonsensical: it suggests that if we place on a new node we migrate the data but
if we place on the same node, we throw the data away!
Update so that `migrate=true` implies `sticky=true` as follows:
* The failure mode when client ACLs are enabled comes from the server not passing
along a migration token. Update the server so that it provides a migration
token whenever `migrate=true`, not just when `sticky=true`.
* Update the scheduler so that `migrate` implies `sticky`.
* Update the client so that we check for `migrate || sticky` where appropriate
(see the sketch after this list).
* Refactor the E2E tests to move them off the old framework and make the intention
of the test more clear.
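A hedged sketch of the client-side check; the `EphemeralDisk` fields match
the `structs` package, but the helper name is an assumption:

```go
package client

import "github.com/hashicorp/nomad/nomad/structs"

// shouldPreserveDisk reports whether ephemeral disk data should be kept on
// the same node or migrated to a new one: migrate now implies sticky.
func shouldPreserveDisk(d *structs.EphemeralDisk) bool {
	if d == nil {
		return false
	}
	return d.Sticky || d.Migrate
}
```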
While working on several open drain issues, I'm fixing up the E2E tests. This
subset of tests being refactored are existing ones that already work. I'm
shipping these as their own PR to keep review sizes manageable when I push up
PRs in the next few days for #9902, #12314, and #12915.
The e2e/acl package has some nice helpers for tracking and cleaning up ACL
objects, but they are currently private. Export them so I can abuse them in
other e2e tests.
This changeset provides a matrix test of ACL enforcement across several
dimensions:
* anonymous vs bogus vs valid tokens
* permitted vs not permitted by policy
* request sent to server vs sent to client (and forwarded)
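A hedged sketch of that matrix as a table-driven Go test; the helper logic
and token values are assumptions:

```go
package acl

import "testing"

func TestACLEnforcementMatrix(t *testing.T) {
	testCases := []struct {
		name      string
		token     string // anonymous (""), bogus, or valid
		permitted bool   // does the policy allow the request?
		toClient  bool   // send to client (and forward) instead of server
	}{
		{"anonymous/server", "", false, false},
		{"bogus/client", "00000000-bad0-bad0-bad0-000000000000", false, true},
		{"valid-permitted/server", "a1b2c3d4-0000-0000-0000-000000000000", true, false}, // hypothetical valid token
		// ... remaining combinations ...
	}
	for _, tc := range testCases {
		tc := tc
		t.Run(tc.name, func(t *testing.T) {
			// send the request with tc.token to server or client per
			// tc.toClient, then assert permitted requests succeed and
			// everything else is rejected
		})
	}
}
```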
The vSphere plugin is exclusive to k8s because it relies on k8s-APIs (and
crashes without them being present). Upstream unfortunately will not support
Nomad, so we shouldn't refer to it in our concept docs here.
Nomad's security model requires mTLS in order to secure client-to-server and
server-to-server communications. Configuring ACLs alone is not enough. Loudly
warn the user if mTLS is not configured in non-dev modes.
Requests without an ACL token that pass through the client's HTTP API are treated
as though they come from the client itself. This allows bypass of ACLs on RPC
requests where ACL permissions are checked (like `Job.Register`). Invalid tokens
are correctly rejected.
Fix the bypass by only setting a client ID on the identity if we have a valid node secret.
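A heavily hedged sketch of the fix's shape; the lookup, types, and identity
fields are assumptions, not Nomad's exact auth code:

```go
package auth

// Hedged sketch types; Nomad's real identity plumbing differs.
type Node struct{ ID string }

type StateLookup interface {
	NodeBySecretID(secretID string) (*Node, error)
}

type Identity struct{ ClientID string }

// resolveClientIdentity only marks a request as coming from a client when
// the supplied secret maps to a known node.
func resolveClientIdentity(state StateLookup, secretID string, identity *Identity) {
	if secretID == "" {
		return // no token: treat as anonymous, never as the client itself
	}
	node, err := state.NodeBySecretID(secretID) // hypothetical lookup
	if err != nil || node == nil {
		return // invalid secret: not a trusted client
	}
	identity.ClientID = node.ID // only set for a valid node secret
}
```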
Note that this changeset will break rate metrics for RPCs sent by clients
without a client secret such as `Node.GetClientAllocs`; these requests will be
recorded as anonymous.
Future work should:
* Ensure the node secret is sent with all client-driven RPCs except
`Node.Register` which is TOFU.
* Create a new `acl.ACL` object from client requests so that we
can enforce ACLs for all endpoints in a uniform way that's less error-prone.
These endpoints all refer to JobID by the time you get to the RPC request
layer, but the HTTP handler functions name the field JobName, which is a
different field (though often with the same value).
This update changes the behaviour when following logs from an
allocation, so that both the stdout and stderr files are streamed when
the operator supplies the follow flag. The previous behaviour is
preserved for all other flags and situations.
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
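A hedged sketch of what following both streams looks like via the Go `api`
package; the alloc ID and task name are assumptions:

```go
package main

import (
	"fmt"

	"github.com/hashicorp/nomad/api"
)

func main() {
	client, _ := api.NewClient(api.DefaultConfig())
	alloc, _, err := client.Allocations().Info("8ba85cef-26b9-4e55-a107-1f1f0d5c3b88", nil) // assumed alloc ID
	if err != nil {
		panic(err)
	}

	// Follow both log types, mirroring the new -f behaviour.
	cancel := make(chan struct{})
	stdout, outErrs := client.AllocFS().Logs(alloc, true, "redis", "stdout", "end", 0, cancel, nil)
	stderr, errErrs := client.AllocFS().Logs(alloc, true, "redis", "stderr", "end", 0, cancel, nil)

	for {
		select {
		case f := <-stdout:
			if f != nil {
				fmt.Print(string(f.Data))
			}
		case f := <-stderr:
			if f != nil {
				fmt.Print(string(f.Data))
			}
		case err := <-outErrs:
			panic(err)
		case err := <-errErrs:
			panic(err)
		}
	}
}
```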
The allocrunner has a facility for passing data written by allocrunner hooks to
taskrunner hooks. Currently the only consumers of this facility are the
allocrunner CSI hook (which writes data) and the taskrunner volume hook (which
reads that same data).
The allocrunner hook for CSI volumes doesn't set the alloc hook resources
atomically. Instead, it gets the current resources and then writes a new version
back. Because the CSI hook is currently the only writer and all readers happen
long afterwards, this should be safe but #16623 shows there's some sequence of
events during restore where this breaks down.
Refactor hook resources so that hook data is accessed via setters and getters
that hold the mutex.
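A hedged sketch of the refactor: hook data moves behind a mutex-holding
getter and setter instead of a read-modify-write on a shared struct. The
mount type here is a stand-in for the real CSI mount info:

```go
package structs

import "sync"

type MountInfo struct{ Source string }

type AllocHookResources struct {
	csiMounts map[string]*MountInfo
	mu        sync.RWMutex
}

// SetCSIMounts stores the CSI mounts written by the allocrunner CSI hook.
func (a *AllocHookResources) SetCSIMounts(m map[string]*MountInfo) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.csiMounts = m
}

// GetCSIMounts returns the CSI mounts for taskrunner hooks to read.
func (a *AllocHookResources) GetCSIMounts() map[string]*MountInfo {
	a.mu.RLock()
	defer a.mu.RUnlock()
	return a.csiMounts
}
```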
The workflow described in the docs for apt installation is deprecated. Update to
match the workflow described in the Tutorials and official packaging guide.