open-nomad

Author	SHA1	Message	Date
Luiz Aoqui	d5aa72190f	node pools: namespace integration (#17562 ) Add structs and fields to support the Nomad Pools Governance Enterprise feature of controlling node pool access via namespaces. Nomad Enterprise allows users to specify a default node pool to be used by jobs that don't specify one. In order to accomplish this, it's necessary to distinguish between a job that explicitly uses the `default` node pool and one that did not specify any. If the `default` node pool is set during job canonicalization it's impossible to do this, so this commit allows a job to have an empty node pool value during registration but sets it to `default` at the admission controller mutator. In order to guarantee state consistency the state store validates that the job node pool is set and exists before inserting it.	2023-06-16 16:30:22 -04:00
Tim Gross	3da948d0c8	node pools: support `node.pool` constraint in scheduler (#17548 ) Although most of the time jobs will be assigned to a single node pool, users may want to set the node pool to "all" and then constrain to a subset of node pools. Add support for setting a constraint like `${node.pool}`.	2023-06-16 13:31:46 -04:00
Tim Gross	f411f0c0fb	docs: node pool specification (#17553 )	2023-06-16 10:37:47 -04:00
Tim Gross	df366df1cd	docs: fix broken link in variables spec page (#17554 )	2023-06-15 15:57:00 -04:00
Tim Gross	524183e2b1	docs: add missing `client.allocs` metrics (#17540 ) The docs were missing counter metrics emitted by the task runner around task state changes.	2023-06-15 09:18:11 -04:00
Tim Gross	5b9322c70a	docs: clarify node pool apply/delete behavior (#17529 )	2023-06-14 15:58:53 -04:00
Tim Gross	dc9fae34ca	node pools: add pool as label on client metrics (#17528 ) This changeset adds the node pool as a label anywhere we're already emitting labels with additional information such as node class or ID about the client.	2023-06-14 15:58:38 -04:00
Tim Gross	5f509b8ce0	cli: fix missing `-quiet` flag for `var init` (#17526 ) The `var init` command was intended to have support for a `-quiet` flag but it was not documented and never parsed.	2023-06-14 14:52:46 -04:00
Tim Gross	736ad3ed32	docs: note namespace apply/delete behaviors, fix metric (#17527 ) This changeset includes some fixes to documentation discovered while working on node pools, but we didn't want to include in the node pool PRs so they can get backported easily: * namespace apply/delete commands are forwarded to the authoritative region * deleting a namespace requires there are no non-terminal jobs in any of the federated regions * fixed a typo in the name of the `nomad.client.allocated.disk` metric	2023-06-14 14:52:06 -04:00
Tim Gross	c1a01697c8	node pools: implement `node pool init` command (#17479 ) Implement a `nomad node pool init` command that generates an example spec file in either HCL or JSON format.	2023-06-13 14:51:29 -04:00
Luiz Aoqui	bc17cffaef	node pool: node pool upsert on multiregion node register (#17503 ) When registering a node with a new node pool in a non-authoritative region we can't create the node pool because this new pool will not be replicated to other regions. This commit modifies the node registration logic to only allow automatic node pool creation in the authoritative region. In non-authoritative regions, the client is registered, but the node pool is not created. The client is kept in the `initializing` status until its node pool is created in the authoritative region and replicated to the client's region.	2023-06-13 11:28:28 -04:00
Piotr Kazmierczak	57dad0ca07	docs: corrections and additional information for OIDC-related concepts (#17470 )	2023-06-09 16:50:22 +02:00
Luiz Aoqui	5878113c41	node pool: implement `nomad node pool nodes` CLI (#17444 )	2023-06-07 10:37:27 -04:00
Tim Gross	06fc284644	node pools: implement CLI for `node pool jobs` command (#17432 )	2023-06-06 15:02:26 -04:00
Luiz Aoqui	2420c93179	node pools: list nodes in pool (#17413 )	2023-06-06 10:43:43 -04:00
Tim Gross	2d16ec6c6f	node pools: implement RPC to list jobs in a given node pool (#17396 ) Implements the `NodePool.ListJobs` RPC, with pagination and filtering based on the existing `Job.List` RPC.	2023-06-05 15:36:52 -04:00
KamilCuk	cc64281445	Add group_add docker option (#17313 )	2023-06-02 20:26:01 -04:00
Luiz Aoqui	6039c18ab6	node pools: register a node in a node pool (#17405 )	2023-06-02 17:50:50 -04:00
Luiz Aoqui	b770f2b1ef	node pools: implement CLI (#17388 )	2023-06-02 15:49:57 -04:00
Samantha	b92a782b6e	check: Add support for Consul field tls_server_name (#17334 )	2023-06-02 10:19:12 -04:00
Tim Gross	4f14fa0518	node pools: add `node_pool` field to job spec (#17379 ) This changeset only adds the `node_pool` field to the jobspec, and ensures that it gets picked up correctly as a change. Without the rest of the implementation landed yet, the field will be ignored.	2023-06-01 16:08:55 -04:00
Luiz Aoqui	c61e75f302	node pools: add CRUD API (#17384 )	2023-06-01 15:55:49 -04:00
Luiz Aoqui	45b0391378	np: implement ACL for node pools (#17365 )	2023-06-01 13:03:20 -04:00
Seth Hoenig	e04d8cf77b	docs: fixup example of readiness check (#17296 ) A "readiness" check implies a failing healthcheck will not cause the deployment of a service to stop - i.e. it is only used as a liveness probe in the context of service discoverability. Fix our docs example to reflect that a readiness check is created by setting on_update to "ignore" (as opposed to "ignore_warnings").	2023-05-23 15:29:10 -05:00
Tim Gross	b9ca3bc9b1	build: remove 386 builds for Nomad 1.6.0 (#17239 ) The 32-bit Intel builds (aka "386") are not tested and likely have bugs involving platform-sized integers when operated at any non-trivial scale. Remove these builds from the upcoming Nomad 1.6.0 and provide recommendations in the upgrade notes for those users who might have hobbyist boards running 32-bit ARM (this will primarily be the RaspberryPi Zero or older spins of the RaspPi). DO NOT BACKPORT TO 1.5.x OR EARLIER!	2023-05-22 13:27:17 -04:00
Lance Haig	568da5918b	cli: tls certs not created with correct SANs (#16959 ) The `nomad tls cert` command did not create certificates with the correct SANs for them to work with non-default domain and region names. This changeset updates the code to support non-default domains and regions in the certificates.	2023-05-22 09:31:56 -04:00
Tim Gross	9838349c23	document which fields can be updated by `volume register` (#17249 ) The `volume register` command can update a small subset of the volume's fields in-place, with some restrictions depending on whether the volume is currently in use. Document these in the `volume register` command docs and the volume specification docs. Fixes: #17247	2023-05-22 09:15:25 -04:00
Tim Gross	4881f2451a	docs: describe the default Workload Identity ACL policy (#17245 ) Workload Identities have an implicit default policy. This policy can't currently be described via HCL because it includes task interpolation for Variables and access to the Services API (which doesn't exist as its own ACL capability). Describe this in our WI documentation. Fixes: #16277	2023-05-19 11:38:05 -04:00
Mike Nomitch	6df2160e69	docs: add documentation on ephemeral disk and logs (#15829 )	2023-05-17 16:58:11 -04:00
Roman Zipp	edf83f432a	docs: remove unneeded brackets from job specification template docs (#17219 )	2023-05-17 16:45:00 -04:00
Tim Gross	6814e8e6d9	drivers: make internal `DisableLogCollection` capability public (#17196 ) The `DisableLogCollection` capability was introduced as an experimental interface for the Docker driver in 0.10.4. The interface has been stable and allowing third-party task drivers the same capability would be useful for those drivers that don't need the additional overhead of logmon. This PR only makes the capability public. It doesn't yet add it to the configuration options for the other internal drivers. Fixes: #14636 #15686	2023-05-16 09:16:03 -04:00
Mark Lewis	1729c955d2	Update delete.mdx (#17184 ) Fix typo	2023-05-15 13:31:52 +01:00
Tim Gross	ba269eaf3f	docs: add note to upgrade guide about yanked version (#17115 ) Nomad 1.5.4 shipped with a logmon bug that we rolled out a fix for in Nomad 1.5.5. Unfortunately we can't yank the release but we should leave a note in the upgrade guide telling users to avoid it.	2023-05-08 13:28:45 -04:00
Tim Gross	5f3ff346ea	post release 1.5.5 (#17098 ) * changelog entries for 1.5.5 and missing merge of changelog for 1.5.4, 1.4.9, and 1.3.14 * note on deprecation of `logs.enabled` field	2023-05-05 11:46:08 -04:00
Tim Gross	17bd930ca9	logs: fix missing allocation logs after update to Nomad 1.5.4 (#17087 ) When the server restarts for the upgrade, it loads the `structs.Job` from the Raft snapshot/logs. The jobspec has long since been parsed, so none of the guards around the default value are in play. The empty field value for `Enabled` is the zero value, which is false. This doesn't impact any running allocation because we don't replace running allocations when either the client or server restart. But as soon as any allocation gets rescheduled (ex. you drain all your clients during upgrades), it'll be using the `structs.Job` that the server has, which has `Enabled = false`, and logs will not be collected. This changeset fixes the bug by adding a new field `Disabled` which defaults to false (so that the zero value works), and deprecates the old field. Fixes #17076	2023-05-04 16:01:18 -04:00
Seth Hoenig	4347c1d705	docs: move CNI reference plugins installation to CNI overview page (#17068 ) * docs: move CNI reference plugins installation to CNI overview page This PR moves the instruction steps for installing the CNI reference plugins from the Consul Mesh integration page to the general Networking CNI page. These plugins are required for bridge networking, not just Consul Mesh, so it makes sense to have them on the general CNI page. Closes #17038 * docs: fix a link to post install steps	2023-05-04 11:32:06 -05:00
James Rasell	50414bba12	docs: update artifact jobspec sshkey example path. (#17077 )	2023-05-04 14:29:36 +01:00
Seth Hoenig	e8d53ea30b	connect: use explicit docker.io prefix in default envoy image names (#17045 ) This PR modifies references to the envoyproxy/envoy docker image to explicitly include the docker.io prefix. This does not affect existing users, but makes things easier for Podman users, who otherwise need to specify the full name because Podman does not default to docker.io	2023-05-02 09:27:48 -05:00
Seth Hoenig	5744b2cd4f	docs: add more notes about artifact breaking changes in 1.5.0 (#17005 ) * changelog: note artifact breaking changes for 1.5.0 * docs: add note about environment variables to artifact job spec docs * Update website/content/docs/job-specification/artifact.mdx Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> --------- Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-04-27 11:41:18 -05:00
Tim Gross	72cbe53f19	logs: allow disabling log collection in jobspec (#16962 ) Some Nomad users ship application logs out-of-band via syslog. For these users having `logmon` (and `docker_logger`) running is unnecessary overhead. Allow disabling the logmon and pointing the task's stdout/stderr to /dev/null. This changeset is the first of several incremental improvements to log collection short of full-on logging plugins. The next step will likely be to extend the internal-only task driver configuration so that cluster administrators can turn off log collection for the entire driver. --- Fixes: #11175 Co-authored-by: Thomas Weber <towe75@googlemail.com>	2023-04-24 10:00:27 -04:00
Tim Gross	b5a54b3b5f	docs: fix keyring path in install docs (#16946 )	2023-04-20 16:20:39 -04:00
Luiz Aoqui	c7387dbd3a	docs: add missing API field JobACL and fix workload identity headers (#16930 )	2023-04-19 08:12:58 -07:00
Chris van Meer	d2f1766f3a	Updates to the UI block (#16328 ) 1. On the Consul address, following the recommendation for the HTTPS API on port 8501. 2. Add the hint to use HEX values for the colors.	2023-04-18 18:28:17 -07:00
Tim Gross	04e049caed	license: show Terminated field in `license get` command (#16892 )	2023-04-17 09:01:43 -04:00
Tim Gross	62548616d4	client: allow `drain_on_shutdown` configuration (#16827 ) Adds a new configuration to clients to optionally allow them to drain their workloads on shutdown. The client sends the `Node.UpdateDrain` RPC targeting itself and then monitors the drain state as seen by the server until the drain is complete or the deadline expires. If it loses connection with the server, it will monitor local client status instead to ensure allocations are stopped before exiting.	2023-04-14 15:35:32 -04:00
Michael Schurter	79c521e570	docs: add node meta command docs (#16828 ) * docs: add node meta command docs Fixes #16758 * it helps if you actually add the files to git * fix typos and examples vs usage	2023-04-12 15:29:33 -07:00
Tim Gross	4df2d9bda8	E2E: clarify drain `-deadline` and `-force` flag behaviors (#16868 ) The `-deadline` and `-force` flag for the `nomad node drain` command only cause the draining to ignore the `migrate` block's healthy deadline, max parallel, etc. These flags don't have anything to do with the `kill_timeout` or `shutdown_delay` options of the jobspec. This changeset fixes the skipped E2E tests so that they validate the intended behavior, and updates the docs for more clarity.	2023-04-12 15:27:24 -04:00
Tim Gross	657ae6f7d2	docs: document signal handling (#16835 ) Expand documentation about Nomad's signal handling behaviors, including removing incorrect information about graceful client shutdowns.	2023-04-11 16:26:39 -04:00
Seth Hoenig	ba728f8f97	api: enable support for setting original job source (#16763 ) * api: enable support for setting original source alongside job This PR adds support for setting job source material along with the registration of a job. This includes a new HTTP endpoint and a new RPC endpoint for making queries for the original source of a job. The HTTP endpoint is /v1/job/<id>/submission?version=<version> and the RPC method is Job.GetJobSubmission. The job source (if submitted, and doing so is always optional), is stored in the job_submission memdb table, separately from the actual job. This way we do not incur overhead of reading the large string field throughout normal job operations. The server config now includes job_max_source_size for configuring the maximum size the job source may be, before the server simply drops the source material. This should help prevent Bad Things from happening when huge jobs are submitted. If the value is set to 0, all job source material will be dropped. * api: avoid writing var content to disk for parsing * api: move submission validation into RPC layer * api: return an error if updating a job submission without namespace or job id * api: be exact about the job index we associate a submission with (modify) * api: reword api docs scheduling * api: prune all but the last 6 job submissions * api: protect against nil job submission in job validation * api: set max job source size in test server * api: fixups from pr	2023-04-11 08:45:08 -05:00
Tim Gross	1335543731	ephemeral disk: `migrate` should imply `sticky` (#16826 ) The `ephemeral_disk` block's `migrate` field allows for best-effort migration of the ephemeral disk data to new nodes. The documentation says the `migrate` field is only respected if `sticky=true`, but in fact if client ACLs are not set the data is migrated even if `sticky=false`. The existing behavior when client ACLs are disabled has existed since the early implementation, so "fixing" that case now would silently break backwards compatibility. Additionally, having `migrate` not imply `sticky` seems nonsensical: it suggests that if we place on a new node we migrate the data but if we place on the same node, we throw the data away! Update so that `migrate=true` implies `sticky=true` as follows: * The failure mode when client ACLs are enabled comes from the server not passing along a migration token. Update the server so that the server provides a migration token whenever `migrate=true` and not just when `sticky=true` too. * Update the scheduler so that `migrate` implies `sticky`. * Update the client so that we check for `migrate \|\| sticky` where appropriate. * Refactor the E2E tests to move them off the old framework and make the intention of the test more clear.	2023-04-07 16:33:45 -04:00
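
A note on the logs fix in 17bd930ca9 (#17087) listed above: the bug follows from Go's zero-value rules, where a boolean field missing from stored data decodes to `false`. The sketch below is a minimal illustration only. The `OldLogConfig` and `NewLogConfig` types and the two helper functions are hypothetical, and JSON decoding stands in for restoring a job from Raft; none of this is Nomad's actual struct or restore code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OldLogConfig is a hypothetical stand-in for the original design:
// logs are collected only when Enabled is true, so a missing field
// silently means "off".
type OldLogConfig struct {
	Enabled bool `json:"enabled"`
}

// NewLogConfig is a hypothetical stand-in for the fixed design:
// logs are collected unless Disabled is true, so a missing field
// means "on".
type NewLogConfig struct {
	Disabled bool `json:"disabled"`
}

// CollectsOld decodes stored data and reports whether logs would be
// collected under the Enabled field.
func CollectsOld(stored []byte) (bool, error) {
	var c OldLogConfig
	if err := json.Unmarshal(stored, &c); err != nil {
		return false, err
	}
	return c.Enabled, nil
}

// CollectsNew decodes stored data and reports whether logs would be
// collected under the Disabled field.
func CollectsNew(stored []byte) (bool, error) {
	var c NewLogConfig
	if err := json.Unmarshal(stored, &c); err != nil {
		return false, err
	}
	return !c.Disabled, nil
}

func main() {
	// Data persisted before either field existed carries no value for it.
	stored := []byte(`{}`)

	oldOn, _ := CollectsOld(stored)
	newOn, _ := CollectsNew(stored)
	fmt.Println(oldOn, newOn) // prints "false true"
}
```

Naming the flag so that its zero value matches the desired default means data written before the field existed keeps the default behavior without any migration step.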

726 commits