open-nomad

Author	SHA1	Message	Date
Piotr Kazmierczak	7077d1f9aa	template: custom change_mode scripts (#13972 ) This PR adds the functionality of allowing custom scripts to be executed on template change. Resolves #2707	2022-08-24 17:43:01 +02:00
Piotr Kazmierczak	077b6e7098	docs: Update upgrade guide to reflect enterprise changes introduced in nomad-enterprise (#14212 ) This PR documents a change made in the enterprise version of nomad that addresses the following issue: When a user tries to filter audit logs, they do so with a stanza that looks like the following: audit { enabled = true filter "remove deletes" { type = "HTTPEvent" endpoints = ["*"] stages = ["OperationComplete"] operations = ["DELETE"] } } When specifying both an "endpoint" and a "stage", events with both a matching "endpoint" AND a matching "stage" will be filtered. When specifying both an "endpoint" and an "operation", events with both a matching "endpoint" AND a matching "operation" will be filtered. When specifying both a "stage" and an "operation", events with a matching "stage" OR a matching "operation" will be filtered. The "OR" logic with stages and operations is unexpected and doesn't allow customers to be specific about which events they want to filter. For instance, the following use case is impossible to achieve: "I want to filter out all OperationReceived events that have the DELETE verb".	2022-08-24 16:31:49 +02:00
Tim Gross	afb9fe6a4e	docs: fix an anchor link in secure vars docs (#14231 )	2022-08-23 10:46:24 -04:00
Seth Hoenig	b5427a9f3b	Merge pull request #14215 from hashicorp/docs-update-checks-for-nsd docs: update check documentation with NSD specifics	2022-08-23 09:23:53 -05:00
Seth Hoenig	fb82f11e70	docs: fix checks doc typo Co-authored-by: Piotr Kazmierczak <phk@mm.st>	2022-08-23 09:23:36 -05:00
Tim Gross	bf57d76ec7	allow ACL policies to be associated with workload identity (#14140 ) The original design for workload identities and ACLs allows for operators to extend the automatic capabilities of a workload by using a specially-named policy. This has shown to be potentially unsafe because of naming collisions, so instead we'll allow operators to explicitly attach a policy to a workload identity. This changeset adds workload identity fields to ACL policy objects and threads that all the way down to the command line. It also adds a new secondary index to the ACL policy table on namespace and job so that claim resolution can efficiently query for related policies.	2022-08-22 16:41:21 -04:00
Luiz Aoqui	dbffdca92e	template: use pointer values for gid and uid (#14203 ) When a Nomad agent starts and loads jobs that already existed in the cluster, the default template uid and gid was being set to 0, since this is the zero value for int. This caused these jobs to fail in environments where it was not possible to use 0, such as in Windows clients. In order to differentiate between an explicit 0 and a template where these properties were not set we need to use a pointer.	2022-08-22 16:25:49 -04:00
Seth Hoenig	ea6d010790	docs: update check documentation with NSD specifics This PR updates the checks documentation to mention support for checks when using the Nomad service provider. There are limitations of NSD compared to Consul, and those configuration options are now noted as being Consul-only.	2022-08-22 10:50:26 -05:00
Tim Gross	a4e89d72a8	secure vars: filter by path in List RPCs (#14036 ) The List RPCs only checked the ACL for the Prefix argument of the request. Add an ACL filter to the paginator for the List RPC. Extend test coverage of ACLs in the List RPC and in the `acl` package, and add a "deny" capability so that operators can deny specific paths or prefixes below an allowed path.	2022-08-15 11:38:20 -04:00
Seth Hoenig	394aebfbd9	Merge pull request #14088 from hashicorp/b-plan-vault-token cli: support vault token in plan command	2022-08-12 09:05:34 -05:00
Seth Hoenig	1224fdf60d	Merge pull request #14089 from hashicorp/f-docker-disable-healthchecks docker: configuration for disable docker healthcheck	2022-08-12 09:00:31 -05:00
James Rasell	f6a5961a20	docs: correctly state RPC port is used by servers and clients. (#14091 )	2022-08-12 10:14:14 +02:00
Seth Hoenig	dc761aa7ec	docker: create a docker task config setting for disable built-in healthcheck This PR adds a docker driver task configuration setting for turning off built-in HEALTHCHECK of a container. References: https://docs.docker.com/engine/reference/builder/#healthcheck https://github.com/docker/engine-api/blob/master/types/container/config.go#L16 Closes #5310 Closes #14068	2022-08-11 10:33:48 -05:00
Seth Hoenig	ba5c45ab93	cli: respect vault token in plan command This PR fixes a regression where the 'job plan' command would not respect a Vault token if set via --vault-token or $VAULT_TOKEN. Basically the same bug/fix as for the validate command in https://github.com/hashicorp/nomad/issues/13062 Fixes https://github.com/hashicorp/nomad/issues/13939	2022-08-11 08:54:08 -05:00
dgotlieb	7fbc8baaeb	doc typo fix docker and podman don't suck 🤣	2022-08-10 15:04:07 +03:00
Charlie Voiselle	9a19279f59	Sweep of docs for repeated words; minor edits (#14032 )	2022-08-05 16:45:30 -04:00
Luiz Aoqui	9affe31a0f	qemu: reduce monitor socket path (#13971 ) The QEMU driver can take an optional `graceful_shutdown` configuration which will create a Unix socket to send ACPI shutdown signal to the VM. Unix sockets have a hard length limit and the driver implementation assumed that QEMU versions 2.10.1 were able to handle longer paths. This is not correct; the linked QEMU fix only changed the behaviour from silently truncating longer socket paths to throwing an error. By validating the socket path before starting the QEMU machine we can provide users a more actionable and meaningful error message, and by using a shorter socket file name we leave a bit more room for user-defined values in the path, such as the task name. The maximum length allowed is also platform-dependent, so validation needs to be different for each OS.	2022-08-04 12:10:35 -04:00
Luiz Aoqui	e3d78c343c	template: set default UID/GID to -1 (#13998 ) UID/GID 0 is usually reserved for the root user/group. While Nomad clients are expected to run as root it may not always be the case. Setting these values as -1 if not defined will fall back to the previous behaviour of not attempting to set file ownership and use whatever UID/GID the Nomad agent is running as. It will also keep backwards compatibility, which is especially important for platforms where this feature is not supported, like Windows.	2022-08-04 11:26:08 -04:00
Luiz Aoqui	8f05a55def	docs: remove link to HCL2 `timestamp` function (#13999 ) The `timestamp` HCL2 function was never part of the set of supported functions.	2022-08-04 10:07:51 -04:00
Derek Strickland	77df9c133b	Add Nomad RetryConfig to agent template config (#13907 ) * add Nomad RetryConfig to agent template config	2022-08-03 16:56:30 -04:00
Piotr Kazmierczak	530280505f	client: enable specifying user/group permissions in the template stanza (#13755 ) * Adds Uid/Gid parameters to template. * Updated diff_test * fixed order * update jobspec and api * removed obsolete code * helper functions for jobspec parse test * updated documentation * adjusted API jobs test. * propagate uid/gid setting to job_endpoint * adjusted job_endpoint tests * making uid/gid into pointers * refactor * updated documentation * updated documentation * Update client/allocrunner/taskrunner/template/template_test.go Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * Update website/content/api-docs/json-jobs.mdx Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * propagating documentation change from Luiz * formatting * changelog entry * changed changelog entry Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-08-02 22:15:38 +02:00
Tim Gross	e025afdf87	docs: concepts for secure variables and workload identity (#13764 ) Includes concept docs for secure variables, concept docs for workload identity, and an operations docs for keyring management.	2022-08-02 10:06:26 -04:00
Eric Weber	cbce13c1ac	Add stage_publish_base_dir field to csi_plugin stanza of a job (#13919 ) * Allow specification of CSI staging and publishing directory path * Add website documentation for stage_publish_dir * Replace erroneous reference to csi_plugin.mount_config with csi_plugin.mount_dir * Avoid requiring CSI plugins to be redeployed after introducing StagePublishDir	2022-08-02 09:42:44 -04:00
asymmetric	b89718d70e	Update filesystem.mdx (#13738 ) fix alloc working directory path	2022-07-25 10:25:48 -04:00
Scott Holodak	12ef89a61a	docs: fix placement for `scaling` and `csi_plugin` (#13892 )	2022-07-25 10:06:59 -04:00
Michael Schurter	0d1c9a53a4	docs: clarify submit-job allows stopping (#13871 )	2022-07-21 10:18:57 -07:00
Tim Gross	96aea74b4b	docs: keyring commands (#13690 ) Document the secure variables keyring commands, document the aliased gossip keyring commands, and note that the old gossip keyring commands are deprecated.	2022-07-20 14:14:10 -04:00
Tim Gross	49ad3dc3ba	docs: document secure variables server config options (#13695 )	2022-07-20 14:13:39 -04:00
Will Jordan	5354409b1a	Return 429 response on HTTP max connection limit (#13621 ) Return 429 response on HTTP max connection limit. Instead of silently closing the connection, return a `429 Too Many Requests` HTTP response with a helpful error message to aid debugging when the connection limit is unintentionally reached. Set a 10-millisecond write timeout and rate limiter for connection-limit 429 response to prevent writing the HTTP response from consuming too many server resources. Add `nomad.agent.http.exceeded` metric counting the number of HTTP connections exceeding concurrency limit.	2022-07-20 14:12:21 -04:00
Niklas Hambüchen	422c83e97a	docs: job-specification: Explain that priority has no effect on run order (#13835 ) Makes the issues from #9845 and #12792 less surprising to the user.	2022-07-19 08:55:29 -04:00
Andy Assareh	e49c021792	word typo digestible (#13772 )	2022-07-19 09:00:52 +02:00
Seth Hoenig	4dea14267d	Merge pull request #13813 from hashicorp/docs-move-checks docs: move checks into own page	2022-07-18 12:27:43 -05:00
Seth Hoenig	4459312541	docs: move checks into own page This PR creates a top-level 'check' page for job-specification docs. The content for checks is about half the content of the service page, and is about to increase in size when we add docs about Nomad service checks. Seemed like a good idea to just split the checks section out into its own thing (e.g. check_restart is already a topic). Doing the move first lets us backport this change without adding Nomad service check stuff yet. Mostly just a lift-and-shift but with some tweaked examples to de-emphasize the use of script checks.	2022-07-18 09:34:55 -05:00
Tim Gross	1e8978ca04	docs: ACL policy spec reference (#13787 ) The "Secure Nomad with Access Control" guide provides a tutorial for bootstrapping Nomad ACLs, writing policies, and creating tokens. Add a reference guide just for the ACL policy specification.	2022-07-18 09:35:28 -04:00
Michael Schurter	e97548b5f8	Improve metrics reference documentation (#13769 ) * docs: tighten up parameterized job metrics docs * docs: improve alloc status descriptions Remove `nomad.client.allocations.start` as it doesn't exist.	2022-07-15 14:22:39 -07:00
Michael Schurter	5414f49821	docs: clarify blocked_evals metrics (#13751 ) Related to #13740 - blocked_evals.total_blocked is the number of evals blocked for any reason - blocked_evals.total_quota_limit is the number of evals blocked by quota limits, but critically: their resources are not counted in the cpu/memory	2022-07-14 11:32:33 -07:00
Seth Hoenig	3a32220b3b	Merge pull request #13716 from hashicorp/docs-update-consul-warning docs: remove consul 1.12.0 warning	2022-07-14 08:45:56 -05:00
Luiz Aoqui	b656981cf0	Track plan rejection history and automatically mark clients as ineligible (#13421 ) Plan rejections occur when the scheduler worker and the leader plan applier disagree on the feasibility of a plan. This may happen for valid reasons: since Nomad does parallel scheduling, it is expected that different workers will have a different state when computing placements. As the final plan reaches the leader plan applier, it may no longer be valid due to a concurrent scheduling taking up intended resources. In these situations the plan applier will notify the worker that the plan was rejected and that they should refresh their state before trying again. In some rare and unexpected circumstances it has been observed that workers will repeatedly submit the same plan, even if they are always rejected. While the root cause is still unknown this mitigation has been put in place. The plan applier will now track the history of plan rejections per client and include in the plan result a list of node IDs that should be set as ineligible if the number of rejections in a given time window crosses a certain threshold. The window size and threshold value can be adjusted in the server configuration. To avoid marking several nodes as ineligible at once, the operation is rate limited to 5 nodes every 30min, with an initial burst of 10 operations.	2022-07-12 18:40:20 -04:00
Michael Schurter	3e50f72fad	core: merge reserved_ports into host_networks (#13651 ) Fixes #13505 This fixes #13505 by treating reserved_ports like we treat a lot of jobspec settings: merging settings from more global stanzas (client.reserved.reserved_ports) "down" into more specific stanzas (client.host_networks[].reserved_ports). As discussed in #13505 there are other options, and since it's totally broken right now we have some flexibility: Treat overlapping reserved_ports on addresses as invalid and refuse to start agents. However, I'm not sure there's a cohesive model we want to publish right now since so much 0.9-0.12 compat code still exists! We would have to explain to folks that if their -network-interface and host_network addresses overlapped, they could only specify reserved_ports in one place or the other?! It gets ugly. Use the global client.reserved.reserved_ports value as the default and treat host_network[].reserved_ports as overrides. My first suggestion in the issue, but @groggemans made me realize the addresses on the agent's interface (as configured by -network-interface) may overlap with host_networks, so you'd need to remove the global reserved_ports from addresses shared with a shared network?! This seemed really confusing and subtle for users to me. So I think "merging down" creates the most expressive yet understandable approach. I've played around with it a bit, and it doesn't seem too surprising. The only frustrating part is how difficult it is to observe the available addresses and ports on a node! However that's a job for another PR.	2022-07-12 14:40:25 -07:00
Seth Hoenig	a9fa48f3db	docs: remove consul 1.12.0 warning	2022-07-12 09:53:17 -05:00
Tim Gross	fc4cd53cfb	docs: rename Internals to Concepts (#13696 )	2022-07-11 16:55:33 -04:00
Tim Gross	d49ff0175c	docs: move operator subcommands under their own trees (#13677 ) The sidebar navigation tree for the `operator` sub-sub commands is getting cluttered and we have a new set of commands coming to support secure variables keyring as well. Move these all under their own subtrees.	2022-07-11 14:00:24 -04:00
Seth Hoenig	ed2f2b1a75	docs: move upgrade docs for max_client_timeout Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-07-07 16:46:26 -05:00
Seth Hoenig	905e673553	docs: upgrade guide for client max_kill_timeout	2022-07-07 15:27:40 -05:00
Luiz Aoqui	03433dd8af	cli: improve output of eval commands (#13581 ) Use the same output format when listing multiple evals in the `eval list` command and when `eval status <prefix>` matches more than one eval. Include the eval namespace in all output formats and always include the job ID in `eval status` since even `node-update` evals are related to a job. Add Node ID to the evals table output to help differentiate `node-update` evals. Co-authored-by: James Rasell <jrasell@hashicorp.com>	2022-07-07 13:13:34 -04:00
Ted Behling	6a032a54d2	driver/docker: Don't pull InfraImage if it exists (#13265 ) Co-authored-by: James Rasell <jrasell@hashicorp.com>	2022-07-07 17:44:06 +02:00
Seth Hoenig	b9fe6c8d2c	docs: fixup from cr comments	2022-07-07 08:37:10 -05:00
Seth Hoenig	1c31ef285e	docs: add docs for simple load balancing nomad services This PR adds a section to template docs for simple load balancing with nomad services.	2022-07-06 17:34:30 -05:00
James Rasell	0c0b028a59	core: allow deleting of evaluations (#13492 ) * core: add eval delete RPC and core functionality. * agent: add eval delete HTTP endpoint. * api: add eval delete API functionality. * cli: add eval delete command. * docs: add eval delete website documentation.	2022-07-06 16:30:11 +02:00
James Rasell	181b247384	core: allow pausing and un-pausing of leader broker routine (#13045 ) * core: allow pause/un-pause of eval broker on region leader. * agent: add ability to pause eval broker via scheduler config. * cli: add operator scheduler commands to interact with config. * api: add ability to pause eval broker via scheduler config * e2e: add operator scheduler test for eval broker pause. * docs: include new operator scheduler CLI and pause eval API info.	2022-07-06 16:13:48 +02:00
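
The template UID/GID commits above (dbffdca92e, e3d78c343c) rely on a common Go idiom: a plain int cannot distinguish "unset" from an explicit 0, while a pointer can. The following is a minimal sketch of that idea only, not the code from those commits; the `TemplateOwner` type and `resolveID` helper are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// TemplateOwner is a hypothetical stand-in for the ownership fields of a
// template block. It is not Nomad's actual struct.
type TemplateOwner struct {
	UID *int // nil means the jobspec did not set a uid
	GID *int // nil means the jobspec did not set a gid
}

// resolveID returns the explicitly configured ID, or -1 when none was set.
// Per the commit message, -1 means "do not attempt to set file ownership".
func resolveID(id *int) int {
	if id == nil {
		return -1
	}
	return *id
}

func main() {
	root := 0
	explicit := TemplateOwner{UID: &root, GID: &root} // user asked for root
	unset := TemplateOwner{}                          // user set nothing

	fmt.Println(resolveID(explicit.UID), resolveID(unset.UID))
}
```

With a plain `int` field, both cases would read as 0 after decoding, which is the failure mode the first commit describes for jobs loaded on clients where UID 0 cannot be used.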


498 commits