open-nomad

Commit Graph

Author	SHA1	Message	Date
Bryce Kalow	a84d2de9be	website: content updates for developer (#14473) Co-authored-by: Geoffrey Grosenbach <26+topfunky@users.noreply.github.com> Co-authored-by: Anthony <russo555@gmail.com> Co-authored-by: Ashlee Boyer <ashlee.boyer@hashicorp.com> Co-authored-by: Ashlee M Boyer <43934258+ashleemboyer@users.noreply.github.com> Co-authored-by: HashiBot <62622282+hashibot-web@users.noreply.github.com> Co-authored-by: Kevin Wang <kwangsan@gmail.com>	2022-09-16 10:38:39 -05:00
Mahmood Ali	a9d5e4c510	scheduler: stopped-yet-running allocs are still running (#10446) * scheduler: stopped-yet-running allocs are still running * scheduler: test new stopped-but-running logic * test: assert nonoverlapping alloc behavior Also add a simpler Wait test helper to improve line numbers and save few lines of code. * docs: tried my best to describe #10446 it's not concise... feedback welcome * scheduler: fix test that allowed overlapping allocs * devices: only free devices when ClientStatus is terminal * test: output nicer failure message if err==nil Co-authored-by: Mahmood Ali <mahmood@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-09-13 12:52:47 -07:00
Tim Gross	9636b0f837	docs: tweak some copy in the concept docs (#14566)	2022-09-13 13:21:09 -04:00
Seth Hoenig	afc815c0c7	Merge pull request #14559 from hashicorp/docs-nsd-check-watcher docs: add documentation for nomad service check restarts	2022-09-13 10:52:01 -05:00
Ashlee M Boyer	fc973ebe0e	docs: Fixing heading order, adding text for links in /docs/ecosystem (#14549) * Fixing heading order, adding text for links * Apply suggestions from code review Co-authored-by: Tim Gross <tgross@hashicorp.com> * Applying more suggestions from code review Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-09-13 10:59:02 -04:00
Seth Hoenig	5b661ec84d	docs: update docs for NSD check restart	2022-09-13 09:59:02 -05:00
Tim Gross	357e7f4521	docs: include path in ACL requirements for variables (#14561) Also add links to the ACL policy reference and variables concepts docs near the top of the page.	2022-09-13 10:21:29 -04:00
Charlie Voiselle	8eb1689fca	Variables CLI documentation (#14249)	2022-09-12 16:44:31 -04:00
Tim Gross	14b536ee86	docs: update `template` for Nomad Variables (#14527)	2022-09-12 16:36:18 -04:00
Tim Gross	9259a373cd	remove root keyring install API (#14514) * keyring rotate API should require put/post method * remove keyring install API	2022-09-09 08:50:35 -04:00
James Rasell	813c5daa96	hcl2: add strlen function and update docs. (#14463)	2022-09-06 18:42:40 +02:00
Luiz Aoqui	1ae26981a0	connect: interpolate task env in config values (#14445) When configuring Consul Service Mesh, it's sometimes necessary to provide dynamic values that are only known to Nomad at runtime. By interpolating configuration values (in addition to configuration keys), users are able to pass these dynamic values to Consul from their Nomad jobs.	2022-09-02 15:00:28 -04:00
Luiz Aoqui	99bddfe04d	docs: add warning about changing region config (#14443)	2022-09-01 16:47:06 -04:00
Luiz Aoqui	94d7dddccd	cli: set -hcl2-strict to false if -hcl1 is defined (#14426) These options are mutually exclusive but, since `-hcl2-strict` defaults to `true`, users had to explicitly set it to `false` when using `-hcl1`. Also return `255` when job plan fails validation as this is the expected code in this situation.	2022-09-01 10:42:08 -04:00
Tim Gross	0ef073a669	docs: clarify CSI plugin compatibility (#14434) Nomad is generally compliant with the CSI specification for Container Orchestrators (CO), except for unimplemented features. However, some storage vendors have built CSI plugins that are not compliant with the specification or which expect that they're only deployed on Kubernetes. Nomad cannot vouch for the compatibility of any particular plugin, so clarify this in the docs. Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>	2022-09-01 10:06:44 -04:00
Brett Larson	9912dfd1e6	Update ephemeral_disk.mdx (#14356) It is really unclear how to use this feature. It took me a while to find this, so I thought I would propose how to use this.	2022-08-31 20:17:41 -04:00
James Rasell	986355bcd9	docs: add documentation for ACL token expiration and ACL roles. (#14332) The ACL command docs are now found within a sub-dir like the operator command docs. Updates to the ACL token commands to accommodate token expiry have also been added. The ACL API docs are now found within a sub-dir like the operator API docs. The ACL docs now include the ACL roles endpoint as well as updated ACL token endpoints for token expiration. The configuration section is also updated to accommodate the new ACL and server parameters for the new ACL features.	2022-08-31 16:13:47 +02:00
Tim Gross	c9d678a91a	keyring: wrap root key in key encryption key (#14388) Update the on-disk format for the root key so that it's wrapped with a unique per-key/per-server key encryption key. This is a bit of security theatre for the current implementation, but it uses `go-kms-wrapping` as the interface for wrapping the key. This provides a shim for future support of external KMS such as cloud provider APIs or Vault transit encryption. * Removes the JSON serialization extension we had on the `RootKey` struct; this struct is now only used for key replication and not for disk serialization, so we don't need this helper. * Creates a helper for generating cryptographically random slices of bytes that properly accounts for short reads from the source. * No observable functional changes outside of the on-disk format, so there are no test updates.	2022-08-30 10:59:25 -04:00
Tim Gross	37905d94b7	docs: fixing a few more places we missed "secure" during rename (#14395)	2022-08-30 10:08:50 -04:00
quoing	ce7a3745d5	docs: template change script example correction (#14368) "path" parameter doesn't work, should be command	2022-08-30 12:09:55 +02:00
Tim Gross	d7652fdd3a	docs: rename Secure Variables to Variables (#14352)	2022-08-29 11:37:08 -04:00
Luiz Aoqui	e012d9411e	Task lifecycle restart (#14127) * allocrunner: handle lifecycle when all tasks die When all tasks die the Coordinator must transition to its terminal state, coordinatorStatePoststop, to unblock poststop tasks. Since this could happen at any time (for example, a prestart task dies), all states must be able to transition to this terminal state. * allocrunner: implement different alloc restarts Add a new alloc restart mode where all tasks are restarted, even if they have already exited. Also unifies the alloc restart logic to use the implementation that restarts tasks concurrently and ignores ErrTaskNotRunning errors since those are expected when restarting the allocation. * allocrunner: allow tasks to run again Prevent the task runner Run() method from exiting to allow a dead task to run again. When the task runner is signaled to restart, the function will jump back to the MAIN loop and run it again. The task runner determines if a task needs to run again based on two new task events that were added to differentiate between a request to restart a specific task, the tasks that are currently running, or all tasks that have already run. * api/cli: add support for all tasks alloc restart Implement the new -all-tasks alloc restart CLI flag and its API counterpart, AllTasks. The client endpoint calls the appropriate restart method from the allocrunner depending on the restart parameters used. * test: fix tasklifecycle Coordinator test * allocrunner: kill taskrunners if all tasks are dead When all non-poststop tasks are dead we need to kill the taskrunners so we don't leak their goroutines, which are blocked in the alloc restart loop. This also ensures the allocrunner exits on its own. * taskrunner: fix tests that waited on WaitCh Now that "dead" tasks may run again, the taskrunner Run() method will not return when the task finishes running, so tests must wait for the task state to be "dead" instead of using the WaitCh, since it won't be closed until the taskrunner is killed. * tests: add tests for all tasks alloc restart * changelog: add entry for #14127 * taskrunner: fix restore logic. The first implementation of the task runner restore process relied on server data (`tr.Alloc().TerminalStatus()`) which may not be available to the client at the time of restore. It also had the incorrect code path. When restoring a dead task the driver handle always needs to be cleared cleanly using `clearDriverHandle` otherwise, after exiting the MAIN loop, the task may be killed by `tr.handleKill`. The fix is to store the state of the Run() loop in the task runner local client state: if the task runner ever exits this loop cleanly (not with a shutdown) it will never be able to run again. So if the Run() loop starts with this local state flag set, it must exit early. This local state flag is also being checked on task restart requests. If the task is "dead" and its Run() loop is not active it will never be able to run again. * address code review requests * apply more code review changes * taskrunner: add different Restart modes Using the task event to differentiate between the allocrunner restart methods proved to be confusing for developers to understand how it all worked. So instead of relying on the event type, this commit separated the logic of restarting a taskRunner into two methods: - `Restart` will retain the current behaviour and will only restart the task if it's currently running. - `ForceRestart` is the new method where a `dead` task is allowed to restart if its `Run()` method is still active. Callers will need to restart the allocRunner taskCoordinator to make sure it will allow the task to run again. * minor fixes	2022-08-24 17:43:07 -04:00
Piotr Kazmierczak	7077d1f9aa	template: custom change_mode scripts (#13972) This PR adds the functionality of allowing custom scripts to be executed on template change. Resolves #2707	2022-08-24 17:43:01 +02:00
Piotr Kazmierczak	077b6e7098	docs: Update upgrade guide to reflect enterprise changes introduced in nomad-enterprise (#14212) This PR documents a change made in the enterprise version of nomad that addresses the following issue: When a user tries to filter audit logs, they do so with a stanza that looks like the following: audit { enabled = true filter "remove deletes" { type = "HTTPEvent" endpoints = ["*"] stages = ["OperationComplete"] operations = ["DELETE"] } } When specifying both an "endpoint" and a "stage", the events with both a matching "endpoint" AND a matching "stage" will be filtered. When specifying both an "endpoint" and an "operation" the events with both a matching "endpoint" AND a matching "operation" will be filtered. When specifying both a "stage" and an "operation" the events with a matching "stage" OR a matching "operation" will be filtered. The "OR" logic with stages and operations is unexpected and doesn't allow customers to get specific on which events they want to filter. For instance the following use-case is impossible to achieve: "I want to filter out all OperationReceived events that have the DELETE verb".	2022-08-24 16:31:49 +02:00
Tim Gross	afb9fe6a4e	docs: fix an anchor link in secure vars docs (#14231)	2022-08-23 10:46:24 -04:00
Seth Hoenig	b5427a9f3b	Merge pull request #14215 from hashicorp/docs-update-checks-for-nsd docs: update check documentation with NSD specifics	2022-08-23 09:23:53 -05:00
Seth Hoenig	fb82f11e70	docs: fix checks doc typo Co-authored-by: Piotr Kazmierczak <phk@mm.st>	2022-08-23 09:23:36 -05:00
Tim Gross	bf57d76ec7	allow ACL policies to be associated with workload identity (#14140) The original design for workload identities and ACLs allows for operators to extend the automatic capabilities of a workload by using a specially-named policy. This has shown to be potentially unsafe because of naming collisions, so instead we'll allow operators to explicitly attach a policy to a workload identity. This changeset adds workload identity fields to ACL policy objects and threads that all the way down to the command line. It also adds a new secondary index to the ACL policy table on namespace and job so that claim resolution can efficiently query for related policies.	2022-08-22 16:41:21 -04:00
Luiz Aoqui	dbffdca92e	template: use pointer values for gid and uid (#14203) When a Nomad agent starts and loads jobs that already existed in the cluster, the default template uid and gid was being set to 0, since this is the zero value for int. This caused these jobs to fail in environments where it was not possible to use 0, such as in Windows clients. In order to differentiate between an explicit 0 and a template where these properties were not set we need to use a pointer.	2022-08-22 16:25:49 -04:00
Seth Hoenig	ea6d010790	docs: update check documentation with NSD specifics This PR updates the checks documentation to mention support for checks when using the Nomad service provider. There are limitations of NSD compared to Consul, and those configuration options are now noted as being Consul-only.	2022-08-22 10:50:26 -05:00
Tim Gross	a4e89d72a8	secure vars: filter by path in List RPCs (#14036) The List RPCs only checked the ACL for the Prefix argument of the request. Add an ACL filter to the paginator for the List RPC. Extend test coverage of ACLs in the List RPC and in the `acl` package, and add a "deny" capability so that operators can deny specific paths or prefixes below an allowed path.	2022-08-15 11:38:20 -04:00
Seth Hoenig	394aebfbd9	Merge pull request #14088 from hashicorp/b-plan-vault-token cli: support vault token in plan command	2022-08-12 09:05:34 -05:00
Seth Hoenig	1224fdf60d	Merge pull request #14089 from hashicorp/f-docker-disable-healthchecks docker: configuration for disable docker healthcheck	2022-08-12 09:00:31 -05:00
James Rasell	f6a5961a20	docs: correctly state RPC port is used by servers and clients. (#14091)	2022-08-12 10:14:14 +02:00
Seth Hoenig	dc761aa7ec	docker: create a docker task config setting for disable built-in healthcheck This PR adds a docker driver task configuration setting for turning off built-in HEALTHCHECK of a container. References) https://docs.docker.com/engine/reference/builder/#healthcheck https://github.com/docker/engine-api/blob/master/types/container/config.go#L16 Closes #5310 Closes #14068	2022-08-11 10:33:48 -05:00
Seth Hoenig	ba5c45ab93	cli: respect vault token in plan command This PR fixes a regression where the 'job plan' command would not respect a Vault token if set via --vault-token or $VAULT_TOKEN. Basically the same bug/fix as for the validate command in https://github.com/hashicorp/nomad/issues/13062 Fixes https://github.com/hashicorp/nomad/issues/13939	2022-08-11 08:54:08 -05:00
dgotlieb	7fbc8baaeb	doc typo fix docker and podman don't suck 🤣	2022-08-10 15:04:07 +03:00
Charlie Voiselle	9a19279f59	Sweep of docs for repeated words; minor edits (#14032)	2022-08-05 16:45:30 -04:00
Luiz Aoqui	9affe31a0f	qemu: reduce monitor socket path (#13971) The QEMU driver can take an optional `graceful_shutdown` configuration which will create a Unix socket to send ACPI shutdown signal to the VM. Unix sockets have a hard length limit and the driver implementation assumed that QEMU versions 2.10.1 were able to handle longer paths. This is not correct, the linked QEMU fix only changed the behaviour from silently truncating longer socket paths to throwing an error. By validating the socket path before starting the QEMU machine we can provide users a more actionable and meaningful error message, and by using a shorter socket file name we leave a bit more room for user-defined values in the path, such as the task name. The maximum length allowed is also platform-dependant, so validation needs to be different for each OS.	2022-08-04 12:10:35 -04:00
Luiz Aoqui	e3d78c343c	template: set default UID/GID to -1 (#13998) UID/GID 0 is usually reserved for the root user/group. While Nomad clients are expected to run as root it may not always be the case. Setting these values as -1 if not defined will fall back to the previous behaviour of not attempting to set file ownership and use whatever UID/GID the Nomad agent is running as. It will also keep backwards compatibility, which is especially important for platforms where this feature is not supported, like Windows.	2022-08-04 11:26:08 -04:00
Luiz Aoqui	8f05a55def	docs: remove link to HCL2 `timestamp` function (#13999) The `timestamp` HCL2 function was never part of the set of supported functions.	2022-08-04 10:07:51 -04:00
Derek Strickland	77df9c133b	Add Nomad RetryConfig to agent template config (#13907) * add Nomad RetryConfig to agent template config	2022-08-03 16:56:30 -04:00
Piotr Kazmierczak	530280505f	client: enable specifying user/group permissions in the template stanza (#13755) * Adds Uid/Gid parameters to template. * Updated diff_test * fixed order * update jobspec and api * removed obsolete code * helper functions for jobspec parse test * updated documentation * adjusted API jobs test. * propagate uid/gid setting to job_endpoint * adjusted job_endpoint tests * making uid/gid into pointers * refactor * updated documentation * updated documentation * Update client/allocrunner/taskrunner/template/template_test.go Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * Update website/content/api-docs/json-jobs.mdx Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * propagating documentation change from Luiz * formatting * changelog entry * changed changelog entry Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-08-02 22:15:38 +02:00
Tim Gross	e025afdf87	docs: concepts for secure variables and workload identity (#13764) Includes concept docs for secure variables, concept docs for workload identity, and an operations docs for keyring management.	2022-08-02 10:06:26 -04:00
Eric Weber	cbce13c1ac	Add stage_publish_base_dir field to csi_plugin stanza of a job (#13919) * Allow specification of CSI staging and publishing directory path * Add website documentation for stage_publish_dir * Replace erroneous reference to csi_plugin.mount_config with csi_plugin.mount_dir * Avoid requiring CSI plugins to be redeployed after introducing StagePublishDir	2022-08-02 09:42:44 -04:00
asymmetric	b89718d70e	Update filesystem.mdx (#13738) fix alloc working directory path	2022-07-25 10:25:48 -04:00
Scott Holodak	12ef89a61a	docs: fix placement for `scaling` and `csi_plugin` (#13892)	2022-07-25 10:06:59 -04:00
Michael Schurter	0d1c9a53a4	docs: clarify submit-job allows stopping (#13871)	2022-07-21 10:18:57 -07:00
Tim Gross	96aea74b4b	docs: keyring commands (#13690) Document the secure variables keyring commands, document the aliased gossip keyring commands, and note that the old gossip keyring commands are deprecated.	2022-07-20 14:14:10 -04:00
Tim Gross	49ad3dc3ba	docs: document secure variables server config options (#13695)	2022-07-20 14:13:39 -04:00

520 Commits