open-nomad

Commit Graph

Author	SHA1	Message	Date
Tim Gross	9259a373cd	remove root keyring install API (#14514 ) * keyring rotate API should require put/post method * remove keyring install API	2022-09-09 08:50:35 -04:00
Tim Gross	3fc7482ecd	CSI: failed allocation should not block its own controller unpublish (#14484 ) A Nomad user reported problems with CSI volumes associated with failed allocations, where the Nomad server did not send a controller unpublish RPC. The controller unpublish is skipped if other non-terminal allocations on the same node claim the volume. The check has a bug where the allocation belonging to the claim being freed was included in the check incorrectly. During a normal allocation stop for job stop or a new version of the job, the allocation is terminal. But allocations that fail are not yet marked terminal at the point in time when the client sends the unpublish RPC to the server. For CSI plugins that support controller attach/detach, this means that the controller will not be able to detach the volume from the allocation's host and the replacement claim will fail until a GC is run. This changeset fixes the conditional so that the claim's own allocation is not included, and makes the logic easier to read. Include a test case covering this path. Also includes two minor extra bugfixes: * Entities we get from the state store should always be copied before altering. Ensure that we copy the volume in the top-level unpublish workflow before handing off to the steps. * The list stub object for volumes in `nomad/structs` did not match the stub object in `api`. The `api` package also did not include the current readers/writers fields that are expected by the UI. True up the two objects and add the previously undocumented fields to the docs.	2022-09-08 13:30:05 -04:00
James Rasell	813c5daa96	hcl2: add strlen function and update docs. (#14463 )	2022-09-06 18:42:40 +02:00
Luiz Aoqui	1ae26981a0	connect: interpolate task env in config values (#14445 ) When configuring Consul Service Mesh, it's sometimes necessary to provide dynamic values that are only known to Nomad at runtime. By interpolating configuration values (in addition to configuration keys), users are able to pass these dynamic values to Consul from their Nomad jobs.	2022-09-02 15:00:28 -04:00
Luiz Aoqui	99bddfe04d	docs: add warning about changing region config (#14443 )	2022-09-01 16:47:06 -04:00
Luiz Aoqui	94d7dddccd	cli: set -hcl2-strict to false if -hcl1 is defined (#14426 ) These options are mutually exclusive but, since `-hcl2-strict` defaults to `true`, users had to explicitly set it to `false` when using `-hcl1`. Also return `255` when job plan fails validation as this is the expected code in this situation.	2022-09-01 10:42:08 -04:00
Tim Gross	0ef073a669	docs: clarify CSI plugin compatibility (#14434 ) Nomad is generally compliant with the CSI specification for Container Orchestrators (CO), except for unimplemented features. However, some storage vendors have built CSI plugins that are not compliant with the specification or which expect that they're only deployed on Kubernetes. Nomad cannot vouch for the compatibility of any particular plugin, so clarify this in the docs. Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>	2022-09-01 10:06:44 -04:00
Brett Larson	9912dfd1e6	Update ephemeral_disk.mdx (#14356 ) It is really unclear how to use this feature. It took me a while to find this, so I thought I would propose how to use this.	2022-08-31 20:17:41 -04:00
James Rasell	986355bcd9	docs: add documentation for ACL token expiration and ACL roles. (#14332 ) The ACL command docs are now found within a sub-dir like the operator command docs. Updates to the ACL token commands to accommodate token expiry have also been added. The ACL API docs are now found within a sub-dir like the operator API docs. The ACL docs now include the ACL roles endpoint as well as updated ACL token endpoints for token expiration. The configuration section is also updated to accommodate the new ACL and server parameters for the new ACL features.	2022-08-31 16:13:47 +02:00
Tim Gross	c9d678a91a	keyring: wrap root key in key encryption key (#14388 ) Update the on-disk format for the root key so that it's wrapped with a unique per-key/per-server key encryption key. This is a bit of security theatre for the current implementation, but it uses `go-kms-wrapping` as the interface for wrapping the key. This provides a shim for future support of external KMS such as cloud provider APIs or Vault transit encryption. * Removes the JSON serialization extension we had on the `RootKey` struct; this struct is now only used for key replication and not for disk serialization, so we don't need this helper. * Creates a helper for generating cryptographically random slices of bytes that properly accounts for short reads from the source. * No observable functional changes outside of the on-disk format, so there are no test updates.	2022-08-30 10:59:25 -04:00
Tim Gross	37905d94b7	docs: fixing a few more places we missed "secure" during rename (#14395 )	2022-08-30 10:08:50 -04:00
quoing	ce7a3745d5	docs: template change script example correction (#14368 ) "path" parameter doesn't work, should be command	2022-08-30 12:09:55 +02:00
Tim Gross	d7652fdd3a	docs: rename Secure Variables to Variables (#14352 )	2022-08-29 11:37:08 -04:00
Luiz Aoqui	e012d9411e	Task lifecycle restart (#14127 ) * allocrunner: handle lifecycle when all tasks die When all tasks die the Coordinator must transition to its terminal state, coordinatorStatePoststop, to unblock poststop tasks. Since this could happen at any time (for example, a prestart task dies), all states must be able to transition to this terminal state. * allocrunner: implement different alloc restarts Add a new alloc restart mode where all tasks are restarted, even if they have already exited. Also unifies the alloc restart logic to use the implementation that restarts tasks concurrently and ignores ErrTaskNotRunning errors since those are expected when restarting the allocation. * allocrunner: allow tasks to run again Prevent the task runner Run() method from exiting to allow a dead task to run again. When the task runner is signaled to restart, the function will jump back to the MAIN loop and run it again. The task runner determines if a task needs to run again based on two new task events that were added to differentiate between a request to restart a specific task, the tasks that are currently running, or all tasks that have already run. * api/cli: add support for all tasks alloc restart Implement the new -all-tasks alloc restart CLI flag and its API counterpart, AllTasks. The client endpoint calls the appropriate restart method from the allocrunner depending on the restart parameters used. * test: fix tasklifecycle Coordinator test * allocrunner: kill taskrunners if all tasks are dead When all non-poststop tasks are dead we need to kill the taskrunners so we don't leak their goroutines, which are blocked in the alloc restart loop. This also ensures the allocrunner exits on its own. 
* taskrunner: fix tests that waited on WaitCh Now that "dead" tasks may run again, the taskrunner Run() method will not return when the task finishes running, so tests must wait for the task state to be "dead" instead of using the WaitCh, since it won't be closed until the taskrunner is killed. * tests: add tests for all tasks alloc restart * changelog: add entry for #14127 * taskrunner: fix restore logic. The first implementation of the task runner restore process relied on server data (`tr.Alloc().TerminalStatus()`) which may not be available to the client at the time of restore. It also had the incorrect code path. When restoring a dead task the driver handle always needs to be cleared cleanly using `clearDriverHandle`; otherwise, after exiting the MAIN loop, the task may be killed by `tr.handleKill`. The fix is to store the state of the Run() loop in the task runner local client state: if the task runner ever exits this loop cleanly (not with a shutdown) it will never be able to run again. So if the Run() loop starts with this local state flag set, it must exit early. This local state flag is also being checked on task restart requests. If the task is "dead" and its Run() loop is not active it will never be able to run again. * address code review requests * apply more code review changes * taskrunner: add different Restart modes Using the task event to differentiate between the allocrunner restart methods proved to be confusing for developers to understand how it all worked. So instead of relying on the event type, this commit separated the logic of restarting a taskRunner into two methods: - `Restart` will retain the current behaviour and will only restart the task if it's currently running. - `ForceRestart` is the new method where a `dead` task is allowed to restart if its `Run()` method is still active. Callers will need to restart the allocRunner taskCoordinator to make sure it will allow the task to run again. * minor fixes	2022-08-24 17:43:07 -04:00
Piotr Kazmierczak	7077d1f9aa	template: custom change_mode scripts (#13972 ) This PR adds the functionality of allowing custom scripts to be executed on template change. Resolves #2707	2022-08-24 17:43:01 +02:00
Piotr Kazmierczak	077b6e7098	docs: Update upgrade guide to reflect enterprise changes introduced in nomad-enterprise (#14212 ) This PR documents a change made in the enterprise version of nomad that addresses the following issue: When a user tries to filter audit logs, they do so with a stanza that looks like the following: audit { enabled = true filter "remove deletes" { type = "HTTPEvent" endpoints = ["*"] stages = ["OperationComplete"] operations = ["DELETE"] } } When specifying both an "endpoint" and a "stage", events with both a matching "endpoint" AND a matching "stage" will be filtered. When specifying both an "endpoint" and an "operation", events with both a matching "endpoint" AND a matching "operation" will be filtered. When specifying both a "stage" and an "operation", events with a matching "stage" OR a matching "operation" will be filtered. The "OR" logic with stages and operations is unexpected and doesn't allow customers to be specific about which events they want to filter. For instance, the following use-case is impossible to achieve: "I want to filter out all OperationReceived events that have the DELETE verb".	2022-08-24 16:31:49 +02:00
Tim Gross	afb9fe6a4e	docs: fix an anchor link in secure vars docs (#14231 )	2022-08-23 10:46:24 -04:00
Seth Hoenig	b5427a9f3b	Merge pull request #14215 from hashicorp/docs-update-checks-for-nsd docs: update check documentation with NSD specifics	2022-08-23 09:23:53 -05:00
Seth Hoenig	fb82f11e70	docs: fix checks doc typo Co-authored-by: Piotr Kazmierczak <phk@mm.st>	2022-08-23 09:23:36 -05:00
Tim Gross	bf57d76ec7	allow ACL policies to be associated with workload identity (#14140 ) The original design for workload identities and ACLs allows for operators to extend the automatic capabilities of a workload by using a specially-named policy. This has shown to be potentially unsafe because of naming collisions, so instead we'll allow operators to explicitly attach a policy to a workload identity. This changeset adds workload identity fields to ACL policy objects and threads that all the way down to the command line. It also adds a new secondary index to the ACL policy table on namespace and job so that claim resolution can efficiently query for related policies.	2022-08-22 16:41:21 -04:00
Luiz Aoqui	dbffdca92e	template: use pointer values for gid and uid (#14203 ) When a Nomad agent starts and loads jobs that already existed in the cluster, the default template uid and gid was being set to 0, since this is the zero value for int. This caused these jobs to fail in environments where it was not possible to use 0, such as in Windows clients. In order to differentiate between an explicit 0 and a template where these properties were not set we need to use a pointer.	2022-08-22 16:25:49 -04:00
Seth Hoenig	ea6d010790	docs: update check documentation with NSD specifics This PR updates the checks documentation to mention support for checks when using the Nomad service provider. There are limitations of NSD compared to Consul, and those configuration options are now noted as being Consul-only.	2022-08-22 10:50:26 -05:00
Phil Renaud	cbd4deedf8	[ui] general keyboard navigation: 1.3.4 release (#14138 ) * Initialized keyboard service Neat but funky: dynamic subnav traversal 👻 generalized traverseSubnav method Shift as special modifier key Nice little demo panel Keyboard shortcuts keycard Some animation styles on keyboard shortcuts Handle situations where a link is deeply nested from its parent menu item Keyboard service cleanup helper-based initializer and teardown for new contextual commands Keyboard shortcuts modal component added and demo-ghost removed Removed j and k from subnav traversal Register and unregister methods for subnav plus new subnavs for volumes and volume register main nav method Generalizing the register nav method 12762 table keynav (#12975) * Experimental feature: shortcut visual hints * Long way around to a custom modifier for keyboard shortcuts * dynamic table and list iterative shortcuts * Progress with regular old tether * Delogging * Table Keynav tether fix, server and client navs, and fix to shiftless on modified arrow keys Go to Optimize keyboard link and storage key changed to g r parameterized jobs keyboard nav Dynamic numeric keynav for multiple tables (#13482) * Multiple tables init * URL-bind enumerable keyboard commands and add to more taskRow and allocationRows * Type safety and lint fixes * Consolidated push to keyCommands * Default value when removing keyCommands * Remove the URL-based removal method and perform a recompute on any add Get tests passing in Keynav: remove math helpers and a few other defensive moves (#13761) * Remove ember math helpers * Test fixes for jobparts/body * Kill an unneeded integration helper test * delog * Trying if disabling percy lets this finish * Okay so its not percy; try parallelism in circle * Percyless yet again * Trying a different angle to not have percy * Upgrade percy to 1.6.1 [ui] Keyboard nav: "u" key to go up a level (#13754) * U to go up a level * Mislabelled my conditional * Custom lint ignore rule * 
Custom lint ignore rule, this time with commas * Since we're getting rid of ember math helpers elsewhere, do the math ourselves here Replace ArrowLeft etc. with an ascii arrow (#13776) * Replace ArrowLeft etc. with an ascii arrow * non-mutative helper cleanup Keyboard Nav: let users rebind their shortcuts (#13781) * click-outside and shortcuts enabled/disabled toggle * Trap focus when modal open * Enabled/disabled saved to localStorage * Autofocus edit button on variable index * Modal overflow styles * Functional rebind * Saving rebinds to localStorage for all majors * Started on defaultCommandBindings * Modal header style and cancel rebind on escape * keyboardable keybindings w buttons instead of spans * recording and defaultvalues * Enter short-circuits rebind * Only some commands are rebindable, and dont show dupes * No unused get import * More visually distinct header on modal * Disallowed keys for rebind, showing buffer as you type, and moving dedupe to modal logic willDestroy hook to prevent tests from doubling/tripling up addEventListener on kb events remove unused tests Keyboard Navigation acceptance tests (#13893) * Acceptance tests for keyboard modal * a11y audit fix and localStorage clear * Bind/rebind/localStorage tests * Keyboard tests for dynamic nav and tables * Rebinder and assert expectation * Second percy snapshot showing hints no longer relevant Weird issue where linktos with query props specifically from the task-groups page would fail to route / hit undefined.shouldSuperCede errors Adds the concept of exclusivity to a keycommand, removing peers that also share its label Lintfix Changelog and PR feedback Changelog and PR feedback Fix to rebinding in firefox by blurring the now-disabled button on rebind (#14053) * Secure Variables shortcuts removed * Variable index route autofocus removed * Updated changelog entry * Updated changelog entry * Keynav docs (#14148) * Section added to the API Docs UI page * Added a note about disabling * Prev and 
Next order * Remove dev log and unneeded comments	2022-08-17 12:59:33 -04:00
Kerim Satirli	614171610f	adds link for Nomad-Pack GitHub action (#14118 )	2022-08-16 08:34:26 +02:00
Tim Gross	a4e89d72a8	secure vars: filter by path in List RPCs (#14036 ) The List RPCs only checked the ACL for the Prefix argument of the request. Add an ACL filter to the paginator for the List RPC. Extend test coverage of ACLs in the List RPC and in the `acl` package, and add a "deny" capability so that operators can deny specific paths or prefixes below an allowed path.	2022-08-15 11:38:20 -04:00
Mike Nomitch	ce310b350d	Add notes about DAS being prometheus only (#14040 )	2022-08-15 10:17:31 +02:00
Seth Hoenig	eb966a4ce8	Merge pull request #14086 from Morantron/patch-1 Fix typo in /tools/autoscaling	2022-08-12 09:07:12 -05:00
Seth Hoenig	394aebfbd9	Merge pull request #14088 from hashicorp/b-plan-vault-token cli: support vault token in plan command	2022-08-12 09:05:34 -05:00
Seth Hoenig	1224fdf60d	Merge pull request #14089 from hashicorp/f-docker-disable-healthchecks docker: configuration for disable docker healthcheck	2022-08-12 09:00:31 -05:00
James Rasell	f6a5961a20	docs: correctly state RPC port is used by servers and clients. (#14091 )	2022-08-12 10:14:14 +02:00
Seth Hoenig	dc761aa7ec	docker: create a docker task config setting for disable built-in healthcheck This PR adds a docker driver task configuration setting for turning off the built-in HEALTHCHECK of a container. References: https://docs.docker.com/engine/reference/builder/#healthcheck https://github.com/docker/engine-api/blob/master/types/container/config.go#L16 Closes #5310 Closes #14068	2022-08-11 10:33:48 -05:00
Seth Hoenig	ba5c45ab93	cli: respect vault token in plan command This PR fixes a regression where the 'job plan' command would not respect a Vault token if set via --vault-token or $VAULT_TOKEN. Basically the same bug/fix as for the validate command in https://github.com/hashicorp/nomad/issues/13062 Fixes https://github.com/hashicorp/nomad/issues/13939	2022-08-11 08:54:08 -05:00
Morantron	741170160f	Update index.mdx	2022-08-11 09:03:52 +02:00
Seth Hoenig	3d925a78e5	Merge pull request #14065 from hashicorp/b-fwd-vtoken-validation cli: forward request for job validation to nomad leader	2022-08-10 15:14:01 -05:00
Seth Hoenig	3aaaedf52e	cli: forward request for job validation to nomad leader This PR changes the behavior of 'nomad job validate' to forward the request to the nomad leader, rather than responding from any server. This is because we need the leader when validating Vault tokens, since the leader is the only server with an active vault client.	2022-08-10 14:34:04 -05:00
dgotlieb	7fbc8baaeb	doc typo fix docker and podman don't suck 🤣	2022-08-10 15:04:07 +03:00
Charlie Voiselle	9a19279f59	Sweep of docs for repeated words; minor edits (#14032 )	2022-08-05 16:45:30 -04:00
Luiz Aoqui	9affe31a0f	qemu: reduce monitor socket path (#13971 ) The QEMU driver can take an optional `graceful_shutdown` configuration which will create a Unix socket to send an ACPI shutdown signal to the VM. Unix sockets have a hard length limit and the driver implementation assumed that QEMU versions 2.10.1 were able to handle longer paths. This is not correct: the linked QEMU fix only changed the behaviour from silently truncating longer socket paths to throwing an error. By validating the socket path before starting the QEMU machine we can provide users a more actionable and meaningful error message, and by using a shorter socket file name we leave a bit more room for user-defined values in the path, such as the task name. The maximum length allowed is also platform-dependent, so validation needs to be different for each OS.	2022-08-04 12:10:35 -04:00
Luiz Aoqui	e3d78c343c	template: set default UID/GID to -1 (#13998 ) UID/GID 0 is usually reserved for the root user/group. While Nomad clients are expected to run as root it may not always be the case. Setting these values to -1 if not defined will fall back to the previous behaviour of not attempting to set file ownership and use whatever UID/GID the Nomad agent is running as. It will also keep backwards compatibility, which is especially important for platforms where this feature is not supported, like Windows.	2022-08-04 11:26:08 -04:00
Luiz Aoqui	8f05a55def	docs: remove link to HCL2 `timestamp` function (#13999 ) The `timestamp` HCL2 function was never part of the set of supported functions.	2022-08-04 10:07:51 -04:00
Derek Strickland	77df9c133b	Add Nomad RetryConfig to agent template config (#13907 ) * add Nomad RetryConfig to agent template config	2022-08-03 16:56:30 -04:00
Piotr Kazmierczak	530280505f	client: enable specifying user/group permissions in the template stanza (#13755 ) * Adds Uid/Gid parameters to template. * Updated diff_test * fixed order * update jobspec and api * removed obsolete code * helper functions for jobspec parse test * updated documentation * adjusted API jobs test. * propagate uid/gid setting to job_endpoint * adjusted job_endpoint tests * making uid/gid into pointers * refactor * updated documentation * updated documentation * Update client/allocrunner/taskrunner/template/template_test.go Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * Update website/content/api-docs/json-jobs.mdx Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * propagating documentation change from Luiz * formatting * changelog entry * changed changelog entry Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-08-02 22:15:38 +02:00
Tim Gross	e025afdf87	docs: concepts for secure variables and workload identity (#13764 ) Includes concept docs for secure variables, concept docs for workload identity, and an operations docs for keyring management.	2022-08-02 10:06:26 -04:00
Eric Weber	cbce13c1ac	Add stage_publish_base_dir field to csi_plugin stanza of a job (#13919 ) * Allow specification of CSI staging and publishing directory path * Add website documentation for stage_publish_dir * Replace erroneous reference to csi_plugin.mount_config with csi_plugin.mount_dir * Avoid requiring CSI plugins to be redeployed after introducing StagePublishDir	2022-08-02 09:42:44 -04:00
Tim Gross	e5ac6464f6	secure vars: enforce ENT quotas (OSS work) (#13951 ) Move the secure variables quota enforcement calls into the state store to ensure quota checks are atomic with quota updates (in the same transaction). Switch to a machine-size int instead of a uint64 for quota tracking. The ENT-side quota spec is described as int, and negative values have a meaning as "not permitted at all". Using the same type for tracking will make it easier to do the math around checks, and uint64 is infeasibly large anyways. Add secure vars to quota HTTP API and CLI outputs and API docs.	2022-08-02 09:32:09 -04:00
Tim Gross	f14fafe914	docs: fix path for quota/usage API (#13952 )	2022-08-02 08:46:45 -04:00
Seth Hoenig	6f4fda3999	website: enable setting custom tool for launching website dev container When working in a podman environment, it's nice to just run the website development container using podman.	2022-07-26 09:15:03 -05:00
asymmetric	b89718d70e	Update filesystem.mdx (#13738 ) fix alloc working directory path	2022-07-25 10:25:48 -04:00
Scott Holodak	12ef89a61a	docs: fix placement for `scaling` and `csi_plugin` (#13892 )	2022-07-25 10:06:59 -04:00
Charlie Voiselle	456ad33b7c	Fix link (#13881 )	2022-07-22 12:27:45 -04:00
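
The conditional described in commit 3fc7482ecd (controller unpublish is skipped only when other non-terminal allocations on the same node still claim the volume) can be sketched roughly as follows. This is an illustrative Python sketch and not Nomad's Go implementation; the `Alloc` type, its fields, and the function name `should_skip_controller_unpublish` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Alloc:
    """Hypothetical stand-in for an allocation that claims a volume."""
    id: str
    node_id: str
    terminal: bool


def should_skip_controller_unpublish(
    claim_alloc_id: str,
    claim_node_id: str,
    claiming_allocs: Iterable[Alloc],
) -> bool:
    """Return True if another live allocation on the same node still claims the volume."""
    for alloc in claiming_allocs:
        if alloc.id == claim_alloc_id:
            # The allocation whose claim is being freed must not block
            # its own unpublish, even if it is not yet marked terminal.
            continue
        if alloc.node_id == claim_node_id and not alloc.terminal:
            return True
    return False


# A failed allocation is not yet terminal when the unpublish request arrives.
failed = Alloc(id="a1", node_id="n1", terminal=False)
other_node = Alloc(id="a2", node_id="n2", terminal=False)
same_node_live = Alloc(id="a3", node_id="n1", terminal=False)

only_self = should_skip_controller_unpublish("a1", "n1", [failed, other_node])
with_peer = should_skip_controller_unpublish("a1", "n1", [failed, same_node_live])
```

Under these assumptions, `only_self` is `False` (the controller unpublish proceeds, so the volume can be detached from the host) and `with_peer` is `True` (another live allocation on the same node still needs the volume).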

4199 Commits