open-nomad

Author	SHA1	Message	Date
Dao Thanh Tung	2fd908f63f	Fix documentation for `meta` block: string replacement in key from `-` to `_` (#15940 ) Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>	2023-01-30 14:51:04 +01:00
James Rasell	6accfb1f43	cli: separate auth method config output for easier reading. (#15892 )	2023-01-30 11:44:26 +01:00
James Rasell	06664baeb1	docs: add ACL concepts page to introduce objects. (#15895 )	2023-01-30 11:00:29 +01:00
Tim Gross	d2fc65764e	docs: add more warnings about running agent as root on Linux (#15926 )	2023-01-27 15:22:18 -05:00
Tim Gross	40a47f63f2	docs: add post-install steps for CNI to main install docs page (#15919 ) The getting started Tutorial has a post-installation steps section that includes installing CNI plugins. Many users will want to use `bridge` networking right out of the gate, so adding these same post-install instructions to the main docs will be a better Day 0 experience for them.	2023-01-27 13:16:14 -05:00
Yorick Gersie	2a5c423ae0	Allow per_alloc to be used with host volumes (#15780 ) Disallowing per_alloc for host volumes in some cases makes life much harder for a Nomad user. When we rely on the NOMAD_ALLOC_INDEX for any configuration that needs to be re-used across restarts we need to make sure allocation placement is consistent. With CSI volumes we can use the `per_alloc` feature but for some reason this is explicitly disabled for host volumes. Ensure host volumes understand the concept of per_alloc	2023-01-26 09:14:47 -05:00
Piotr Kazmierczak	f4d6efe69f	acl: make auth method default across all types (#15869 )	2023-01-26 14:17:11 +01:00
James Rasell	5d33891910	sso: allow binding rules to create management ACL tokens. (#15860 ) * sso: allow binding rules to create management ACL tokens. * docs: update binding rule docs to detail management type addition.	2023-01-26 09:57:44 +01:00
Luiz Aoqui	f2dd46d1db	docs: add caveat on dynamic blocks (#15857 )	2023-01-25 15:54:45 -05:00
Ashlee M Boyer	57f8ebfa26	docs: Migrate link formats (#15779 ) * Adding check-legacy-links-format workflow * Adding test-link-rewrites workflow * chore: updates link checker workflow hash * Migrating links to new format Co-authored-by: Kendall Strautman <kendallstrautman@gmail.com>	2023-01-25 09:31:14 -08:00
Nick Wales	825af1f62a	docker: add option for Windows isolation modes (#15819 )	2023-01-24 16:31:48 -05:00
Karl Johann Schubert	b773a1b77f	client: add disk_total_mb and disk_free_mb config options (#15852 )	2023-01-24 09:14:22 -05:00
Tim Gross	a51149736d	Rename `nomad.broker.total_blocked` metric (#15835 ) This changeset fixes a long-standing point of confusion in metrics emitted by the eval broker. The eval broker has a queue of "blocked" evals that are waiting for an in-flight ("unacked") eval of the same job to be completed. But this "blocked" state is not the same as the `blocked` status that we write to raft and expose in the Nomad API to end users. There's a second metric `nomad.blocked_eval.total_blocked` that refers to evaluations in that state. This has caused ongoing confusion in major customer incidents and even in our own documentation! (Fixed in this PR.) There's little functional change in this PR aside from the name of the metric emitted, but there's a bit of refactoring to clean up the names in `eval_broker.go` so that there aren't name collisions and multiple names for the same state. Changes included are: * Everything that was previously called "pending" referred to entities that were associated with the "ready" metric. These are all now called "ready" to match the metric. * Everything named "blocked" in `eval_broker.go` is now named "pending", except for a couple of comments that actually refer to blocked RPCs. * Added a note to the upgrade guide docs for 1.5.0. * Fixed the scheduling performance metrics docs because the description for `nomad.broker.total_blocked` was actually the description for `nomad.blocked_eval.total_blocked`.	2023-01-20 14:23:56 -05:00
Charlie Voiselle	5ea1d8a970	Add raft snapshot configuration options (#15522 ) * Add config elements * Wire in snapshot configuration to raft * Add hot reload of raft config * Add documentation for new raft settings * Add changelog	2023-01-20 14:21:51 -05:00
Karel	ad56b4dbd2	docs: fix conflict metric documentation, fix typo (#15805 ) The description for the `nomad.nomad.blocked_evals.total_blocked` states that this could include evals blocked due to reached quota limits, but the `total_quota_limit` mentions being exclusive to its own metric. I personally interpret `total_blocked` as encompassing any blocked evals for any reason, as written in the docs. Though someone will have to verify the validity of that statement and possibly rectify the other metric description. Fixed a typo: `limtis` vs `limits`.	2023-01-20 13:54:11 -05:00
James Rasell	4cf40f5606	docs: clarify installing from source requirement on PATH. (#15833 )	2023-01-20 16:10:02 +01:00
James Rasell	c55efdd928	docs: add OIDC login API and CLI docs. (#15818 )	2023-01-20 10:07:26 +01:00
Ashlee M Boyer	4e82c96d36	[docs] Adjusting links for rewrite project (#15810 ) * Adjusting link to page about features * Fixing typo * Replacing old learn links with devdot paths * Removing extra space	2023-01-17 10:55:47 -05:00
Luiz Aoqui	a0652af5dd	docs: add missing parameter `propagation_mode` to `volume_mount` (#15785 )	2023-01-16 10:18:50 -05:00
Ashlee M Boyer	c75ea79f25	Fixing yaml syntax in frontmatter (#15781 )	2023-01-13 14:06:46 -05:00
Seth Hoenig	fe7795ce16	consul/connect: support for proxy upstreams opaque config (#15761 ) This PR adds support for configuring `proxy.upstreams[].config` for Consul Connect upstreams. This is an opaque config value to Nomad - the data is passed directly to Consul and is unknown to Nomad.	2023-01-12 08:20:54 -06:00
Anthony Davis	1c32471805	Fix rejoin_after_leave behavior (#15552 )	2023-01-11 16:39:24 -05:00
Seth Hoenig	719eee8112	consul: add client configuration for grpc_ca_file (#15701 ) * [no ci] first pass at plumbing grpc_ca_file * consul: add support for grpc_ca_file for tls grpc connections in consul 1.14+ This PR adds client config to Nomad for specifying consul.grpc_ca_file These changes combined with https://github.com/hashicorp/consul/pull/15913 should finally enable Nomad users to upgrade to Consul 1.14+ and use tls grpc connections. * consul: add cl entry for grpc_ca_file * docs: mention grpc_tls changes due to Consul 1.14	2023-01-11 09:34:28 -06:00
Dao Thanh Tung	09b25d71b8	cli: Add a nomad operator client state command (#15469 ) Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>	2023-01-11 10:03:31 -05:00
Luiz Aoqui	ed5fccc183	scheduler: allow using device ID as attribute (#15455 ) Devices are fingerprinted as groups of similar devices. This prevented specifying specific device by their ID in constraint and affinity rules. This commit introduces the `${device.ids}` attribute that returns a comma separated list of IDs that are part of the device group. Users can then use the set operators to write rules.	2023-01-10 14:28:23 -05:00
Cyrille Colin	d9bf6ec6f7	Update template.mdx (#15737 ) fix typo issue in variable url : remove unwanted "r"	2023-01-10 10:42:33 +01:00
Luiz Aoqui	f4bf4528a1	docs: networking (#15358 ) Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>	2023-01-06 11:47:10 -05:00
James Rasell	fc08eb9e12	docs: clarify shutdown_delay jobspec param and service behaviour. (#15695 )	2023-01-05 16:57:13 +01:00
Dao Thanh Tung	ca2f509e82	agent: Make agent syslog log level inherit from Nomad agent log (#15625 )	2023-01-04 09:38:06 -05:00
dgotlieb	a991342f8d	docs: nomad `eval delete` typo fix (#15667 ) Status instead of Stauts	2023-01-03 14:18:03 -05:00
James Rasell	11744de527	docs: fix service name interpolation key details. (#15643 )	2023-01-03 10:58:00 +01:00
Piotr Kazmierczak	f1450d25d2	ACL Binding Rules CLI documentation (#15584 )	2022-12-22 16:36:25 +01:00
Danish Prakash	dc81568f93	command/job_stop: accept multiple jobs, stop concurrently (#12582 ) * command/job_stop: accept multiple jobs, stop concurrently Signed-off-by: danishprakash <grafitykoncept@gmail.com> * command/job_stop_test: add test for multiple job stops Signed-off-by: danishprakash <grafitykoncept@gmail.com> * improve output, add changelog and docs Signed-off-by: danishprakash <grafitykoncept@gmail.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-12-16 15:46:58 -08:00
Piotr Kazmierczak	f91ab03920	acl: SSO auth methods CLI documentation (#15538 ) This PR provides documentation for the ACL Auth Methods CLI commands. Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2022-12-14 13:35:26 +01:00
Seth Hoenig	be3f89b5f9	artifact: enable inheriting environment variables from client (#15514 ) * artifact: enable inheriting environment variables from client This PR adds client configuration for specifying environment variables that should be inherited by the artifact sandbox process from the Nomad Client agent. Most users should not need to set these values but the configuration is provided to ensure backwards compatibility. Configuration of go-getter should ideally be done through the artifact block in a jobspec task. e.g. ```hcl client { artifact { set_environment_variables = "TMPDIR,GIT_SSH_OPTS" } } ``` Closes #15498 * website: update set_environment_variables text to mention PATH	2022-12-09 15:46:07 -06:00
Michael Schurter	c28c5ad2e8	docs: clarify rescheduling happens when tasks fail (#15485 )	2022-12-08 12:58:26 -08:00
Seth Hoenig	825c5cc65e	artifact: add client toggle to disable filesystem isolation (#15503 ) This PR adds the client config option for turning off filesystem isolation, applicable on Linux systems where filesystem isolation is possible and enabled by default. ```hcl client{ artifact { disable_filesystem_isolation = <bool:false> } } ``` Closes #15496	2022-12-08 12:29:23 -06:00
Seth Hoenig	51a2212d3d	client: sandbox go-getter subprocess with landlock (#15328 ) * client: sandbox go-getter subprocess with landlock This PR re-implements the getter package for artifact downloads as a subprocess. Key changes include: on all platforms, run getter as a child process of the Nomad agent; on Linux platforms running as root, run the child process as the nobody user; on supporting Linux kernels, use landlock for filesystem isolation (via go-landlock); on all platforms, restrict environment variables of the child process to a static set (notably, TMP/TEMP now points within the allocation's task directory); the kernel.landlock attribute is fingerprinted (version number or unavailable). These changes make Nomad client more resilient against a faulty go-getter implementation that may panic, and more secure against bad actors attempting to use artifact downloads as a privilege escalation vector. Adds new e2e/artifact suite for ensuring artifact downloading works. TODO: Windows git test (need to modify the image, etc... followup PR) * landlock: fixup items from cr * cr: fixup tests and go.mod file	2022-12-07 16:02:25 -06:00
Tim Gross	7404ef46e9	docs: update `plugin status` docs with capabilities and topology (#15448 ) The `plugin status` command supports displaying CSI capabilities and topology accessibility, but this was missing from the documentation. Extend the `-verbose` example to show that info.	2022-12-01 12:18:56 -05:00
Matus Goljer	2283c2d583	Update affinity.mdx (#15168 ) Fix the comment to correspond to the code	2022-11-30 19:01:56 -05:00
Jack	62f7de7ed5	cli: `wait` flag for use with `deployment status -monitor` (#15262 )	2022-11-23 16:36:13 -05:00
Lance Haig	0263e7af34	Add command "nomad tls" (#14296 )	2022-11-22 14:12:07 -05:00
James Rasell	e2a2ea68fc	client: accommodate Consul 1.14.0 gRPC and agent self changes. (#15309 ) * client: accommodate Consul 1.14.0 gRPC and agent self changes. Consul 1.14.0 changed the way in which gRPC listeners are configured, particularly when using TLS. Prior to the change, a single listener was responsible for handling plain-text and encrypted gRPC requests. In 1.14.0 and beyond, separate listeners will be used for each, defaulting to 8502 and 8503 for plain-text and TLS respectively. The change means that Nomad’s Consul Connect integration would not work when integrated with Consul clusters using TLS and running 1.14.0 or greater. The Nomad Consul fingerprinter identifies the gRPC port Consul has exposed using the "DebugConfig.GRPCPort" value from Consul’s “/v1/agent/self” endpoint. In Consul 1.14.0 and greater, this only represents the plain-text gRPC port which is likely to be disabled in clusters running TLS. In order to fix this issue, Nomad now takes into account the Consul version and configured scheme to optionally use “DebugConfig.GRPCTLSPort” value from Consul’s agent self return. The “consul_grcp_socket” allocrunner hook has also been updated so that the fingerprinted gRPC port attribute is passed in. This provides a better fallback method, when the operator does not configure the “consul.grpc_address” option. * docs: modify Consul Connect entries to detail 1.14.0 changes. * changelog: add entry for #15309 * fixup: tidy tests and clean version match from review feedback. * fixup: use strings tolower func.	2022-11-21 09:19:09 -06:00
Tim Gross	510eb435dc	remove deprecated `AllocUpdateRequestType` raft entry (#15285 ) After Deployments were added in Nomad 0.6.0, the `AllocUpdateRequestType` raft log entry was no longer in use. Mark this as deprecated, remove the associated dead code, and remove references to the metrics it emits from the docs. We'll leave the entry itself just in case we encounter old raft logs that we need to be able to safely load.	2022-11-17 12:08:04 -05:00
Tim Gross	37134a4a37	eval delete: move batching of deletes into RPC handler and state (#15117 ) During unusual outage recovery scenarios on large clusters, a backlog of millions of evaluations can appear. In these cases, the `eval delete` command can put excessive load on the cluster by listing large sets of evals to extract the IDs and then sending large batches of IDs. Although the command's batch size was carefully tuned, we still need to JSON deserialize, re-serialize to MessagePack, send the log entries through raft, and get the FSM applied. To improve performance of this recovery case, move the batching process into the RPC handler and the state store. The design here is a little weird, so let's look at the failed options first: * A naive solution here would be to just send the filter as the raft request and let the FSM apply delete the whole set in a single operation. Benchmarking with 1M evals on a 3 node cluster demonstrated this can block the FSM apply for several minutes, which puts the cluster at risk if there's a leadership failover (the barrier write can't be made while this apply is in-flight). * A less naive but still bad solution would be to have the RPC handler filter and paginate, and then hand a list of IDs to the existing raft log entry. Benchmarks showed this blocked the FSM apply for 20-30s at a time and took roughly an hour to complete. Instead, we're filtering and paginating in the RPC handler to find a page token, and then passing both the filter and page token in the raft log. The FSM apply recreates the paginator using the filter and page token to get roughly the same page of evaluations, which it then deletes. The pagination process is fairly cheap (only about 5% of the total FSM apply time), so counter-intuitively this rework ends up being much faster. A benchmark of 1M evaluations showed this blocked the FSM apply for 20-30ms at a time (typical for normal operations) and completes in less than 4 minutes. Note that, as with the existing design, this delete is not consistent: a new evaluation inserted "behind" the cursor of the pagination will fail to be deleted.	2022-11-14 14:08:13 -05:00
Douglas Jose	345ef0bbec	Fix wrong reference to `vault` (#15228 )	2022-11-14 10:49:09 +01:00
Kyle Root	99d5e7efb3	Fix broken URL to nvidia device plugin (#15234 )	2022-11-14 10:37:06 +01:00
Tim Gross	eabbcebdd4	exec: allow running commands from host volume (#14851 ) The exec driver and other drivers derived from the shared executor check the path of the command before handing off to libcontainer to ensure that the command doesn't escape the sandbox. But we don't check any host volume mounts, which should be safe to use as a source for executables if we're letting the user mount them to the container in the first place. Check the mount config to verify the executable lives in the mount's host path, but then return an absolute path within the mount's task path so that we can hand that off to libcontainer to run. Includes a good bit of refactoring here because the anchoring of the final task path has different code paths for inside the task dir vs inside a mount. But I've fleshed out the test coverage of this a good bit to ensure we haven't created any regressions in the process.	2022-11-11 09:51:15 -05:00
Seth Hoenig	01a3a29e51	docs: clarify how to access task meta values in templates (#15212 ) This PR updates template and meta docs pages to give examples of accessing meta values in templates. To do so one must use the environment variable form of the meta key name, which isn't obvious and wasn't yet documented.	2022-11-10 16:11:53 -06:00
twunderlich-grapl	1859559134	Fix s3 example URLs in the artifacts docs (#15123 ) * Fix s3 URLs so that they work Unfortunately, s3 urls prefixed with https:// do NOT work with the underlying go-getter library. As such, this fixes the examples so that they are working examples that won't cause problems for people reading the docs. See discussion in https://github.com/hashicorp/nomad/issues/1113 circa 2016. * Use s3:// protocol schema for artifact examples Per the discussion in https://github.com/hashicorp/nomad/pull/15123, we're going to use the explicit s3 protocol in the examples since that is the likeliest to work in all scenarios	2022-11-07 14:14:57 -05:00
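
Several of the commits above change what a jobspec can express. The fragment below is a minimal sketch of three of them (2a5c423ae0, 01a3a29e51 with 2fd908f63f, and ed5fccc183), not a tested job. The job, group, task, and volume names, the mount path, and the device ID are hypothetical, and the assumption that a `per_alloc` host volume resolves to an index-suffixed name such as `data[0]` is carried over from the CSI behaviour the commit message refers to.

```hcl
# Illustrative sketch only; all names and values here are hypothetical.
job "example" {
  group "app" {
    count = 2

    # 2a5c423ae0: per_alloc on a host volume. Assumed to resolve to
    # "data[0]" and "data[1]" on the clients, as it does for CSI volumes.
    volume "data" {
      type      = "host"
      source    = "data"
      per_alloc = true
    }

    task "server" {
      driver = "exec"

      config {
        command = "/bin/sh"
        args    = ["-c", "cat local/meta.txt; sleep 3600"]
      }

      volume_mount {
        volume      = "data"
        destination = "/data"
      }

      # 2fd908f63f: a `-` in a meta key becomes `_` in the derived name.
      meta {
        my-key = "example"
      }

      # 01a3a29e51: templates read meta values through the environment
      # variable form of the key, not the key as written in the meta block.
      template {
        data        = <<EOH
my-key = {{ env "NOMAD_META_my_key" }}
EOH
        destination = "local/meta.txt"
      }

      resources {
        device "nvidia/gpu" {
          # ed5fccc183: ${device.ids} is a comma-separated list of the
          # IDs in the device group, so a set operator can match one ID.
          constraint {
            attribute = "${device.ids}"
            operator  = "set_contains"
            value     = "GPU-hypothetical-id"
          }
        }
      }
    }
  }
}
```

The set operator is used because the attribute is a comma-separated list of every ID in the group, so an equality check would only match a group containing a single device.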

601 commits