open-nomad

Author	SHA1	Message	Date
Jasmine Dahilig	f67b108f9f	docs: update vault-token note in job run command #8040 (#12385 )	2022-04-06 10:01:38 -07:00
James Rasell	7096fecd10	website: add initial website docs for Nomad service discovery. (#12456 )	2022-04-06 18:51:14 +02:00
Derek Strickland	0ab89b1728	Merge pull request #12476 from hashicorp/f-disconnected-client-allocation-handling disconnected clients: Feature branch merge	2022-04-06 10:11:57 -04:00
Mike Nomitch	7405ebbad1	Add max client disconnect docs (#12467 ) Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>	2022-04-06 08:54:14 -04:00
Seth Hoenig	2e2ff3f75e	Merge pull request #12419 from hashicorp/exec-cleanup raw_exec: make raw exec driver work with cgroups v2	2022-04-05 16:42:01 -05:00
Tim Gross	5b9772e68f	docs: updates for CSI plugin improvements for 1.3.0 (#12466 )	2022-04-05 17:13:51 -04:00
Derek Strickland	8e9f8be511	`MaxClientDisconnect` Jobspec checklist (#12177 ) * api: Add struct, conversion function, and tests * TaskGroup: Add field, validation, and tests * diff: Add diff handler and test * docs: Update docs	2022-04-05 17:12:23 -04:00
Derek Strickland	d7f44448e1	disconnected clients: Observability plumbing (#12141 ) * Add disconnects/reconnect to log output and emit reschedule metrics * TaskGroupSummary: Add Unknown, update StateStore logic, add to metrics	2022-04-05 17:12:23 -04:00
Shishir	a6801f73d1	cli: add -quiet to nomad node status command. (#12426 )	2022-04-05 15:53:43 -04:00
Luiz Aoqui	ab7eb5de6e	Support Vault entity aliases (#12449 ) Move some common Vault API data struct decoding out of the Vault client so it can be reused in other situations. Make Vault job validation its own function so it's easier to expand it. Rename the `Job.VaultPolicies` method to just `Job.Vault` since it returns the full Vault block, not just their policies. Set `ChangeMode` on `Vault.Canonicalize`. Add some missing tests. Allows specifying an entity alias that will be used by Nomad when deriving the task Vault token. An entity alias assigns an identity to a token, allowing better control and management of Vault clients since all tokens with the same identity alias will now be considered the same client. This helps track Nomad activity in Vault's audit logs and gives better control over Vault billing. Add support for a new Nomad server configuration to define a default entity alias to be used when deriving Vault tokens. This default value will be used if the task doesn't have an entity alias defined.	2022-04-05 14:18:10 -04:00
Grant Griffiths	18a0a2c9a4	CSI: Add secrets flag support for delete volume (#11245 )	2022-04-05 08:59:11 -04:00
Seth Hoenig	52aaf86f52	raw_exec: make raw exec driver work with cgroups v2 This PR adds support for the raw_exec driver on systems with only cgroups v2. The raw exec driver is able to use cgroups to manage processes. This happens only on Linux, when exec_driver is enabled, and the no_cgroups option is not set. The driver uses the freezer controller to freeze processes of a task, issue a sigkill, then unfreeze. Previously the implementation assumed cgroups v1, and now it also supports cgroups v2. There is a bit of refactoring in this PR, but the fundamental design remains the same. Closes #12351 #12348	2022-04-04 16:11:38 -05:00
Danish Prakash	e7e8ce212e	command/operator_debug: add pprof interval (#11938 )	2022-04-04 15:24:12 -04:00
Seth Hoenig	f9b0ffafde	Merge pull request #12431 from hashicorp/docs-sysbatch-exists-typo docs: fix typo in system batch description	2022-04-01 09:58:06 -05:00
Seth Hoenig	e9eacb1153	docs: fix typo in system batch description	2022-04-01 09:46:03 -05:00
Bryce Kalow	9b0d77ae78	website: redirect /api to api-docs and update internal links (#12410 )	2022-03-31 11:33:27 -05:00
Tim Gross	8dccc43c2f	docs: remove deprecated client options parameters docs (#12416 ) The client configuration options for drivers have been deprecated since 0.9. We haven't torn them out completely but because they're deprecated it's been hard to guarantee correct behavior. Remove the documentation so that users aren't misled about their viability.	2022-03-31 11:45:51 -04:00
Michael Schurter	cae69ba8ce	Merge pull request #12312 from hashicorp/f-writeToFile template: disallow `writeToFile` by default	2022-03-29 13:41:59 -07:00
Tim Gross	03c1904112	csi: allow `namespace` field to be passed in volume spec (#12400 ) Use the volume spec's `namespace` field to override the value of the `-namespace` and `NOMAD_NAMESPACE` field, just as we do with job spec.	2022-03-29 14:46:39 -04:00
Michael Schurter	33fe04ff6a	template: fix comments and docs Review notes from @lgfa29 Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-03-29 09:25:23 -07:00
Michael Schurter	7a28fcb8af	template: disallow `writeToFile` by default Resolves #12095 by WONTFIXing it. This approach disables `writeToFile` as it allows arbitrary host filesystem writes and is only a small quality of life improvement over multiple `template` stanzas. This approach has the significant downside of leaving people who have altered their `template.function_denylist` still vulnerable! I added an upgrade note, but we should have implemented the denylist as a `map[string]bool` so that new funcs could be denied without overriding custom configurations. This PR also includes a bug fix that broke enabling all consul-template funcs. We repeatedly failed to differentiate between a nil (unset) denylist and an empty (allow all) one.	2022-03-28 17:05:42 -07:00
Shishir	afcce3eea5	Display OS name in nomad node status command. (#12388 ) Signed-off-by: Shishir Mahajan <smahajan@roblox.com>	2022-03-28 09:28:14 -04:00
Hunter Morris	dcaf99dcc1	client: Add AWS EC2 instance-life-cycle from metadata to client fingerprint (#12371 )	2022-03-25 11:50:52 -04:00
Luiz Aoqui	848a3b271f	docs: fix link and add note about Nomad v1.3.0 on raft v3 upgrade (#12378 )	2022-03-25 10:11:46 -04:00
dgotlieb	f53f61c6ce	Add grpc and http2 listeners to gateway docs (#12367 ) Starting with Nomad version 1.2.0, `grpc` and `http2` [protocols are supported](https://github.com/hashicorp/nomad/pull/11187)	2022-03-24 17:09:19 -04:00
Seth Hoenig	987dda3092	Merge pull request #12274 from hashicorp/f-cgroupsv2 client: enable cpuset support for cgroups.v2	2022-03-24 14:22:54 -05:00
Seth Hoenig	113b7eb727	client: cgroups v2 code review followup	2022-03-24 13:40:42 -05:00
Tim Gross	ff1bed38cd	csi: add `-secret` and `-parameter` flag to `volume snapshot create` (#12360 ) Pass-through the `-secret` and `-parameter` flags to allow setting parameters for the snapshot and overriding the secrets we've stored on the CSI volume in the state store.	2022-03-24 10:29:50 -04:00
Seth Hoenig	2e5c6de820	client: enable support for cgroups v2 This PR introduces support for using Nomad on systems with cgroups v2 [1] enabled as the cgroups controller mounted on /sys/fs/cgroups. Newer Linux distros like Ubuntu 21.10 are shipping with cgroups v2 only, causing problems for Nomad users. Nomad mostly "just works" with cgroups v2 due to the indirection via libcontainer, but not so for managing cpuset cgroups. Before, Nomad has been making use of a feature in v1 where a PID could be a member of more than one cgroup. In v2 this is no longer possible, and so the logic around computing cpuset values must be modified. When Nomad detects v2, it manages cpuset values in-process, rather than making use of cgroup hierarchy inheritance via shared/reserved parents. Nomad will only activate the v2 logic when it detects cgroups2 is mounted at /sys/fs/cgroups. This means on systems running in hybrid mode with cgroups2 mounted at /sys/fs/cgroups/unified (as is typical) Nomad will continue to use the v1 logic, and should operate as before. Systems that do not support cgroups v2 are also not affected. When v2 is activated, Nomad will create a parent called nomad.slice (unless otherwise configured in Client config), and create cgroups for tasks using naming convention <allocID>-<task>.scope. These follow the naming convention set by systemd and also used by Docker when cgroups v2 is detected. Client nodes now export a new fingerprint attribute, unique.cgroups.version which will be set to 'v1' or 'v2' to indicate the cgroups regime in use by Nomad. The new cpuset management strategy fixes #11705, where docker tasks that spawned processes on startup would "leak". In cgroups v2, the PIDs are started in the cgroup they will always live in, and thus the cause of the leak is eliminated. [1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html Closes #11289 Fixes #11705 #11773 #11933	2022-03-23 11:35:27 -05:00
Tim Gross	60cfeacd76	drainer: defer CSI plugins until last (#12324 ) When a node is drained, system jobs are left until last so that operators can rely on things like log shippers running even as their applications are getting drained off. Include CSI plugins in this set so that Controller plugins deployed as services can be handled as gracefully as Node plugins that are running as system jobs.	2022-03-22 10:26:56 -04:00
Luiz Aoqui	68e5b58007	cli: display Raft version in `server members` (#12317 ) The previous output of the `nomad server members` command would output a column named `Protocol` that displayed the Serf protocol being currently used by servers. This is not a configurable option, so it holds very little value to operators. It is also easy to confuse it with the Raft Protocol version, which is configurable and highly relevant to operators. This commit replaces the previous `Protocol` column with the new `Raft Version`. It also updates the `-detailed` flag to be called `-verbose` so it matches other commands. The detailed output now also outputs the same information as the standard output with the addition of the previous `Protocol` column and `Tags`.	2022-03-17 14:15:10 -04:00
Luiz Aoqui	15089f055f	api: add related evals to eval details (#12305 ) The `related` query param is used to indicate that the request should return a list of related (next, previous, and blocked) evaluations. Co-authored-by: Jasmine Dahilig <jasmine@hashicorp.com>	2022-03-17 13:56:14 -04:00
Luiz Aoqui	8db12c2a17	server: transfer leadership in case of error (#12293 ) When a Nomad server becomes the Raft leader, it must perform several actions defined in the establishLeadership function. If any of these actions fail, Raft will think the node is the leader, but it will not actually be able to act as a Nomad leader. In this scenario, leadership must be revoked and transferred to another server if possible, or the node should retry the establishLeadership steps.	2022-03-17 11:10:57 -04:00
Tim Gross	3bf948dc00	docs: clarify `restart` inheritance and add examples (#12275 ) Clarify the behavior of `restart` inheritance with respect to Connect sidecar tasks. Remove incorrect language about the scheduler being involved in restart decisions. Try to make the `delay` mode documentation more clear, and provide examples of delay vs fail.	2022-03-14 15:49:08 -04:00
Luiz Aoqui	9b393d0535	docs: initial docs for the new API features (#12094 )	2022-03-14 10:58:42 -04:00
Luiz Aoqui	2876739a51	api: apply consistent behaviour of the reverse query parameter (#12244 )	2022-03-11 19:44:52 -05:00
Luiz Aoqui	a42e64c039	docs: add namespace param to job parse API (#12258 )	2022-03-10 16:35:07 -05:00
Tim Gross	5ae30849a9	docs: add note about docker DNS config when using bridge mode (#12229 ) The Docker DNS configuration options are not compatible with a group-level network in `bridge` mode. Warn users about this in the Docker task configuration docs.	2022-03-08 11:59:20 -05:00
Merlin Scholz	68457be72c	docs: elaborate on networking issues with firewalld (#12214 )	2022-03-08 09:49:29 -05:00
Mike Nomitch	3955dd36d7	Merge pull request #12192 from hashicorp/website/add-new-tools Add openapi and caravan to tools page	2022-03-07 11:21:24 -08:00
Ignacio Torres Masdeu	2793054147	docs: fix examples for set_contains_all and set_contains_any (#12093 )	2022-03-07 13:55:57 -05:00
Michael Schurter	7bb8de68e5	Merge pull request #12138 from jorgemarey/f-ns-meta Add metadata to namespaces	2022-03-07 10:19:33 -08:00
Tim Gross	b94837a2b8	csi: add pagination args to `volume snapshot list` (#12193 ) The snapshot list API supports pagination as part of the CSI specification, but we didn't have it plumbed through to the command line.	2022-03-07 12:19:28 -05:00
Tim Gross	09a7612150	csi: volume snapshot list plugin option is required (#12197 ) The RPC for listing volume snapshots requires a plugin ID. Update the `volume snapshot list` command to find the specific plugin from the provided prefix.	2022-03-07 09:58:29 -05:00
Michael Schurter	69913d6ac5	docs: add meta to namespace docs	2022-03-04 14:18:57 -08:00
Mike Nomitch	32bc5638a0	Updated OpenAPI info on tools page Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>	2022-03-04 12:54:08 -08:00
Mike Nomitch	0129f7f1a5	Add openapi and caravan to tools page	2022-03-04 09:56:21 -06:00
James Rasell	6aa741dd16	docs: add note regarding HCLv2 func and interpolation.	2022-03-04 12:06:25 +01:00
Michael Schurter	0f6923c750	Merge pull request #10808 from hashicorp/f-curl cli: add operator api command	2022-03-02 10:12:16 -08:00
Michael Schurter	a8833b7d86	docs: add op api examples	2022-03-01 17:15:26 -08:00
Michael Schurter	72134ef5a7	docs: add op api examples	2022-03-01 17:12:58 -08:00
Michael Schurter	fcf4515875	docs: add op api options	2022-03-01 16:43:53 -08:00
Ashlee M Boyer	c3691a44df	docs: Fixing path for autoscaling/agent/source nav item (#12166 )	2022-03-01 17:24:12 -05:00
Tim Gross	f2a4ad0949	CSI: implement support for topology (#12129 )	2022-03-01 10:15:46 -05:00
Tim Gross	c90e674918	CSI: use HTTP headers for passing CSI secrets (#12144 )	2022-03-01 08:47:01 -05:00
Tim Gross	ca06f6153a	docs: clarify that plugin commands are for CSI only (#12151 )	2022-03-01 07:57:41 -05:00
Jorge Marey	a466f01120	Add metadata to namespaces	2022-02-27 09:09:10 +01:00
Seth Hoenig	5269b2e02f	docs: clarify advertise.rpc effect The advertise.rpc config option is not intuitive. At first glance you'd assume it works like advertise.http or advertise.serf, but it does not. The current behavior is working as intended, but the documentation is very hard to parse and doesn't draw a clear picture of what the setting actually does. Closes https://github.com/hashicorp/nomad/issues/11075	2022-02-25 16:02:29 -06:00
Michael Schurter	bb3daac628	rename `nomad curl` to `nomad operator api`	2022-02-24 15:52:54 -08:00
Michael Schurter	141db0c562	cli: add curl command Just a hackweek project at this point.	2022-02-24 15:52:54 -08:00
Luiz Aoqui	61d79e75b0	docs: add docs for the autoscaler `on_error` and `on_check_error` configuration (#12083 )	2022-02-24 12:12:29 -05:00
Sander Mol	42b338308f	add go-sockaddr templating support to nomad consul address (#12084 )	2022-02-24 09:34:54 -05:00
Florian Apolloner	3bced8f558	namespaces: allow enabling/disabling allowed drivers per namespace	2022-02-24 09:27:32 -05:00
Seth Hoenig	8e6d97744b	docs: emphasize snapshot before upgrading	2022-02-24 08:22:41 -06:00
Seth Hoenig	de95998faa	core: switch to go.etcd.io/bbolt This PR swaps the underlying BoltDB implementation from boltdb/bolt to go.etcd.io/bbolt. In addition, the Server has a new configuration option for disabling NoFreelistSync on the underlying database. Freelist option: https://github.com/etcd-io/bbolt/blob/master/db.go#L81 Consul equivalent PR: https://github.com/hashicorp/consul/pull/11720	2022-02-23 14:26:41 -06:00
Tim Gross	246db87a74	CSI: allow for concurrent plugin allocations (#12078 ) The dynamic plugin registry assumes that plugins are singletons, which matches the behavior of other Nomad plugins. But because dynamic plugins like CSI are implemented by allocations, we need to handle the possibility of multiple allocations for a given plugin type + ID, as well as behaviors around interleaved allocation starts and stops. Update the data structure for the dynamic registry so that more recent allocations take over as the instance manager singleton, but we still preserve the previous running allocations so that restores work without racing. Multiple allocations can run on a client for the same plugin, even if only during updates. Provide each plugin task a unique path for the control socket so that the tasks don't interfere with each other.	2022-02-23 15:23:07 -05:00
Charlie Voiselle	01f6e57602	Fixed scheduler config examples (#12049 )	2022-02-23 12:58:29 -05:00
Mike Nomitch	f3d1cf4dbd	Merge pull request #12065 from hashicorp/docs-add-form-link Adding link to interview form	2022-02-22 11:05:20 -08:00
Luiz Aoqui	02ee075506	docs: update link to `mount` in Docker task driver (#12101 )	2022-02-22 13:39:49 -05:00
Michael Schurter	7494a0c4fd	core: remove all traces of unused protocol version Nomad inherited protocol version numbering configuration from Consul and Serf, but unlike those projects Nomad has never used it. Nomad's `protocol_version` has always been `1`. While the code is effectively unused and therefore poses no runtime risks to leave, I felt like removing it was best because: 1. Nomad's RPC subsystem has been able to evolve extensively without needing to increment the version number. 2. Nomad's HTTP API has evolved extensively without incrementing `API{Major,Minor}Version`. If we want to version the HTTP API in the future, I doubt this is the mechanism we would choose. 3. The presence of the `server.protocol_version` configuration parameter is confusing since `server.raft_protocol` is an important parameter for operators to consider. Even more confusing is that there is a distinct Serf protocol version which is included in `nomad server members` output under the heading `Protocol`. `raft_protocol` is the only protocol version relevant to Nomad developers and operators. The other protocol versions are either dead code or have never changed (Serf). 4. If we were to need to version the RPC, HTTP API, or Serf protocols, I don't think these configuration parameters and variables are the best choice. If we come to that point we should choose a versioning scheme based on the use case and modern best practices -- not this 6+ year old dead code.	2022-02-18 16:12:36 -08:00
Adrián López	b1565c7bf4	Update autoscaler AWS ASG target docs: AWS keypair can be empty (#11977 )	2022-02-18 17:29:19 -05:00
James Rasell	f2d73442e8	docs: add autoscaler hcloud target plugin link. (#12087 )	2022-02-18 17:28:38 -05:00
Luiz Aoqui	110dbeeb9d	Add `go-bexpr` filters to evals and deployment list endpoints (#12034 )	2022-02-16 11:40:30 -05:00
Tiernan	c30b4617aa	interpolate network.dns block on client (#12021 )	2022-02-16 08:39:44 -05:00
Seth Hoenig	40c714a681	api: return sorted results in certain list endpoints These API endpoints now return results in chronological order. They can return results in reverse chronological order by setting the query parameter ascending=true. - Eval.List - Deployment.List	2022-02-15 13:48:28 -06:00
Mike Nomitch	8377f5cfe3	Adding link to interview form	2022-02-14 12:38:26 -08:00
James Rasell	926458c5b2	Merge pull request #12053 from marcaurele/fix-typo doc(typo): technical typo in advertised example	2022-02-11 14:27:12 +01:00
Luiz Aoqui	d976e4a19b	docs: add upgrade note and ACL requirements for the job submit endpoint (#12046 )	2022-02-10 15:35:16 -05:00
Marc-Aurèle Brothier	fb80dc57a1	small typo in advertised example	2022-02-10 13:53:05 +01:00
Tim Gross	59c8558969	docs and changelog for `nomad config validate` (#12031 )	2022-02-09 10:20:45 -05:00
Tim Gross	7ad15b2b42	raft: default to protocol v3 (#11572 ) Many of Nomad's Autopilot features require raft protocol version 3. Set the default raft protocol to 3, and improve the upgrade documentation.	2022-02-03 15:03:12 -05:00
René Moser	05db861938	api-docs: add SysBatchSchedulerEnabled docs (#11973 )	2022-02-02 16:54:47 -05:00
James Rasell	a7f569d0e1	docs: add `cores` to client reserved config block.	2022-01-26 15:56:16 +01:00
Dan Norris	160682cf2b	docs: Update volume create/register mount options to use []string example (#11912 ) The examples for `nomad volume create` and `nomad volume register` are not setting `mount_flags` using an array of strings. This fixes the issue by changing the example to be `mount_flags = ["noatime"]`.	2022-01-24 11:34:21 -05:00
Luiz Aoqui	626e633b41	docs: add `nomad.plan.node_rejected` metric (#11860 )	2022-01-18 13:47:20 -05:00
Dave May	330d24a873	cli: Add event stream capture to nomad operator debug (#11865 )	2022-01-17 21:35:51 -05:00
Luiz Aoqui	ed9f277925	docs: update 1.2.0 upgrade note now that the UI ACL is fixed (#11840 )	2022-01-17 11:09:08 -05:00
Luiz Aoqui	f981a1ed7e	docs: add HashiBox to the list of community tools (#11861 )	2022-01-17 11:08:41 -05:00
James Rasell	82b168bf34	Merge pull request #11403 from hashicorp/f-gh-11059 agent/docs: add better clarification when top-level data dir needs setting	2022-01-13 16:41:35 +01:00
Luiz Aoqui	7e6acf0e68	docs: fix autoscaling Datadog site configuration (#11824 )	2022-01-12 21:06:30 -05:00
Derek Strickland	0a8e03f0f7	Expose Consul template configuration parameters (#11606 ) This PR exposes the following existing `consul-template` configuration options to Nomad jobspec authors in the `{job.group.task.template}` stanza. - `wait` It also exposes the following `consul-template` configuration to Nomad operators in the `{client.template}` stanza. - `max_stale` - `block_query_wait` - `consul_retry` - `vault_retry` - `wait` Finally, it adds the following new Nomad-specific configuration to the `{client.template}` stanza that allows Operators to set bounds on what `jobspec` authors configure. - `wait_bounds` Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-01-10 10:19:07 -05:00
Tim Gross	fa64822e49	docs: note that clients need to have ACLs enabled (#11799 ) Client endpoints such as `alloc exec` are enforced on the client if the API client or CLI has "line of sight" to the client. This is already in the Learn guide but having it in the ACL configuration docs would be helpful.	2022-01-07 16:18:41 -05:00
Tim Gross	32f150d469	docs: new scheduler metrics (#11790 ) * Fixed name of `nomad.scheduler.allocs.reschedule` metric * Added new metrics to metrics reference documentation * Expanded definitions of "waiting" metrics * Changelog entry for #10236 and #10237	2022-01-07 09:51:15 -05:00
Charlie Voiselle	98a240cd99	Make number of scheduler workers reloadable (#11593 ) ## Development Environment Changes * Added stringer to build deps ## New HTTP APIs * Added scheduler worker config API * Added scheduler worker info API ## New Internals * (Scheduler)Worker API refactor—Start(), Stop(), Pause(), Resume() * Update shutdown to use context * Add mutex for contended server data - `workerLock` for the `workers` slice - `workerConfigLock` for the `Server.Config.NumSchedulers` and `Server.Config.EnabledSchedulers` values ## Other * Adding docs for scheduler worker api * Add changelog message Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>	2022-01-06 11:56:13 -05:00
James Rasell	1f4e100edc	Merge pull request #11762 from hashicorp/b-gh-11681 docs: add 1.2.0 HCLv2 strict parsing upgrade note.	2022-01-04 09:30:09 +01:00
Tim Gross	6b1b3e7ef8	docs: fix attribute name for java version detection (#11764 )	2022-01-03 16:50:25 -05:00
James Rasell	117c79117e	docs: add 1.2.0 HCLv2 strict parsing upgrade note.	2022-01-03 15:41:18 +00:00
Tim Gross	2806dc2bd7	docs/tests for multiple HTTP address config (#11760 )	2022-01-03 10:17:13 -05:00
Kevin Schoonover	5d9a506bc0	agent: support multiple http address in addresses.http (#11582 )	2022-01-03 09:33:53 -05:00
Tim Gross	395628efe1	api: paginate deployment list and accept wildcard namespace (#11743 ) Add `per_page` and `next_token` handling to `Deployment.List` RPC, and allow the use of a wildcard namespace for namespace filtering.	2022-01-03 08:36:02 -05:00
Shishir	65eab35412	Add support for setting pids_limit in docker plugin config. (#11526 )	2021-12-21 13:31:34 -05:00
Tim Gross	b0c3b99b03	scheduler: fix quadratic performance with spread blocks (#11712 ) When the scheduler picks a node for each evaluation, the `LimitIterator` provides at most 2 eligible nodes for the `MaxScoreIterator` to choose from. This keeps scheduling fast while producing acceptable results because the results are binpacked. Jobs with a `spread` block (or node affinity) remove this limit in order to produce correct spread scoring. This means that every allocation within a job with a `spread` block is evaluated against _all_ eligible nodes. Operators of large clusters have reported that jobs with `spread` blocks that are eligible on a large number of nodes can take longer than the nack timeout to evaluate (60s). Typical evaluations are processed in milliseconds. In practice, it's not necessary to evaluate every eligible node for every allocation on large clusters, because the `RandomIterator` at the base of the scheduler stack produces enough variation in each pass that the likelihood of an uneven spread is negligible. Note that feasibility is checked before the limit, so this only impacts the number of _eligible_ nodes available for scoring, not the total number of nodes. This changeset sets the iterator limit for "large" `spread` block and node affinity jobs to be equal to the number of desired allocations. This brings an example problematic job evaluation down from ~3min to ~10s. The included tests ensure that we have acceptable spread results across a variety of large cluster topologies.	2021-12-21 10:10:01 -05:00
Andy Assareh	8ba4e063e2	Mesh Gateway doc enhancements (#11354 ) * Mesh Gateway doc enhancements 1. I believe this line should be corrected to add mesh as one of the choices 2. I found that we are not setting this meta, and it is a required element for wan federation. I believe it would be helpful and potentially time saving to note that right here.	2021-12-20 17:10:44 -05:00
Guilherme	ae05515b50	Fix 'check calculations' link (#11420 )	2021-12-20 17:09:15 -05:00
Tim Gross	e046bb31e9	api: respect wildcard in evaluations list API (#11710 )	2021-12-20 12:23:50 -05:00
Luiz Aoqui	a46d799f2a	docs: add v1.2.0 upgrade guide about Nomad UI ACL change for job details page (#11689 )	2021-12-16 14:32:20 -05:00
Luiz Aoqui	4b39494cd1	docs: add more references and examples to the `template` block (#11691 )	2021-12-16 14:14:01 -05:00
Tim Gross	f2615992a4	cli: unhide advanced operator raft debugging commands (#11682 ) The `nomad operator raft` and `nomad operator snapshot state` subcommands for inspecting on-disk raft state were hidden and undocumented. Expose and document these so that advanced operators have support for these tools.	2021-12-16 10:32:11 -05:00
Tim Gross	536e3c5282	`nomad eval list` command (#11675 ) Use the new filtering and pagination capabilities of the `Eval.List` RPC to provide filtering and pagination at the command line. Also includes note that `nomad eval status -json` is deprecated and will be replaced with a single evaluation view in a future version of Nomad.	2021-12-15 11:58:38 -05:00
Noel Quiles	235a778a56	website: Copy updates (#11677 )	2021-12-14 16:35:21 -05:00
Tim Gross	a0cf5db797	provide `-no-shutdown-delay` flag for job/alloc stop (#11596 ) Some operators use very long group/task `shutdown_delay` settings to safely drain network connections to their workloads after service deregistration. But during incident response, they may want to cause that drain to be skipped so they can quickly shed load. Provide a `-no-shutdown-delay` flag on the `nomad alloc stop` and `nomad job stop` commands that bypasses the delay. This sets a new desired transition state on the affected allocations that the allocation/task runner will identify during pre-kill on the client. Note (as documented here) that using this flag will almost always result in failed inbound network connections for workloads as the tasks will exit before clients receive updated service discovery information and won't be gracefully drained.	2021-12-13 14:54:53 -05:00
Tim Gross	624ecab901	evaluations list pagination and filtering (#11648 ) API queries can request pagination using the `NextToken` and `PerPage` fields of `QueryOptions`, when supported by the underlying API. Add a `NextToken` field to the `structs.QueryMeta` so that we have a common field across RPCs to tell the caller where to resume paging from on their next API call. Include this field on the `api.QueryMeta` as well so that it's available for future versions of List HTTP APIs that wrap the response with `QueryMeta` rather than returning a simple list of structs. In the meantime callers can get the `X-Nomad-NextToken`. Add pagination to the `Eval.List` RPC by checking for pagination token and page size in `QueryOptions`. This will allow resuming from the last ID seen so long as the query parameters and the state store itself are unchanged between requests. Add filtering by job ID or evaluation status over the results we get out of the state store. Parse the query parameters of the `Eval.List` API into the arguments expected for filtering in the RPC call.	2021-12-10 13:43:03 -05:00
Kevin Wang	3e6757f211	feat(website): extract `/plugins` `/tools` docs (#11584 ) Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> Co-authored-by: Mike Nomitch <mnomitch@hashicorp.com>	2021-12-09 14:25:18 -05:00
Lukas W	0e5958d671	CLI: Return non-zero exit code when deployment fails in `nomad run` (#11550 ) * Exit non-zero from run command if deployment fails * Fix typo in deployment monitor introduced in 0edda11	2021-12-09 09:09:28 -05:00
Tim Gross	348f482c94	docs: improve docs for troubleshooting and monitoring scheduler (#11623 ) This changeset adds more specific recommendations as to what metrics to monitor, and what resources should be examined during incident response. It also renames the "Telemetry" section to "Monitoring Nomad" to surface the material better and distinguish it from the "Metric Reference". Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>	2021-12-07 15:52:13 -05:00
James Rasell	d44e5620dd	docs: add license expiry metric to metrics website doc.	2021-12-07 10:31:51 +00:00
Shantanu Gadgil	0838678609	mention `sysbatch` in addition to `batch` (#11587 )	2021-12-06 19:12:03 -05:00
Tim Gross	03e697a69d	scheduler: config option to reject job registration (#11610 ) During incident response, operators may find that automated processes elsewhere in the organization can be generating new workloads on Nomad clusters that are unable to handle the workload. This changeset adds a field to the `SchedulerConfiguration` API that causes all job registration calls to be rejected unless the request has a management ACL token.	2021-12-06 15:20:34 -05:00
Tim Gross	39acac33a0	ui: change Consul/Vault base URL field name (#11589 ) Give ourselves some room for extension in the UI configuration block by naming the field `ui_url`, which will let us have an `api_url`. Fix the template path to ensure we're getting the right value from the API.	2021-11-30 13:20:29 -05:00
James Rasell	e34bb8ab1d	Merge pull request #11577 from hashicorp/b-gh-11576 docs: add deprecation note to old style network task env vars.	2021-11-30 12:15:31 +01:00
Tim Gross	ba038a1ebc	docs: `mount_flags` takes a slice of strings (#11583 ) The `mount_flags` option takes a slice of strings, not a comma-separated string like the flags passed to `mount(8)`.	2021-11-29 10:07:34 -05:00
James Rasell	0260cc6306	docs: add deprecation note to old style network task env vars.	2021-11-25 12:58:32 +01:00
Luiz Aoqui	0b82d62bc6	docs: document new Prometheus configuration for the Autoscaler APM plugin (#11562 )	2021-11-24 17:37:35 -05:00
Luiz Aoqui	0859eac724	docs: add CLI and config docs for the Autoscaler policy source config (#11559 )	2021-11-24 16:17:37 -05:00
Luiz Aoqui	fa23106612	docs: add upgrade guide notes for Nomad 1.2.2 (#11567 )	2021-11-24 14:24:20 -05:00
Tim Gross	fcb96de9a7	config: UI configuration block with Vault/Consul links (#11555 ) Add `ui` block to agent configuration to enable/disable the web UI and provide the web UI with links to Vault/Consul.	2021-11-24 11:20:02 -05:00
James Rasell	6dddf9a1fb	Merge pull request #11535 from hashicorp/docs-vault-token docs: clarify vault.token only required on servers	2021-11-23 09:26:06 +01:00
James Rasell	751c8217d1	core: allow setting and propagation of eval priority on job de/registration (#11532 ) This change modifies the Nomad job register and deregister RPCs to accept an updated option set which includes eval priority. This param is optional and overrides the use of the job priority to set the eval priority. In order to ensure all evaluations as a result of the request use the same eval priority, the priority is shared with the allocReconciler and deploymentWatcher. This creates a new distinction between eval priority and job priority. The Nomad agent HTTP API has been modified to allow setting the eval priority on job update and delete. To keep consistency with the current v1 API, job update accepts this as a payload param; job delete accepts this as a query param. Any user-supplied value is validated within the agent HTTP handler, removing the need to pass invalid requests to the server. The register and deregister opts functions now allow for setting the eval priority on requests. The change includes a small change to the DeregisterOpts function which handles nil opts. This brings the function in line with the RegisterOpts.	2021-11-23 09:23:31 +01:00
Luiz Aoqui	d3c1a03edd	Merge tag 'v1.2.1' into merge-release-1.2.1-branch Version 1.2.1	2021-11-22 10:47:04 -05:00
Tim Gross	fc1d4814d9	qemu: add `args_allowlist` to sandbox VM command line inputs The QEMU driver allows arbitrary command line options, but many of these options give access to host resources that operators may not want to expose such as devices. Add an optional allowlist to the plugin configuration so that operators can limit the resources for QEMU.	2021-11-19 11:11:52 -05:00
James Rasell	88cc158ae1	docs: add global query param to API job deregister endpoint.	2021-11-19 13:45:24 +01:00
Michael Schurter	cfe4922213	docs: clarify vault.token only required on servers While it is clarified toward the bottom of this page, I've seen people go to great lengths to configure tokens for clients anyway, so I think it's worth noting on the parameter's docs as well.	2021-11-18 16:34:59 -08:00
Luiz Aoqui	12feb598af	docs: add note about the Nomad APM autoscaling plugin and scaling cluster to zero (#11494 )	2021-11-16 11:58:26 -05:00
Luiz Aoqui	9a09fe160c	docs: remove mutual-exclusion between node class and datacenter in scaling policies (#11499 )	2021-11-16 11:58:14 -05:00
kfenech1	26a0158ead	docs: `nomad.client.unallocated.memory` is in Megabytes not bytes (#11468 )	2021-11-08 11:05:11 -05:00
Alessandro De Blasis	07c670fdc0	cli: show `host_network` in `nomad status` (#11432 ) Enhance the CLI in order to return the host network in two flavors (default, verbose) of the `node status` command. Fixes: #11223. Signed-off-by: Alessandro De Blasis <alex@deblasis.net>	2021-11-05 09:02:46 -04:00
James Rasell	503f201415	Merge pull request #11444 from hashicorp/b-update-apidocs-alloclist-sample-resp docs: update API alloc list sample response to be current.	2021-11-05 08:09:23 +01:00
Florian Apolloner	ef88795af3	Added a `-hcl2-strict` flag to allow for lenient hcl variable parsing. (#11284 ) Co-authored-by: James Rasell <jrasell@hashicorp.com>	2021-11-04 16:33:09 +01:00
James Rasell	992abe6597	Merge pull request #11333 from hashicorp/assareh-patch-1 exactly one of ingress, terminating, or mesh must be configured	2021-11-04 11:13:04 +01:00
James Rasell	01ecb5b9ce	docs: update API alloc list sample response to be current.	2021-11-04 10:22:21 +01:00
Michael Schurter	ef3fc79225	Merge pull request #11334 from hashicorp/f-chroot-skip-allocdir client: never embed alloc_dir in chroot	2021-11-03 16:48:09 -07:00
Luiz Aoqui	4fb5b8b6e7	docs: update podman driver documentation (#11300 )	2021-11-03 11:07:44 -04:00
Luiz Aoqui	5d204c8ced	Revert "Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799 )" (#11433 )	2021-11-02 17:42:52 -04:00
James Rasell	163f2eadd0	Merge pull request #11425 from hashicorp/b-add-timeout-consul-docs docs: document Consul timeout config parameter.	2021-11-02 15:28:34 +01:00
James Rasell	c071efbd6b	Merge pull request #11411 from hashicorp/f-gh-11406 cli: add json and template flag opts to acl bootstrap command.	2021-11-02 09:48:25 +01:00
James Rasell	9d0fe24e25	docs: document Consul timeout config parameter.	2021-11-02 08:28:45 +01:00
James Rasell	46564ac579	docs: update acl bootstrap command to show json and template opts.	2021-10-29 09:01:58 +02:00
Pavel Alimpiev	068066cb0e	Fix typo in documentation	2021-10-29 03:31:53 +03:00
James Rasell	d6388db576	docs: clarify server data_dir config needs top-level data_dir cfg.	2021-10-28 13:07:37 +02:00
Dave May	509c74ce19	debug: update default node-id and docs (#11398 ) * debug: default node-id to all * debug: align cli help and website documentation	2021-10-27 13:43:56 -04:00

549 commits