open-nomad

Author	SHA1	Message	Date
Will Jordan	5354409b1a	Return 429 response on HTTP max connection limit (#13621 ) Return 429 response on HTTP max connection limit. Instead of silently closing the connection, return a `429 Too Many Requests` HTTP response with a helpful error message to aid debugging when the connection limit is unintentionally reached. Set a 10-millisecond write timeout and rate limiter for connection-limit 429 response to prevent writing the HTTP response from consuming too many server resources. Add `nomad.agent.http.exceeded` metric counting the number of HTTP connections exceeding concurrency limit.	2022-07-20 14:12:21 -04:00
Luiz Aoqui	3dc701a8d0	docs: update Autoscaler AWS plugin with new aws_credential_provider config (#13779 )	2022-07-19 10:27:55 -04:00
Niklas Hambüchen	422c83e97a	docs: job-specification: Explain that priority has no effect on run order (#13835 ) Makes the issues from #9845 and #12792 less surprising to the user.	2022-07-19 08:55:29 -04:00
Andy Assareh	e49c021792	word typo digestible (#13772 )	2022-07-19 09:00:52 +02:00
Seth Hoenig	4dea14267d	Merge pull request #13813 from hashicorp/docs-move-checks docs: move checks into own page	2022-07-18 12:27:43 -05:00
Seth Hoenig	4459312541	docs: move checks into own page This PR creates a top-level 'check' page for job-specification docs. The content for checks is about half the content of the service page, and is about to increase in size when we add docs about Nomad service checks. Seemed like a good idea to just split the checks section out into its own thing (e.g. check_restart is already a topic). Doing the move first lets us backport this change without adding Nomad service check stuff yet. Mostly just a lift-and-shift but with some tweaked examples to de-emphasize the use of script checks.	2022-07-18 09:34:55 -05:00
Tim Gross	1e8978ca04	docs: ACL policy spec reference (#13787 ) The "Secure Nomad with Access Control" guide provides a tutorial for bootstrapping Nomad ACLs, writing policies, and creating tokens. Add a reference guide just for the ACL policy specification.	2022-07-18 09:35:28 -04:00
Luiz Aoqui	730f869b6b	docs: update Podman docs to v0.4.0 (#13783 )	2022-07-15 18:01:35 -04:00
Michael Schurter	e97548b5f8	Improve metrics reference documentation (#13769 ) * docs: tighten up parameterized job metrics docs * docs: improve alloc status descriptions Remove `nomad.client.allocations.start` as it doesn't exist.	2022-07-15 14:22:39 -07:00
Michael Schurter	5414f49821	docs: clarify blocked_evals metrics (#13751 ) Related to #13740 - blocked_evals.total_blocked is the number of evals blocked for any reason - blocked_evals.total_quota_limit is the number of evals blocked by quota limits, but critically: their resources are not counted in the cpu/memory	2022-07-14 11:32:33 -07:00
Seth Hoenig	3a32220b3b	Merge pull request #13716 from hashicorp/docs-update-consul-warning docs: remove consul 1.12.0 warning	2022-07-14 08:45:56 -05:00
Luiz Aoqui	b656981cf0	Track plan rejection history and automatically mark clients as ineligible (#13421 ) Plan rejections occur when the scheduler worker and the leader plan applier disagree on the feasibility of a plan. This may happen for valid reasons: since Nomad does parallel scheduling, it is expected that different workers will have a different state when computing placements. As the final plan reaches the leader plan applier, it may no longer be valid due to a concurrent scheduling taking up intended resources. In these situations the plan applier will notify the worker that the plan was rejected and that they should refresh their state before trying again. In some rare and unexpected circumstances it has been observed that workers will repeatedly submit the same plan, even if they are always rejected. While the root cause is still unknown this mitigation has been put in place. The plan applier will now track the history of plan rejections per client and include in the plan result a list of node IDs that should be set as ineligible if the number of rejections in a given time window crosses a certain threshold. The window size and threshold value can be adjusted in the server configuration. To avoid marking several nodes as ineligible at once, the operation is rate limited to 5 nodes every 30min, with an initial burst of 10 operations.	2022-07-12 18:40:20 -04:00
Michael Schurter	3e50f72fad	core: merge reserved_ports into host_networks (#13651 ) Fixes #13505 This fixes #13505 by treating reserved_ports like we treat a lot of jobspec settings: merging settings from more global stanzas (client.reserved.reserved_ports) "down" into more specific stanzas (client.host_networks[].reserved_ports). As discussed in #13505 there are other options, and since it's totally broken right now we have some flexibility: Treat overlapping reserved_ports on addresses as invalid and refuse to start agents. However, I'm not sure there's a cohesive model we want to publish right now since so much 0.9-0.12 compat code still exists! We would have to explain to folks that if their -network-interface and host_network addresses overlapped, they could only specify reserved_ports in one place or the other?! It gets ugly. Use the global client.reserved.reserved_ports value as the default and treat host_network[].reserved_ports as overrides. My first suggestion in the issue, but @groggemans made me realize the addresses on the agent's interface (as configured by -network-interface) may overlap with host_networks, so you'd need to remove the global reserved_ports from addresses shared with a shared network?! This seemed really confusing and subtle for users to me. So I think "merging down" creates the most expressive yet understandable approach. I've played around with it a bit, and it doesn't seem too surprising. The only frustrating part is how difficult it is to observe the available addresses and ports on a node! However that's a job for another PR.	2022-07-12 14:40:25 -07:00
Seth Hoenig	a9fa48f3db	docs: remove consul 1.12.0 warning	2022-07-12 09:53:17 -05:00
Tim Gross	fc4cd53cfb	docs: rename Internals to Concepts (#13696 )	2022-07-11 16:55:33 -04:00
Tim Gross	d49ff0175c	docs: move operator subcommands under their own trees (#13677 ) The sidebar navigation tree for the `operator` sub-sub commands is getting cluttered and we have a new set of commands coming to support secure variables keyring as well. Move these all under their own subtrees.	2022-07-11 14:00:24 -04:00
Seth Hoenig	ed2f2b1a75	docs: move upgrade docs for max_client_timeout Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-07-07 16:46:26 -05:00
Seth Hoenig	905e673553	docs: upgrade guide for client max_kill_timeout	2022-07-07 15:27:40 -05:00
Luiz Aoqui	03433dd8af	cli: improve output of eval commands (#13581 ) Use the same output format when listing multiple evals in the `eval list` command and when `eval status <prefix>` matches more than one eval. Include the eval namespace in all output formats and always include the job ID in `eval status` since even `node-update` evals are related to a job. Add Node ID to the evals table output to help differentiate `node-update` evals. Co-authored-by: James Rasell <jrasell@hashicorp.com>	2022-07-07 13:13:34 -04:00
Ted Behling	6a032a54d2	driver/docker: Don't pull InfraImage if it exists (#13265 ) Co-authored-by: James Rasell <jrasell@hashicorp.com>	2022-07-07 17:44:06 +02:00
Seth Hoenig	b9fe6c8d2c	docs: fixup from cr comments	2022-07-07 08:37:10 -05:00
Seth Hoenig	1c31ef285e	docs: add docs for simple load balancing nomad services This PR adds a section to template docs for simple load balancing with nomad services.	2022-07-06 17:34:30 -05:00
James Rasell	0c0b028a59	core: allow deleting of evaluations (#13492 ) * core: add eval delete RPC and core functionality. * agent: add eval delete HTTP endpoint. * api: add eval delete API functionality. * cli: add eval delete command. * docs: add eval delete website documentation.	2022-07-06 16:30:11 +02:00
James Rasell	181b247384	core: allow pausing and un-pausing of leader broker routine (#13045 ) * core: allow pause/un-pause of eval broker on region leader. * agent: add ability to pause eval broker via scheduler config. * cli: add operator scheduler commands to interact with config. * api: add ability to pause eval broker via scheduler config * e2e: add operator scheduler test for eval broker pause. * docs: include new operator scheduler CLI and pause eval API info.	2022-07-06 16:13:48 +02:00
Michelle Noorali	f227855de1	doc: explain permissions for Vault sys/capabilities-self	2022-07-06 10:01:30 -04:00
Yann Coleu	fe64f8cdd7	docs: typo on command word (#13582 )	2022-07-05 16:24:25 -04:00
Steven Collins	ab97650098	docs: Add 'serial' attribute to usb driver (#13547 )	2022-07-05 16:23:04 -04:00
Seth Hoenig	97726c2fd8	Merge pull request #12862 from hashicorp/f-choose-services api: enable selecting subset of services using rendezvous hashing	2022-06-30 15:17:40 -05:00
Derek Strickland	47e3b28dba	docs: update task leader to explain shutdown sequence. (#13498 ) * docs: update task leader to explain shutdown sequence.	2022-06-29 05:13:45 -04:00
James Rasell	d21e4abe3f	docs: fixup HCL2 index collection function documentation. (#13511 )	2022-06-28 18:27:38 +02:00
Andrew	3a87406f2f	Fix typo in Docker docs (#13497 )	2022-06-28 11:05:50 +02:00
Seth Hoenig	9467bc9eb3	api: enable selecting subset of services using rendezvous hashing This PR adds the 'choose' query parameter to the '/v1/service/<service>' endpoint. The value of 'choose' is in the form '<number>|<key>', number is the number of desired services and key is a value unique but consistent to the requester (e.g. allocID). Folks aren't really expected to use this API directly, but rather through consul-template which will soon be getting a new helper function making use of this query parameter. Example, curl 'localhost:4646/v1/service/redis?choose=2|abc123' Note: consul-template v0.29.1 includes the necessary nomadServices functionality.	2022-06-25 10:37:37 -05:00
Seth Hoenig	91e08d5e23	core: remove support for raft protocol version 2 This PR checks server config for raft_protocol, which must now be set to 3 or unset (0). When unset, version 3 is used as the default.	2022-06-23 14:37:50 +00:00
Michael Schurter	7b7c72b21d	docs: clarify total_escaped is just an optimization (#13460 )	2022-06-22 11:39:56 -07:00
Elijah Voigt	665b198968	Lob.com uses Nomad too! (#13295 ) Lob.com has been ramping up our use of Nomad for ~6 months. Now that we've started blogging about it we'd love to be on the _official_ list.	2022-06-21 09:10:08 -04:00
Derek Strickland	a15cef689d	Improve Autoscaler overview (#13396 ) Improve Autoscaler overview documentation.	2022-06-17 05:15:22 -04:00
Nick Wales	3a8c8250f4	Merge pull request #13401 from nickwales/tls_typo Updates TLS documentation	2022-06-16 12:34:59 -05:00
Arthur Leclerc	d98a9b1d72	docs: Fix typo (#13389 )	2022-06-16 13:24:18 -04:00
Nick Wales	c964ae0135	Updates TLS documentation	2022-06-16 12:15:40 -05:00
James Hu	7e3d21646d	Fix spelling error (#13397 )	2022-06-16 12:41:49 -04:00
Luiz Aoqui	6598567725	docs: create volume spec page (#13353 ) In addition to jobs, there are other objects in Nomad that have a specific format and can be provided to commands and API endpoints. This commit creates a new menu section to hold the specification for volumes and updates the command pages to point to the new centralized definition. Redirecting the previous entries is not possible with `redirect.js` because they are done server-side and URL fragments are not accessible to detect a match. So we provide hidden anchors with a link to the new page to guide users towards the new documentation. Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-06-14 14:08:25 -04:00
Grant Griffiths	99896da443	CSI: make plugin health_timeout configurable in csi_plugin stanza (#13340 ) Signed-off-by: Grant Griffiths <ggriffiths@purestorage.com>	2022-06-14 10:04:16 -04:00
Michael Schurter	f41ea0e5dc	docs: explain behavior of system gc command (#13342 )	2022-06-13 09:54:23 +02:00
Derek Strickland	5ebd06a8f9	template: improve default language for max_stale and wait (#13334 ) * template: improve default language for max_stale and wait Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-06-10 14:34:25 -04:00
Daniel Rossbach	8c52c03c8c	qemu driver: Add option to configure drive_interface (#11864 )	2022-06-10 10:03:51 -04:00
Raffaele Di Fazio	66938e0ef0	Update supplement.mdx with the right GitHub spelling (#13326 )	2022-06-10 11:46:19 +02:00
phreakocious	94a78597d2	Add `guest_agent` config option for QEMU driver (#12800 ) Add boolean 'guest_agent' config option for QEMU driver, which will create the socket file for the QEMU Guest Agent in the task dir when enabled.	2022-06-09 09:21:38 -04:00
Derek Strickland	34dea90d7a	docker: update images to reference hashicorpdev Docker organization (#12903 ) docker: update images to reference hashicorpdev dockerhub organization generate job_init.bindata_assetfs.go Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-06-08 15:06:00 -04:00
Derek Strickland	13ea5ae87a	consul-template: Add fault tolerant defaults (#13041 ) consul-template: Add fault tolerant defaults Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-06-08 14:08:25 -04:00
Shantanu Gadgil	43d8baace0	`heartbeat_grace` is a `server` parameter (#13288 ) `heartbeat_grace` is a `server` parameter, not a `client` parameter.	2022-06-08 10:49:23 -04:00

564 commits