open-nomad

Author	SHA1	Message	Date
Tim Gross	d11da1df5c	workload identity: use parent ID for dispatch/periodic jobs (#13748) Workload identities grant implicit access to policies, and operators will not want to craft separate policies for each invocation of a periodic or dispatch job. Use the parent job's ID as the JobID claim.	2022-07-21 09:05:54 -04:00
Tim Gross	9c43c28575	search: use secure vars ACL policy for secure vars context (#13788) The search RPC used a placeholder policy for searching within the secure variables context. Now that we have ACL policies built for secure variables, we can use them for search. Requires a new loose policy for checking if a token has any secure variables access within a namespace, so that we can filter on specific paths in the iterator.	2022-07-21 08:39:36 -04:00
Tim Gross	97a6346da0	keyring: use nanos for `CreateTime` in key metadata (#13849) Most of our objects use int64 timestamps derived from `UnixNano()` instead of `time.Time` objects. Switch the keyring metadata to use `UnixNano()` for consistency across the API.	2022-07-20 14:46:57 -04:00
Tim Gross	428e23043c	secure vars: limit maximum size of variable data (#13743) To discourage accidentally DoS'ing the cluster with secure variables data, we're providing a very low limit to the maximum size of a given secure variable. This currently matches the limit for dispatch payloads. In future versions, we may increase this limit or make it configurable, once we have better metrics from real-world operators.	2022-07-20 14:46:43 -04:00
Tim Gross	96aea74b4b	docs: keyring commands (#13690) Document the secure variables keyring commands, document the aliased gossip keyring commands, and note that the old gossip keyring commands are deprecated.	2022-07-20 14:14:10 -04:00
Tim Gross	49ad3dc3ba	docs: document secure variables server config options (#13695)	2022-07-20 14:13:39 -04:00
Will Jordan	5354409b1a	Return 429 response on HTTP max connection limit (#13621) Return 429 response on HTTP max connection limit. Instead of silently closing the connection, return a `429 Too Many Requests` HTTP response with a helpful error message to aid debugging when the connection limit is unintentionally reached. Set a 10-millisecond write timeout and rate limiter for the connection-limit 429 response to prevent writing the HTTP response from consuming too many server resources. Add a `nomad.agent.http.exceeded` metric counting the number of HTTP connections exceeding the concurrency limit.	2022-07-20 14:12:21 -04:00
Phil Renaud	301c6e57f5	Add a title to the evals route (#13865)	2022-07-20 13:28:06 -04:00
Phil Renaud	ab7de3886c	Reorder the select boxes on evals so namespaces are first (#13866) * Reorder the select boxes on evals so namespaces are first * Wrap evals buttons in a button-bar for consistent styling and spacing	2022-07-20 13:27:58 -04:00
Phil Renaud	0982ad1079	Change path-linked-variables to start with nomad/jobs/, instead of jobs/ (#13862) * Support pathLinkedEntities starting with nomad/jobs/ instead of jobs/ * links from jobs/groups/tasks to variables now look for nomad/jobs/ instead of jobs/ * Tests updated to reflect nomad/jobs/ change * Acceptance test for disallowing nomad/foo/, and hint text updates * Defensive logic in case path not yet set * Allow exactly nomad/jobs as a variable path	2022-07-20 12:19:01 -04:00
Seth Hoenig	fe5edfcd38	Merge pull request #13859 from hashicorp/exp-use-set cleanup: example refactoring out map[string]struct{} using set.Set	2022-07-20 11:02:18 -05:00
Seth Hoenig	bd2935ee54	cleanup: tweaks from cr feedback	2022-07-20 10:42:35 -05:00
Seth Hoenig	93cfeb177b	cleanup: example refactoring out map[string]struct{} using set.Set This PR is a little demo of using github.com/hashicorp/go-set to replace the use of map[T]struct{} as a make-shift set.	2022-07-19 22:50:49 -05:00
Tim Gross	ea38582b40	secure vars: rename automatically accessible vars path for jobs (#13848) Tasks are automatically granted access to variables on a path that matches their workload identity, with a well-known prefix. Change the prefix to `nomad/jobs` to allow for future prefixes like `nomad/volumes` or `nomad/plugins`. Reserve the prefix by emitting errors during validation.	2022-07-19 16:17:34 -04:00
dependabot[bot]	da3470733c	build(deps): bump @percy/cli from 1.1.0 to 1.6.1 in /ui (#13724) Bumps [@percy/cli](https://github.com/percy/cli/tree/HEAD/packages/cli) from 1.1.0 to 1.6.1. - [Release notes](https://github.com/percy/cli/releases) - [Commits](https://github.com/percy/cli/commits/v1.6.1/packages/cli) --- updated-dependencies: - dependency-name: "@percy/cli" dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-07-19 14:09:11 -04:00
Luiz Aoqui	3dc701a8d0	docs: update Autoscaler AWS plugin with new ws_credential_provider config (#13779)	2022-07-19 10:27:55 -04:00
Phil Renaud	e9ac38c93b	Prettier-applied lint rules for secure variables test (#13841)	2022-07-19 09:33:53 -04:00
Niklas Hambüchen	422c83e97a	docs: job-specification: Explain that priority has no effect on run order (#13835) Makes the issues from #9845 and #12792 less surprising to the user.	2022-07-19 08:55:29 -04:00
Andy Assareh	e49c021792	word typo digestible (#13772)	2022-07-19 09:00:52 +02:00
Phil Renaud	b6f32386aa	Visual Diff tests for Secure Variables (#13689) * A smattering of snapshot tests for Secure Variables * Percy imports and linting	2022-07-18 17:00:45 -04:00
Tim Gross	cfa2cb140e	fsm: one-time token expiration should be deterministic (#13737) When applying a raft log to expire ACL tokens, we need to use a timestamp provided by the leader so that the result is deterministic across servers. Use leader's timestamp from RPC call	2022-07-18 14:19:29 -04:00
Seth Hoenig	4dea14267d	Merge pull request #13813 from hashicorp/docs-move-checks docs: move checks into own page	2022-07-18 12:27:43 -05:00
Seth Hoenig	4459312541	docs: move checks into own page This PR creates a top-level 'check' page for job-specification docs. The content for checks is about half the content of the service page, and is about to increase in size when we add docs about Nomad service checks. Seemed like a good idea to just split the checks section out into its own thing (e.g. check_restart is already a topic). Doing the move first lets us backport this change without adding Nomad service check stuff yet. Mostly just a lift-and-shift but with some tweaked examples to de-emphasize the use of script checks.	2022-07-18 09:34:55 -05:00
Tim Gross	1e8978ca04	docs: ACL policy spec reference (#13787) The "Secure Nomad with Access Control" guide provides a tutorial for bootstrapping Nomad ACLs, writing policies, and creating tokens. Add a reference guide just for the ACL policy specification.	2022-07-18 09:35:28 -04:00
Seth Hoenig	db84428a7c	Merge pull request #13786 from hashicorp/b-metrics-for-classless-blocked-evals metrics: classless blocked evals get metrics	2022-07-18 07:34:29 -05:00
Luiz Aoqui	730f869b6b	docs: update Podman docs to v0.4.0 (#13783)	2022-07-15 18:01:35 -04:00
Michael Schurter	e97548b5f8	Improve metrics reference documentation (#13769) * docs: tighten up parameterized job metrics docs * docs: improve alloc status descriptions Remove `nomad.client.allocations.start` as it doesn't exist.	2022-07-15 14:22:39 -07:00
Kyle Penfound	8157f442c3	packaging: restart nomad service after package update (#13773)	2022-07-15 14:20:04 -07:00
Seth Hoenig	c23da281a1	metrics: even classless blocked evals get metrics This PR fixes a bug where blocked evaluations with no class set would not have metrics exported at the dc:class scope. Fixes #13759	2022-07-15 14:12:44 -05:00
Tim Gross	05cd91155d	keyring: fix flake in replication-after-election test (#13749) The test for simulating a key rotation across leader elections was flaky because we weren't waiting for a leader election and were checking the server configs rather than raft to determine which server was currently the leader. Fixing the flake revealed a bug in the test: we weren't ensuring the new leader was running its own replication, so it wouldn't pick up the key material from the previous follower.	2022-07-15 11:09:09 -04:00
Tim Gross	aa15e0fe7e	secure vars: updates should reduce quota tracking if smaller (#13742) When secure variables are updated, we were adding the update to the existing quota tracking without first checking whether it was an update to an existing variable. In that case we need to add/subtract only the difference between the new and existing quota usage.	2022-07-15 11:08:53 -04:00
Seth Hoenig	7a2e1e3372	Merge pull request #13771 from hashicorp/e2e-nsd-simple-lb e2e: add nsd simple load balancing test	2022-07-15 08:48:19 -05:00
Seth Hoenig	634d84edec	e2e: add nsd simple load balancing test	2022-07-14 15:07:19 -05:00
Michael Schurter	5414f49821	docs: clarify blocked_evals metrics (#13751) Related to #13740 - blocked_evals.total_blocked is the number of evals blocked for any reason - blocked_evals.total_quota_limit is the number of evals blocked by quota limits, but critically: their resources are not counted in the cpu/memory	2022-07-14 11:32:33 -07:00
Tim Gross	0cf8a580c7	search: refactor OSS/ENT split for ACL checks (#13760) The split between OSS/ENT in ACL checks for the Search RPC has a lot of repeated code that results in merge conflicts. Move most of the logic into the shared code so that we can call out to thin functions for ENT checks.	2022-07-14 11:31:08 -04:00
Luiz Aoqui	d73d0aac21	Merge pull request #13752 from hashicorp/post-1.3.2-release Post 1.3.2 release	2022-07-14 10:38:52 -04:00
Seth Hoenig	3a32220b3b	Merge pull request #13716 from hashicorp/docs-update-consul-warning docs: remove consul 1.12.0 warning	2022-07-14 08:45:56 -05:00
Tim Gross	cc9fb1c876	keyring: upserting key metadata in FSM must be deterministic (#13733)	2022-07-14 08:38:14 -04:00
Luiz Aoqui	7d4917ba83	Merge release 1.3.2 files	2022-07-13 19:35:54 -04:00
hc-github-team-nomad-core	c2f95eecc1	Prepare for next release	2022-07-13 19:34:32 -04:00
hc-github-team-nomad-core	fa09c13016	Generate files for 1.3.2 release	2022-07-13 19:33:41 -04:00
Tim Gross	111933043a	tests: add a space between node name and timestamp (#13750)	2022-07-13 16:23:03 -04:00
dependabot[bot]	d3d1199b81	chore(deps): bump github.com/mitchellh/mapstructure from 1.4.3 to 1.5.0 in /api (#12725) * chore(deps): bump github.com/mitchellh/mapstructure in /api Bumps [github.com/mitchellh/mapstructure](https://github.com/mitchellh/mapstructure) from 1.4.3 to 1.5.0. - [Release notes](https://github.com/mitchellh/mapstructure/releases) - [Changelog](https://github.com/mitchellh/mapstructure/blob/master/CHANGELOG.md) - [Commits](https://github.com/mitchellh/mapstructure/compare/v1.4.3...v1.5.0) --- updated-dependencies: - dependency-name: github.com/mitchellh/mapstructure dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * Also bump mapstructure in main go.mod Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-07-13 11:57:16 -07:00
Michael Schurter	d54d90edfa	http: only log alloc/exec errors when non-nil (#13730)	2022-07-13 09:44:51 -07:00
Phil Renaud	29af6c6ea0	13553 secure vars linked from jobs (#13708) * Vars from job prototype * singular linked variable from job * Links from task groups and tasks to their variables incl periodic and parameterized * Lintfix * Make sure they can list em before we list em * Tests from job/group/task to var	2022-07-13 11:40:13 -04:00
dependabot[bot]	4b7253b33f	build(deps): bump github.com/gorilla/websocket from 1.4.2 to 1.5.0 in /api (#12075) * build(deps): bump github.com/gorilla/websocket in /api Bumps [github.com/gorilla/websocket](https://github.com/gorilla/websocket) from 1.4.2 to 1.5.0. - [Release notes](https://github.com/gorilla/websocket/releases) - [Commits](https://github.com/gorilla/websocket/compare/v1.4.2...v1.5.0) --- updated-dependencies: - dependency-name: github.com/gorilla/websocket dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * deps: also bump websocket dep in main binary Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-07-12 16:49:31 -07:00
dependabot[bot]	14fea78c23	build(deps): bump github.com/docker/distribution (#12246) Bumps [github.com/docker/distribution](https://github.com/docker/distribution) from 2.7.1+incompatible to 2.8.1+incompatible. - [Release notes](https://github.com/docker/distribution/releases) - [Commits](https://github.com/docker/distribution/compare/v2.7.1...v2.8.1) --- updated-dependencies: - dependency-name: github.com/docker/distribution dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-07-12 16:31:56 -07:00
Michael Schurter	e44d6f09d2	Add semgrep rule to catch non-determinism in FSM (#13725) See `message:` in rule for details. Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-07-12 15:44:24 -07:00
Luiz Aoqui	b656981cf0	Track plan rejection history and automatically mark clients as ineligible (#13421) Plan rejections occur when the scheduler worker and the leader plan applier disagree on the feasibility of a plan. This may happen for valid reasons: since Nomad does parallel scheduling, it is expected that different workers will have a different state when computing placements. As the final plan reaches the leader plan applier, it may no longer be valid due to concurrent scheduling taking up the intended resources. In these situations the plan applier will notify the worker that the plan was rejected and that it should refresh its state before trying again. In some rare and unexpected circumstances it has been observed that workers will repeatedly submit the same plan, even if it is always rejected. While the root cause is still unknown, this mitigation has been put in place. The plan applier will now track the history of plan rejections per client and include in the plan result a list of node IDs that should be set as ineligible if the number of rejections in a given time window crosses a certain threshold. The window size and threshold value can be adjusted in the server configuration. To avoid marking several nodes as ineligible at once, the operation is rate limited to 5 nodes every 30 minutes, with an initial burst of 10 operations.	2022-07-12 18:40:20 -04:00
Michael Schurter	3e50f72fad	core: merge reserved_ports into host_networks (#13651) Fixes #13505 This fixes #13505 by treating reserved_ports like we treat a lot of jobspec settings: merging settings from more global stanzas (client.reserved.reserved_ports) "down" into more specific stanzas (client.host_networks[].reserved_ports). As discussed in #13505 there are other options, and since it's totally broken right now we have some flexibility: Treat overlapping reserved_ports on addresses as invalid and refuse to start agents. However, I'm not sure there's a cohesive model we want to publish right now since so much 0.9-0.12 compat code still exists! We would have to explain to folks that if their -network-interface and host_network addresses overlapped, they could only specify reserved_ports in one place or the other?! It gets ugly. Use the global client.reserved.reserved_ports value as the default and treat host_network[].reserved_ports as overrides. This was my first suggestion in the issue, but @groggemans made me realize the addresses on the agent's interface (as configured by -network-interface) may overlap with host_networks, so you'd need to remove the global reserved_ports from addresses shared with a host_network?! This seemed really confusing and subtle for users to me. So I think "merging down" creates the most expressive yet understandable approach. I've played around with it a bit, and it doesn't seem too surprising. The only frustrating part is how difficult it is to observe the available addresses and ports on a node! However, that's a job for another PR.	2022-07-12 14:40:25 -07:00
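The parent-ID rule in d11da1df5c can be sketched as below. This is a minimal illustration, not Nomad's code: the `Job` struct is a hypothetical stand-in carrying only the two fields the rule needs, and the child job ID in the demo is illustrative.

```go
package main

import "fmt"

// Job is a hypothetical, trimmed-down stand-in for a job; only the fields
// needed to pick the identity claim are included.
type Job struct {
	ID       string
	ParentID string // assumed non-empty for dispatched and periodic child jobs
}

// identityJobID returns the value for a workload identity's JobID claim:
// the parent job's ID when there is one, otherwise the job's own ID. Every
// invocation of a periodic or dispatch job therefore maps to one claim, so
// a single policy can cover all of them.
func identityJobID(j Job) string {
	if j.ParentID != "" {
		return j.ParentID
	}
	return j.ID
}

func main() {
	child := Job{ID: "backup/periodic-1658400000", ParentID: "backup"}
	fmt.Println(identityJobID(child)) // backup
}
```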
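The refactor in 93cfeb177b replaces ad hoc `map[string]struct{}` sets with a set type. The generic type below is a hypothetical stand-in built on the same map idiom (it needs Go 1.18+ generics); it illustrates the shape of the change and is not the API of github.com/hashicorp/go-set.

```go
package main

import "fmt"

// Set is a minimal generic set wrapping map[T]struct{}. It is illustrative
// only and does not reproduce go-set's method names or signatures.
type Set[T comparable] struct {
	items map[T]struct{}
}

// NewSet returns a set containing the given items, with duplicates removed.
func NewSet[T comparable](items ...T) *Set[T] {
	s := &Set[T]{items: make(map[T]struct{}, len(items))}
	for _, item := range items {
		s.Insert(item)
	}
	return s
}

// Insert adds item and reports whether it was not already present.
func (s *Set[T]) Insert(item T) bool {
	if _, ok := s.items[item]; ok {
		return false
	}
	s.items[item] = struct{}{}
	return true
}

// Contains reports whether item is in the set.
func (s *Set[T]) Contains(item T) bool {
	_, ok := s.items[item]
	return ok
}

// Size returns the number of distinct items.
func (s *Set[T]) Size() int { return len(s.items) }

func main() {
	// Before: seen := make(map[string]struct{}); seen["web"] = struct{}{}
	seen := NewSet("web", "api", "web")
	fmt.Println(seen.Size(), seen.Contains("api"), seen.Contains("db")) // 2 true false
}
```

Wrapping the map hides the `struct{}{}` boilerplate and the comma-ok lookups behind named methods, which is the readability gain the commit is after.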
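The quota fix in aa15e0fe7e boils down to applying only the size delta when a variable is overwritten. A minimal sketch, assuming a hypothetical in-memory `quotaTracker` keyed by namespace and path (Nomad's real quota accounting lives elsewhere and differs):

```go
package main

import "fmt"

// quotaTracker is a hypothetical in-memory tracker of variable bytes used
// per namespace, remembering each variable's last recorded size.
type quotaTracker struct {
	used  map[string]int64            // namespace -> total bytes
	sizes map[string]map[string]int64 // namespace -> path -> bytes
}

func newQuotaTracker() *quotaTracker {
	return &quotaTracker{
		used:  make(map[string]int64),
		sizes: make(map[string]map[string]int64),
	}
}

// upsert records a write of size bytes at path. Only the difference from the
// existing variable (zero if none) is applied, so shrinking a variable lowers
// the namespace total instead of inflating it.
func (q *quotaTracker) upsert(namespace, path string, size int64) {
	paths, ok := q.sizes[namespace]
	if !ok {
		paths = make(map[string]int64)
		q.sizes[namespace] = paths
	}
	delta := size - paths[path] // paths[path] is 0 for a new variable
	paths[path] = size
	q.used[namespace] += delta
}

func main() {
	q := newQuotaTracker()
	q.upsert("default", "nomad/jobs/example", 100)
	q.upsert("default", "nomad/jobs/example", 40) // update applies -60
	fmt.Println(q.used["default"])                // 40
}
```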
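The "merge down" approach from 3e50f72fad can be sketched as a union of port sets: the global `client.reserved.reserved_ports` value is added to each `host_network`'s own `reserved_ports`. This assumes the comma-separated port-and-range syntax (e.g. `22,8000-8002`); the helper names are hypothetical and the real agent code differs.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parsePorts (hypothetical helper) parses a spec such as "22,80,8000-8002"
// into a set of port numbers.
func parsePorts(spec string) (map[int]struct{}, error) {
	ports := make(map[int]struct{})
	for _, part := range strings.Split(spec, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		lo, hi := part, part
		if i := strings.Index(part, "-"); i >= 0 {
			lo, hi = part[:i], part[i+1:]
		}
		start, err := strconv.Atoi(strings.TrimSpace(lo))
		if err != nil {
			return nil, fmt.Errorf("invalid port %q: %w", part, err)
		}
		end, err := strconv.Atoi(strings.TrimSpace(hi))
		if err != nil {
			return nil, fmt.Errorf("invalid port %q: %w", part, err)
		}
		if start < 0 || end > 65535 || start > end {
			return nil, fmt.Errorf("invalid port range %q", part)
		}
		for p := start; p <= end; p++ {
			ports[p] = struct{}{}
		}
	}
	return ports, nil
}

// mergeReserved (hypothetical) merges the global reserved ports "down" into a
// host network's reserved ports and returns the sorted union.
func mergeReserved(global, hostNetwork string) ([]int, error) {
	g, err := parsePorts(global)
	if err != nil {
		return nil, err
	}
	h, err := parsePorts(hostNetwork)
	if err != nil {
		return nil, err
	}
	for p := range g {
		h[p] = struct{}{}
	}
	out := make([]int, 0, len(h))
	for p := range h {
		out = append(out, p)
	}
	sort.Ints(out)
	return out, nil
}

func main() {
	merged, err := mergeReserved("22,8000-8002", "80,22")
	if err != nil {
		panic(err)
	}
	fmt.Println(merged) // [22 80 8000 8001 8002]
}
```

A union, rather than letting either value override the other, means a port reserved globally stays reserved on every address, which matches the "merging down" behavior the commit describes.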

23339 commits