open-nomad

Author	SHA1	Message	Date
Luiz Aoqui	3f1ea9da4b	api: set last index and request time on alloc stop (#16319) Some of the methods in `Allocations()` incorrectly use the `putQuery` in API calls where `put` is more appropriate since they are not reading information back. These methods are also not returning request metadata such as `LastIndex` back to callers, which can be useful to have in some scenarios. They also provide poor developer experience as they take an `api.Allocation` struct when only the allocation ID is necessary. This can lead consumers to make unnecessary API calls to fetch the full allocation. Fixing these problems requires updating the methods' signatures so they take `WriteOptions` instead of `QueryOptions` and return `WriteMeta`, but this is a breaking change that requires advance notice to consumers. This commit adds a future breaking change notice and also fixes the `Stop` method so it properly returns request metadata in a backwards compatible way.	2023-03-03 15:52:41 -05:00
Tim Gross	3c0eaba9db	remove backcompat support for non-atomic job registration (#16305) In Nomad 0.12.1 we introduced atomic job registration/deregistration, where the new eval was written in the same raft entry. Backwards-compatibility checks were supposed to have been removed in Nomad 1.1.0, but we missed that. They have long been safe to remove.	2023-03-03 15:52:22 -05:00
Luiz Aoqui	40494e64a9	docs: fix alloc stop `no_shutdown_delay` (#16282 )	2023-03-03 14:44:49 -05:00
Luiz Aoqui	1d051d834d	cli: use shared logic for resolving job prefix (#16306 ) Several `nomad job` subcommands had duplicate or slightly similar logic for resolving a job ID from a CLI argument prefix, while others did not have this functionality at all. This commit pulls the shared logic to the command Meta and updates all `nomad job` subcommands to use it.	2023-03-03 14:43:20 -05:00
Tim Gross	8747059b86	service: fix regression in task access to list/read endpoint (#16316) When native service discovery was added, we used the node secret as the auth token. Once Workload Identity was added in Nomad 1.4.x we needed to use the claim token for `template` blocks, and so we allowed valid claims to bypass the ACL policy check to preserve the existing behavior. (Invalid claims are still rejected, so this didn't widen any security boundary.) In reworking authentication for 1.5.0, we unintentionally removed this bypass. For WIs without a policy attached to their job, everything works as expected because the resulting `acl.ACL` is nil. But once a policy is attached to the job the `acl.ACL` is no longer nil and this causes permissions errors. Fix the regression by adding back the bypass for valid claims. In future work, we should strongly consider turning the implicit policies into real `ACLPolicy` objects (even if not stored in state) so that we don't have these kinds of brittle exceptions to the auth code.	2023-03-03 11:41:19 -05:00
Dao Thanh Tung	62a69552c1	api: add new test case for force-leave (#16260 ) Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>	2023-03-03 10:38:40 -05:00
Aofei Sheng	e81fecdd1f	docs: fix typos in task-api.mdx and workload-identity.mdx (#16309 )	2023-03-03 08:37:59 -05:00
Valentino	1f9d11feff	Add namespace argument to the job verification help text (#16243 )	2023-03-02 16:42:14 -05:00
Dao Thanh Tung	ed31e0a5f5	cli: sort Node value in `nomad operator raft list-peers` command (#16221 ) Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>	2023-03-02 16:16:30 -05:00
Michael Schurter	4b01df1787	Merge pull request #16293 from hashicorp/post-1.5.0-release admin: Post 1.5.0 release	2023-03-02 12:44:49 -08:00
Phil Renaud	93574ce085	[ui, helios] Toast Component (#16099 ) * Template and styles * @type to @color on flash messages * Notifications service as wrapper * Test cases updated for new notifs	2023-03-02 13:52:16 -05:00
Tim Gross	0e1b554299	handle `FSM.Apply` errors in `raftApply` (#16287) The signature of the `raftApply` function requires that the caller unwrap the first returned value (the response from `FSM.Apply`) to see if it's an error. This puts the burden on the caller to remember to check two different places for errors, and we've done so inconsistently. Update `raftApply` to do the unwrapping for us and return any `FSM.Apply` error as the error value. Similar work was done in Consul in https://github.com/hashicorp/consul/pull/9991. This eliminates some boilerplate and surfaces a few minor bugs in the process: * job deregistrations of already-GC'd jobs were still emitting evals * reconcile job summaries does not return scheduler errors * node updates did not report errors associated with inconsistent service discovery or CSI plugin states Note that although _most_ of the `FSM.Apply` functions return only errors (which makes it tempting to remove the first return value entirely), there are a few that return `bool` for some reason and Variables relies on the response value for proper CAS checking.	2023-03-02 13:51:09 -05:00
Tim Gross	f3b5952c3e	deps: update go-plugin to 1.4.9 (#16292 ) Fixes #16288. An earlier version of `go-plugin` introduced a warning log if `SecureConfig` is unset. For Nomad and other applications that have "internal" `go-plugin` consumers where the application runs itself as a plugin, this causes spurious warn-level logs. For Nomad in particular this means every task driver and logmon invocation emits the log, which is our primary operation. The change was reverted upstream, so this changeset picks up the reverted version.	2023-03-02 13:39:57 -05:00
Farbod Ahmadian	629ac58763	tests: add functionality to skip a test if it's not running in CI and not with root user (#16222 )	2023-03-02 13:38:27 -05:00
Tim Gross	bb4880ec13	client: use RPC address and not serf after initial Consul discovery (#16217) Nomad servers can advertise independent IP addresses for `serf` and `rpc`. Somewhat unexpectedly, the `serf` address is also used for both Serf and server-to-server RPC communication (including Raft RPC). The address advertised for `rpc` is only used for client-to-server RPC. This split was introduced intentionally in Nomad 0.8. When clients are using Consul discovery for connecting to servers, they get an initial discovery set from Consul and use the correct `rpc` tag in Consul to get a list of addresses for servers. The client then makes a `Status.Peers` RPC to get the list of those servers that are raft peers. But this endpoint is shared between servers and clients, and provides the address used for Raft. Most of the time this is harmless because servers will bind on 0.0.0.0 anyway. But in topologies where servers are on a private network and clients are on separate subnets (or even public subnets), clients will make initial contact with the server to get the list of peers but then populate their local server set with unreachable addresses. Cluster administrators can work around this problem by using `server_join` with specific IP addresses (or DNS names), because the `Node.UpdateStatus` endpoint returns the correct set of RPC addresses when updating the node. So once a client has registered, it will get the correct set of RPC addresses. This changeset updates the client logic to query `Status.Members` instead of `Status.Peers`, and then extract the correctly advertised address and port from the response body.	2023-03-02 13:36:45 -05:00
James Rasell	7eb9e12829	Merge release 1.5.0 files	2023-03-02 17:47:07 +00:00
hc-github-team-nomad-core	de3ee79482	Prepare for next release	2023-03-02 17:44:13 +00:00
hc-github-team-nomad-core	646bdf901c	Generate files for 1.5.0 release	2023-03-02 17:44:13 +00:00
James Rasell	19e5e74dee	prepare release 1.5.0	2023-03-02 17:44:13 +00:00
James Rasell	b57d37c070	Merge pull request #16284 from hashicorp/post-1.5.0-rc.1-release admin: post 1.5.0 rc.1 release	2023-03-02 09:31:08 +01:00
hc-github-team-nomad-core	410aad7847	Prepare for next release	2023-03-01 08:09:07 +00:00
hc-github-team-nomad-core	1f2a525341	Generate files for 1.5.0-rc.1 release	2023-03-01 08:09:07 +00:00
Michael Schurter	320c67643c	Prepare release 1.5.0-rc.1	2023-03-01 08:09:05 +00:00
Michael Schurter	bd7b60712e	Accept Workload Identities for Client RPCs (#16254 ) This change resolves policies for workload identities when calling Client RPCs. Previously only ACL tokens could be used for Client RPCs. Since the same cache is used for both bearer tokens (ACL and Workload ID), the token cache size was doubled. --------- Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-02-27 10:17:47 -08:00
Daniel Bennett	39e3a1ac3e	build/cli: Add BuildDate (#16216 ) * build: add BuildDate to version info will be used in enterprise to compare to license expiration time * cli: multi-line version output, add BuildDate before: $ nomad version Nomad v1.4.3 (coolfakecommithashomgoshsuchacoolonewoww) after: $ nomad version Nomad v1.5.0-dev BuildDate 2023-02-17T19:29:26Z Revision coolfakecommithashomgoshsuchacoolonewoww compare consul: $ consul version Consul v1.14.4 Revision dae670fe Build Date 2023-01-26T15:47:10Z Protocol 2 spoken by default, blah blah blah... and vault: $ vault version Vault v1.12.3 (209b3dd99fe8ca320340d08c70cff5f620261f9b), built 2023-02-02T09:07:27Z * docs: update version command output	2023-02-27 11:27:40 -06:00
Tim Gross	79844048e6	populate Nomad token for task runner update hooks (#16266 ) The `TaskUpdateRequest` struct we send to task runner update hooks was not populating the Nomad token that we get from the task runner (which we do for the Vault token). This results in task runner hooks like the template hook overwriting the Nomad token with the zero value for the token. This causes in-place updates of a task to break templates (but not other uses that rely on identity but don't currently bother to update it, like the identity hook).	2023-02-27 10:48:13 -05:00
Tim Gross	4c9688271a	CSI: fix potential state store corruptions (#16256 ) The `CSIVolume` struct has references to allocations that are "denormalized"; we don't store them on the `CSIVolume` struct but hydrate them on read. Tests detecting potential state store corruptions found two locations where we're not copying the volume before denormalizing: * When garbage collecting CSI volume claims. * When checking if it's safe to force-deregister the volume. There are no known user-visible problems associated with these bugs but both have the potential of mutating volume claims outside of a FSM transaction. This changeset also cleans up state mutations in some CSI tests so as to avoid having working tests cover up potential future bugs.	2023-02-27 08:47:08 -05:00
Michael Schurter	62ac8c561d	agent: only reload HTTP servers that use TLS (#16250 ) * agent: only reload HTTP servers that use TLS * shutdown task api before client and improve names Fixes #16239	2023-02-23 12:03:44 -08:00
Seth Hoenig	61404b2551	services: Set Nomad's User-Agent by default on HTTP checks for nomad services (#16248 )	2023-02-23 08:10:42 -06:00
dependabot[bot]	ddb2c309e8	build(deps): bump golang.org/x/net from 0.5.0 to 0.7.0 (#16220 ) Bumps [golang.org/x/net](https://github.com/golang/net) from 0.5.0 to 0.7.0. - [Release notes](https://github.com/golang/net/releases) - [Commits](https://github.com/golang/net/compare/v0.5.0...v0.7.0) --- updated-dependencies: - dependency-name: golang.org/x/net dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-02-22 11:13:25 -06:00
Dao Thanh Tung	ea54f46425	Fix missing query parameter in job doc (#16233 ) Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>	2023-02-22 10:28:32 -06:00
Seth Hoenig	804f9fdb93	services: ensure task group is set on service hook (#16240 ) This PR fixes a bug where the task group information was not being set on the serviceHook.AllocInfo struct, which is needed later on for calculating the CheckID of a nomad service check. The CheckID is calculated independently from multiple callsites, and the information being passed in must be consistent, including the group name. The workload.AllocInfo.Group was not set at this callsite, due to the bug fixed in this PR. https://github.com/hashicorp/nomad/blob/main/client/serviceregistration/nsd/nsd.go#L114	2023-02-22 10:22:48 -06:00
Seth Hoenig	c9ffd1274b	api: fix a panic and tweak some exported types (#16237 ) This PR - fixes a panic in GetItems when looking up a variable that does not exist. - deprecates GetItems in favor of GetVariableItems which avoids returning a pointer to a map - deprecates ErrVariableNotFound in favor of ErrVariablePathNotFound which is an actual error type - does some minor code cleanup to make linters happier	2023-02-22 08:17:22 -06:00
Michael Schurter	d9587b323a	Task API / Dynamic Node Metadata E2E test fixes (#16219 ) * taskapi: return Forbidden on bad credentials Prior to this change a "Server error" would be returned when ACLs are enabled which did not match when ACLs are disabled. * e2e: love love love datacenter wildcard default * e2e: skip windows nodes on linux only test The Logfs are a bit weird because they're most useful when converted to Printfs to make debugging the test much faster, but that makes CI noisy. In a perfect world Go would expose how many tests are being run and we could stream output live if there's only 1. For now I left these helpful lines in as basically glorified comments.	2023-02-21 10:53:10 -08:00
Tim Gross	e23ed85d57	E2E: add multi-home networking to test infrastructure (#16218 ) Add an Elastic Network Interface (ENI) to each Linux host, on a secondary subnet we have provisioned in each AZ. Revise security groups as follows: * Split out client security groups from servers so that we can't have clients accidentally accessing serf addresses or other unexpected cross-talk. * Add new security groups for the secondary subnet that only allows communication within the security group so we can exercise behaviors with multiple IPs. This changeset doesn't include any Nomad configuration changes needed to take advantage of the extra network interface. I'll include those with testing for PR #16217.	2023-02-20 10:08:28 +01:00
Seth Hoenig	b9e2a4b483	docs: slight tidy up of var create example payload (#16212 )	2023-02-17 13:12:39 -06:00
Michael Schurter	f13f022176	docs: clarify sysbatch supports count (#16205 ) Also remove old version indicators. --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-02-17 10:51:38 -08:00
James Rasell	8295d0e516	acl: add validation to binding rule selector on upsert. (#16210 ) * acl: add validation to binding rule selector on upsert. * docs: add more information on binding rule selector escaping.	2023-02-17 15:38:55 +01:00
Phil Renaud	e54f80430d	Count and comments added to hello-world (#16162 )	2023-02-17 09:29:31 -05:00
Alessio Perugini	4e9ec24b22	Allow configurable range of Job priorities (#16084 )	2023-02-17 09:23:13 -05:00
Michele Degges	279929df38	[CI only] Prepare workflow rollout (#15600 )	2023-02-16 15:51:59 -08:00
Charlie Voiselle	c28c0eb6bc	[cli] var put - Add extension parsing to second argument when file (#16181 )	2023-02-16 13:43:01 -05:00
Michael Schurter	671d9f64ec	Minor post-1.5-beta1 API, code, and docs cleanups (#16193 ) * api: return error on parse failure * docs: clarify anonymous policy with task api	2023-02-16 10:32:21 -08:00
Tim Gross	27cc6a2ff9	fix test flake for RPC TLS enforcement test (#16199) The RPC TLS enforcement test was frequently failing with broken connections. The most likely cause was that the tests started to run before the server had started its RPC server. Wait until it self-elects to ensure that the RPC server is up. This seems to have corrected the error; I ran this 3 times without a failure (even accounting for `gotestsum` retries). Also, fix a minor test bug that didn't impact the test but showed an incorrect usage for `Status.Ping`.	2023-02-16 11:50:40 -05:00
Farbod Ahmadian	6e9ee969ad	build: correct Makefile to run smoke tests locally (#16137 )	2023-02-16 10:58:39 -05:00
visweshs123	fbc51dd190	csi: add option to configure CSIVolumeClaimGCInterval (#16195 )	2023-02-16 10:41:15 -05:00
dependabot[bot]	eaf0be1aba	build(deps): bump github.com/containerd/containerd from 1.6.12 to 1.6.18 (#16198 ) Bumps [github.com/containerd/containerd](https://github.com/containerd/containerd) from 1.6.12 to 1.6.18. - [Release notes](https://github.com/containerd/containerd/releases) - [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md) - [Commits](https://github.com/containerd/containerd/compare/v1.6.12...v1.6.18) --- updated-dependencies: - dependency-name: github.com/containerd/containerd dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-02-16 10:40:21 -05:00
Tim Gross	4fabad7f61	cli: `fmt -check` should return early on diff (#16174 ) The `nomad fmt -check` command incorrectly writes to file because we didn't return before writing the file on a diff. Fix this bug and update the command internals to differentiate between the write-to-file and write-to-stdout code paths, which are activated by different combinations of options and flags. The docstring for the `-list` and `-write` flags is also unclear and can be easily misread to be the opposite of the actual behavior. Clarify this and fix up the docs to match. This changeset also refactors the tests quite a bit so as to make the test outputs clear when something is incorrect.	2023-02-15 14:06:31 -05:00
Tim Gross	1bbabdea37	Merge release 1.4.4 changelog entries (#16190 )	2023-02-15 13:51:19 -05:00
Seth Hoenig	5d325decca	cgutil: handle panic from runc helper method (#16180 ) This PR wraps the cgroups.IsCgroup2UnifiedMode() helper method from runc in a defer/recover block because it might panic in some cases. Upstream fix in: https://github.com/opencontainers/runc/pull/3745 Closes #16179	2023-02-14 15:09:43 -06:00
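
The defer/recover wrapping described in 5d325decca can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `riskyDetect` is a hypothetical stand-in for the runc helper `cgroups.IsCgroup2UnifiedMode()`, and `safeDetect` and the error text are invented for the example.

```go
package main

import "fmt"

// riskyDetect is a hypothetical stand-in for a third-party helper that
// may panic instead of returning an error.
func riskyDetect(fail bool) bool {
	if fail {
		panic("cannot stat cgroup root")
	}
	return true
}

// safeDetect calls riskyDetect and converts any panic into an ordinary
// error. Named return values let the deferred function overwrite the
// result after a panic has been recovered.
func safeDetect(fail bool) (ok bool, err error) {
	defer func() {
		if r := recover(); r != nil {
			ok = false
			err = fmt.Errorf("cgroup detection panicked: %v", r)
		}
	}()
	return riskyDetect(fail), nil
}

func main() {
	fmt.Println(safeDetect(false))
	fmt.Println(safeDetect(true))
}
```

The named results matter here: without them the deferred function could recover the panic but would have no way to report it to the caller.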


24555 commits