open-nomad

Commit Graph

Author	SHA1	Message	Date
Lance Haig	568da5918b	cli: tls certs not created with correct SANs (#16959 ) The `nomad tls cert` command did not create certificates with the correct SANs for them to work with non default domain and region names. This changset updates the code to support non default domains and regions in the certificates.	2023-05-22 09:31:56 -04:00
Roberto Hidalgo	2f702a9f11	allow periodic jobs to use workload identity ACL policies (#17018 ) When resolving ACL policies, we were not using the parent ID for the policy lookup for dispatch/periodic jobs, even though the claims were signed for that parent ID. This prevents all calls to the Task API (and other WI-authenticated API calls) from a periodically-dispatched job failing with 403. Fix this by using the parent job ID whenever it's available.	2023-05-22 09:19:16 -04:00
Phil Renaud	7e56ca62d1	[ui] Adds a "Scheduling" filter to the job.allocations page (#17227 ) * Basic filter concept * Make sure NextAllocation gets sent up with allocation stub	2023-05-18 16:24:41 -04:00
James Rasell	96f7c84e4e	variable: fixup metadata copy comment and remove unrequired type. (#17234 )	2023-05-18 13:49:41 +01:00
Piotr Kazmierczak	fe272c3686	refactor acl.UpsertTokens to avoid unnecessary RPC calls. (#17194 ) New RPC endpoints introduced during OIDC and JWT auth perform unnecessary many RPC calls when they upsert generated ACL tokens, as pointed out by @tgross. This PR moves the common logic from acl.UpsertTokens method into a helper method that contains common logic, and sidesteps authentication, metrics, etc.	2023-05-16 09:31:51 +02:00
Luiz Aoqui	389212bfda	node pool: initial base work (#17163 ) Implementation of the base work for the new node pools feature. It includes a new `NodePool` struct and its corresponding state store table. Upon start the state store is populated with two built-in node pools that cannot be modified nor deleted: * `all` is a node pool that always includes all nodes in the cluster. * `default` is the node pool where nodes that don't specify a node pool in their configuration are placed.	2023-05-15 10:49:08 -04:00
Seth Hoenig	81e36b3650	core: eliminate second index on job_submissions table (#17146 ) * core: eliminate second index on job_submissions table This PR refactors the job_submissions state store code to eliminate the use of a second index formerly used for purging all versions of a given job. In practice we ended up with duplicate entries on the table. Instead, use index prefix scanning on the primary index and tidy up any potential for creating (or removing) duplicates. * core: pr comments followup	2023-05-11 09:51:08 -05:00
Tim Gross	9ed75e1f72	client: de-duplicate alloc updates and gate during restore (#17074 ) When client nodes are restarted, all allocations that have been scheduled on the node have their modify index updated, including terminal allocations. There are several contributing factors: * The `allocSync` method that updates the servers isn't gated on first contact with the servers. This means that if a server updates the desired state while the client is down, the `allocSync` races with the `Node.ClientGetAlloc` RPC. This will typically result in the client updating the server with "running" and then immediately thereafter "complete". * The `allocSync` method unconditionally sends the `Node.UpdateAlloc` RPC even if it's possible to assert that the server has definitely seen the client state. The allocrunner may queue-up updates even if we gate sending them. So then we end up with a race between the allocrunner updating its internal state to overwrite the previous update and `allocSync` sending the bogus or duplicate update. This changeset adds tracking of server-acknowledged state to the allocrunner. This state gets checked in the `allocSync` before adding the update to the batch, and updated when `Node.UpdateAlloc` returns successfully. To implement this we need to be able to equality-check the updates against the last acknowledged state. We also need to add the last acknowledged state to the client state DB, otherwise we'd drop unacknowledged updates across restarts. The client restart test has been expanded to cover a variety of allocation states, including allocs stopped before shutdown, allocs stopped by the server while the client is down, and allocs that have been completely GC'd on the server while the client is down. I've also bench tested scenarios where the task workload is killed while the client is down, resulting in a failed restore. Fixes #16381	2023-05-11 09:05:24 -04:00
Seth Hoenig	74714272cc	api: set the job submission during job reversion (#17097 ) * api: set the job submission during job reversion This PR fixes a bug where the job submission would always be nil when a job goes through a reversion to a previous version. Basically we need to detect when this happens, lookup the submission of the job version being reverted to, and set that as the submission of the new job being created. * e2e: add e2e test for job submissions during reversion This e2e test ensures a reverted job inherits the job submission associated with the version of the job being reverted to.	2023-05-08 14:18:34 -05:00
Daniel Bennett	a7ed6f5c53	full task cleanup when alloc prerun hook fails (#17104 ) to avoid leaking task resources (e.g. containers, iptables) if allocRunner prerun fails during restore on client restart. now if prerun fails, TaskRunner.MarkFailedKill() will only emit an event, mark the task as failed, and cancel the tr's killCtx, so then ar.runTasks() -> tr.Run() can take care of the actual cleanup. removed from (formerly) tr.MarkFailedDead(), now handled by tr.Run(): * set task state as dead * save task runner local state * task stop hooks also done in tr.Run() now that it's not skipped: * handleKill() to kill tasks while respecting their shutdown delay, and retrying as needed * also includes task preKill hooks * clearDriverHandle() to destroy the task and associated resources * task exited hooks	2023-05-08 13:17:10 -05:00
stswidwinski	9c1c2cb5d2	Correct the status description and modify time of canceled evals. (#17071 ) Fix for #17070. Corrected the status description and modify time of evals which are canceled due to another eval having completed in the meantime.	2023-05-08 08:50:36 -04:00
Seth Hoenig	fff2eec625	connect: use heuristic to detect sidecar task driver (#17065 ) * connect: use heuristic to detect sidecar task driver This PR adds a heuristic to detect whether to use the podman task driver for the connect sidecar proxy. The podman driver will be selected if there is at least one task in the task group configured to use podman, and there are zero tasks in the group configured to use docker. In all other cases the task driver defaults to docker. After this change, we should be able to run typical Connect jobspecs (e.g. nomad job init [-short] -connect) on Clusters configured with the podman task driver, without modification to the job files. Closes #17042 * golf: cleanup driver detection logic	2023-05-05 10:19:30 -05:00
James Rasell	6ec4a69f47	scale: fixed a bug where evals could be created with wrong type. (#17092 ) The job scale RPC endpoint hard-coded the eval creation to use the type of service. This meant scaling events triggered on jobs of type batch would create evaluations with the wrong type, which does not seem to cause any problems, just confusion when correlating the two.	2023-05-05 14:46:10 +01:00
Tim Gross	17bd930ca9	logs: fix missing allocation logs after update to Nomad 1.5.4 (#17087 ) When the server restarts for the upgrade, it loads the `structs.Job` from the Raft snapshot/logs. The jobspec has long since been parsed, so none of the guards around the default value are in play. The empty field value for `Enabled` is the zero value, which is false. This doesn't impact any running allocation because we don't replace running allocations when either the client or server restart. But as soon as any allocation gets rescheduled (ex. you drain all your clients during upgrades), it'll be using the `structs.Job` that the server has, which has `Enabled = false`, and logs will not be collected. This changeset fixes the bug by adding a new field `Disabled` which defaults to false (so that the zero value works), and deprecates the old field. Fixes #17076	2023-05-04 16:01:18 -04:00
Michael Schurter	3b3b02b741	dep: update from jwt/v4 to jwt/v5 (#17062 ) Their release notes are here: https://github.com/golang-jwt/jwt/releases Seemed wise to upgrade before we do even more with JWTs. For example this upgrade would have mattered if we already implemented common JWT claims such as expiration. Since we didn't rely on any claim verification this upgrade is a noop... ...except for 1 test that called `Claims.Valid()`! Removing that assertion seems scary, but it didn't actually do anything because we didn't implement any of the standard claims it validated: https://github.com/golang-jwt/jwt/blob/v4.5.0/map_claims.go#L120-L151 So functionally this major upgrade is a noop.	2023-05-03 11:17:38 -07:00
Luiz Aoqui	7b5a8f1fb0	Revert "hashicorp/go-msgpack v2 (#16810 )" (#17047 ) This reverts commit 8a98520d56eed3848096734487d8bd3eb9162a65.	2023-05-01 17:18:34 -04:00
Michael Schurter	d3b0bbc088	deps: update go-bexpr from 0.1.11 to 0.1.12 (#16991 ) Pulls in https://github.com/hashicorp/go-bexpr/pull/38 Fixes #16758	2023-04-27 09:01:42 -07:00
James Rasell	ac98c2ed40	vars: ensure struct reciever names are consistent. (#16995 )	2023-04-27 13:51:11 +01:00
James Rasell	4d2c1403c2	scale: do not allow scaling of jobs with type system. (#16969 )	2023-04-25 15:47:44 +01:00
Tim Gross	72cbe53f19	logs: allow disabling log collection in jobspec (#16962 ) Some Nomad users ship application logs out-of-band via syslog. For these users having `logmon` (and `docker_logger`) running is unnecessary overhead. Allow disabling the logmon and pointing the task's stdout/stderr to /dev/null. This changeset is the first of several incremental improvements to log collection short of full-on logging plugins. The next step will likely be to extend the internal-only task driver configuration so that cluster administrators can turn off log collection for the entire driver. --- Fixes: #11175 Co-authored-by: Thomas Weber <towe75@googlemail.com>	2023-04-24 10:00:27 -04:00
valodzka	379497a484	fix host port handling for ipv6 (#16723 )	2023-04-20 19:53:20 -07:00
James Rasell	367cfa6d93	rpc: use "+" concatination in hot path RPC rate limit metrics. (#16923 )	2023-04-18 13:41:34 +01:00
Ian Fijolek	619f49afcf	hashicorp/go-msgpack v2 (#16810 ) * Upgrade from hashicorp/go-msgpack v1.1.5 to v2.1.0 Fixes #16808 * Update hashicorp/net-rpc-msgpackrpc to v2 to match go-msgpack * deps: use go-msgpack v2.0.0 go-msgpack v2.1.0 includes some code changes that we will need to investigate furthere to assess its impact on Nomad, so keeping this dependency on v2.0.0 for now since it's no-op. --------- Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-04-17 17:02:05 -04:00
Tim Gross	62548616d4	client: allow `drain_on_shutdown` configuration (#16827 ) Adds a new configuration to clients to optionally allow them to drain their workloads on shutdown. The client sends the `Node.UpdateDrain` RPC targeting itself and then monitors the drain state as seen by the server until the drain is complete or the deadline expires. If it loses connection with the server, it will monitor local client status instead to ensure allocations are stopped before exiting.	2023-04-14 15:35:32 -04:00
Tim Gross	5a9abdc469	drain: use client status to determine drain is complete (#14348 ) If an allocation is slow to stop because of `kill_timeout` or `shutdown_delay`, the node drain is marked as complete prematurely, even though drain monitoring will continue to report allocation migrations. This impacts the UI or API clients that monitor node draining to shut down nodes. This changeset updates the behavior to wait until the client status of all drained allocs are terminal before marking the node as done draining.	2023-04-13 08:55:28 -04:00
James Rasell	b7a41fe48d	core: ensure all Server receiver names are consistent. (#16859 )	2023-04-12 14:03:07 +01:00
Juana De La Cuesta	8302085384	Deployment Status Command Does Not Respect -namespace Wildcard (#16792 ) * func: add namespace support for list deployment * func: add wildcard to namespace filter for deployments * Update deployment_endpoint.go * style: use must instead of require or asseert * style: rename paginator to avoid clash with import * style: add changelog entry * fix: add missing parameter for upsert jobs	2023-04-12 11:02:14 +02:00
Tim Gross	a9a350cfdb	drainer: fix codec race condition in integration test (#16845 ) msgpackrpc codec handles are specific to a connection and cannot be shared between goroutines; this can cause corrupted decoding. Fix the drainer integration test so that we create separate codecs for the goroutines that the test helper spins up to simulate client updates. This changeset also refactors the drainer integration test to bring it up to current idioms and library usages, make assertions more clear, and reduce duplication.	2023-04-11 14:31:13 -04:00
Seth Hoenig	ba728f8f97	api: enable support for setting original job source (#16763 ) * api: enable support for setting original source alongside job This PR adds support for setting job source material along with the registration of a job. This includes a new HTTP endpoint and a new RPC endpoint for making queries for the original source of a job. The HTTP endpoint is /v1/job/<id>/submission?version=<version> and the RPC method is Job.GetJobSubmission. The job source (if submitted, and doing so is always optional), is stored in the job_submission memdb table, separately from the actual job. This way we do not incur overhead of reading the large string field throughout normal job operations. The server config now includes job_max_source_size for configuring the maximum size the job source may be, before the server simply drops the source material. This should help prevent Bad Things from happening when huge jobs are submitted. If the value is set to 0, all job source material will be dropped. * api: avoid writing var content to disk for parsing * api: move submission validation into RPC layer * api: return an error if updating a job submission without namespace or job id * api: be exact about the job index we associate a submission with (modify) * api: reword api docs scheduling * api: prune all but the last 6 job submissions * api: protect against nil job submission in job validation * api: set max job source size in test server * api: fixups from pr	2023-04-11 08:45:08 -05:00
Piotr Kazmierczak	dea8b1a093	acl: bump JWT auth gate to 1.5.4 (#16838 )	2023-04-11 10:07:45 +02:00
hashicorp-copywrite[bot]	005636afa0	[COMPLIANCE] Add Copyright and License Headers	2023-04-10 15:36:59 +00:00
Tim Gross	1335543731	ephemeral disk: `migrate` should imply `sticky` (#16826 ) The `ephemeral_disk` block's `migrate` field allows for best-effort migration of the ephemeral disk data to new nodes. The documentation says the `migrate` field is only respected if `sticky=true`, but in fact if client ACLs are not set the data is migrated even if `sticky=false`. The existing behavior when client ACLs are disabled has existed since the early implementation, so "fixing" that case now would silently break backwards compatibility. Additionally, having `migrate` not imply `sticky` seems nonsensical: it suggests that if we place on a new node we migrate the data but if we place on the same node, we throw the data away! Update so that `migrate=true` implies `sticky=true` as follows: * The failure mode when client ACLs are enabled comes from the server not passing along a migration token. Update the server so that the server provides a migration token whenever `migrate=true` and not just when `sticky=true` too. * Update the scheduler so that `migrate` implies `sticky`. * Update the client so that we check for `migrate \|\| sticky` where appropriate. * Refactor the E2E tests to move them off the old framework and make the intention of the test more clear.	2023-04-07 16:33:45 -04:00
hc-github-team-nomad-core	3578078caf	Prepare for next release	2023-04-05 12:31:42 -04:00
hc-github-team-nomad-core	b64ee2726d	Generate files for 1.5.3 release	2023-04-05 12:31:30 -04:00
Tim Gross	8278f23042	acl: fix ACL bypass for anon requests that pass thru client HTTP Requests without an ACL token that pass thru the client's HTTP API are treated as though they come from the client itself. This allows bypass of ACLs on RPC requests where ACL permissions are checked (like `Job.Register`). Invalid tokens are correctly rejected. Fix the bypass by only setting a client ID on the identity if we have a valid node secret. Note that this changeset will break rate metrics for RPCs sent by clients without a client secret such as `Node.GetClientAllocs`; these requests will be recorded as anonymous. Future work should: * Ensure the node secret is sent with all client-driven RPCs except `Node.Register` which is TOFU. * Create a new `acl.ACL` object from client requests so that we can enforce ACLs for all endpoints in a uniform way that's less error-prone.~	2023-04-05 12:17:51 -04:00
Juana De La Cuesta	9b4871fece	Prevent kill_timeout greater than progress_deadline (#16761 ) * func: add validation for kill timeout smaller than progress dealine * style: add changelog * style: typo in changelog * style: remove refactored test * Update .changelog/16761.txt Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/structs/structs.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> --------- Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-04-04 18:17:10 +02:00
Juana De La Cuesta	1fc13b83d8	style: update documentation (#16729 )	2023-03-31 16:38:16 +02:00
Daniel Bennett	c42950e342	ent: move all license info into LicenseConfig{} (#16738 ) and add new TestConfigForServer() to get a valid nomad.Config to use in tests Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-03-30 16:15:05 -05:00
Horacio Monsalvo	20372b1721	connect: add meta on ConsulSidecarService (#16705 ) Co-authored-by: Sol-Stiep <sol.stiep@southworks.com>	2023-03-30 16:09:28 -04:00
Piotr Kazmierczak	1a5eba24a6	acl: set minACLJWTAuthMethodVersion to 1.5.3 and adjust code comment	2023-03-30 15:30:42 +02:00
Piotr Kazmierczak	d98c8f6759	acl: rebased on main and changed the gate to 1.5.3-dev	2023-03-30 09:40:12 +02:00
Piotr Kazmierczak	16b6bd9ff2	acl: fix canonicalization of JWT auth method mock (#16531 )	2023-03-30 09:39:56 +02:00
Piotr Kazmierczak	2b353902a1	acl: HTTP endpoints for JWT auth (#16519 )	2023-03-30 09:39:56 +02:00
Piotr Kazmierczak	e48c48e89b	acl: RPC endpoints for JWT auth (#15918 )	2023-03-30 09:39:56 +02:00
Piotr Kazmierczak	a9230fb0b7	acl: JWT auth method	2023-03-30 09:39:56 +02:00
Juana De La Cuesta	320884b8ee	Multiple instances of a periodic job are run simultaneously, when prohibit_overlap is true (#16583 ) * Multiple instances of a periodic job are run simultaneously, when prohibit_overlap is true Fixes #11052 When restoring periodic dispatcher, all periodic jobs are forced without checking for previous childre. * Multiple instances of a periodic job are run simultaneously, when prohibit_overlap is true Fixes #11052 When restoring periodic dispatcher, all periodic jobs are forced without checking for previous children. * style: refactor force run function * fix: remove defer and inline unlock for speed optimization * Update nomad/leader.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * style: refactor tests to use must * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * Update nomad/leader_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com> * fix: move back from defer to calling unlock before returning. createEval cant be called with the lock on * style: refactor test to use must * added new entry to changelog and update comments --------- Co-authored-by: James Rasell <jrasell@hashicorp.com> Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-03-27 17:25:05 +02:00
Juana De La Cuesta	21b675244e	style: rename ForceRun to ForceEval, for clarity (#16617 )	2023-03-27 15:38:48 +02:00
Tim Gross	977c88dcea	drainer: test refactoring to clarify behavior around delete/down nodes (#16612 ) This changeset refactors the tests of the draining node watcher so that we don't mock the node watcher's `Remove` and `Update` methods for its own tests. Instead we'll mock the node watcher's dependencies (the job watcher and deadline notifier) and now unit tests can cover the real code. This allows us to remove a bunch of TODOs in `watch_nodes.go` around testing and clarify some important behaviors: * Nodes that are down or disconnected will still be watched until the scheduler decides what to do with their allocations. This will drive the job watcher but not the node watcher, and that lets the node watcher gracefully handle cases where a heartbeat fails but the node heartbeats again before its allocs can be evicted. * Stop watching nodes that have been deleted. The blocking query for nodes set the maximum index to the highest index of a node it found, rather than the index of the nodes table. This misses updates to the index from deleting nodes. This was done as an performance optimization to avoid excessive unblocking, but because the query is over all nodes anyways there's no optimization to be had here. Remove the optimization so we can detect deleted nodes without having to wait for an update to an unrelated node.	2023-03-23 14:07:09 -04:00
Michael Schurter	f8884d8b52	client/metadata: fix crasher caused by AllowStale = false (#16549 ) Fixes #16517 Given a 3 Server cluster with at least 1 Client connected to Follower 1: If a NodeMeta.{Apply,Read} for the Client request is received by Follower 1 with `AllowStale = false` the Follower will forward the request to the Leader. The Leader, not being connected to the target Client, will forward the RPC to Follower 1. Follower 1, seeing AllowStale=false, will forward the request to the Leader. The Leader, not being connected to... well hoppefully you get the picture: an infinite loop occurs.	2023-03-20 16:32:32 -07:00
Seth Hoenig	d6dcc53c0a	tls enforcement flaky tests (#16543 ) * tests: add WaitForLeaders helpers using must/wait timings * tests: start servers for mtls tests together Fixes #16253 (hopefully)	2023-03-17 14:11:13 -05:00
Piotr Kazmierczak	14927e93bc	acl: fix canonicalization of OIDC auth method mock (#16534 )	2023-03-17 15:37:54 +01:00
Seth Hoenig	ed7177de76	scheduler: annotate tasksUpdated with reason and purge DeepEquals (#16421 ) * scheduler: annotate tasksUpdated with reason and purge DeepEquals * cr: move opaque into helper * cr: swap affinity/spread hashing for slice equal * contributing: update checklist-jobspec with notes about struct methods * cr: add more cases to wait config equal method * cr: use reflect when comparing envoy config blocks * cl: add cl	2023-03-14 09:46:00 -05:00
Luiz Aoqui	adf147cb36	acl: update job eval requirement to `submit-job` (#16463 ) The job evaluate endpoint creates a new evaluation for the job which is a write operation. This change modifies the necessary capability from `read-job` to `submit-job` to better reflect this.	2023-03-13 17:13:54 -04:00
Tim Gross	9dfb51579c	scheduler: refactor system util tests (#16416 ) The tests for the system allocs reconciling code path (`diffSystemAllocs`) include many impossible test environments, such as passing allocs for the wrong node into the function. This makes the test assertions nonsensible for use in walking yourself through the correct behavior. I've pulled this changeset out of PR #16097 so that we can merge these improvements and revisit the right approach to fix the problem in #16097 with less urgency now that the PFNR bug fix has been merged. This changeset breaks up a couple of tests, expands test coverage, and makes test assertions more clear. It also corrects one bit of production code that behaves fine in production because of canonicalization, but forces us to remember to set values in tests to compensate.	2023-03-13 11:59:31 -04:00
Seth Hoenig	630bd8eb68	scheduler: add simple benchmark for tasksUpdated (#16422 ) In preperation for some refactoring to tasksUpdated, add a benchmark to the old code so it's easy to compare with the changes, making sure nothing goes off the rails for performance.	2023-03-13 10:44:14 -05:00
hc-github-team-nomad-core	2d1a4d90e9	Prepare for next release	2023-03-13 11:13:27 -04:00
hc-github-team-nomad-core	35167e692a	Generate files for 1.5.1 release	2023-03-13 11:13:27 -04:00
Tim Gross	1cf28996e7	acl: prevent privilege escalation via workload identity ACL policies can be associated with a job so that the job's Workload Identity can have expanded access to other policy objects, including other variables. Policies set on the variables the job automatically has access to were ignored, but this includes policies with `deny` capabilities. Additionally, when resolving claims for a workload identity without any attached policies, the `ResolveClaims` method returned a `nil` ACL object, which is treated similarly to a management token. While this was safe in Nomad 1.4.x, when the workload identity token was exposed to the task via the `identity` block, this allows a user with `submit-job` capabilities to escalate their privileges. We originally implemented automatic workload access to Variables as a separate code path in the Variables RPC endpoint so that we don't have to generate on-the-fly policies that blow up the ACL policy cache. This is fairly brittle but also the behavior around wildcard paths in policies different from the rest of our ACL polices, which is hard to reason about. Add an `ACLClaim` parameter to the `AllowVariableOperation` method so that we can push all this logic into the `acl` package and the behavior can be consistent. This will allow a `deny` policy to override automatic access (and probably speed up checks of non-automatic variable access).	2023-03-13 11:13:27 -04:00
Luiz Aoqui	7305a374e3	allocrunner: fix health check monitoring for Consul services (#16402 ) Services must be interpolated to replace runtime variables before they can be compared against the values returned by Consul.	2023-03-10 14:43:31 -05:00
Tim Gross	99d46e5a49	scheduling: prevent self-collision in dynamic port network offerings (#16401 ) When the scheduler tries to find a placement for a new allocation, it iterates over a subset of nodes. For each node, we populate a `NetworkIndex` bitmap with the ports of all existing allocations and any other allocations already proposed as part of this same evaluation via its `SetAllocs` method. Then we make an "ask" of the `NetworkIndex` in `AssignPorts` for any ports we need and receive an "offer" in return. The offer will include both static ports and any dynamic port assignments. The `AssignPorts` method was written to support group networks, and it shares code that selects dynamic ports with the original `AssignTaskNetwork` code. `AssignTaskNetwork` can request multiple ports from the bitmap at a time. But `AssignPorts` requests them one at a time and does not account for possible collisions, and doesn't return an error in that case. What happens next varies: 1. If the scheduler doesn't place the allocation on that node, the port conflict is thrown away and there's no problem. 2. If the node is picked and this is the only allocation (or last allocation), the plan applier will reject the plan when it calls `SetAllocs`, as we'd expect. 3. If the node is picked and there are additional allocations in the same eval that iterate over the same node, their call to `SetAllocs` will detect the impossible state and the node will be rejected. This can have the puzzling behavior where a second task group for the job without any networking at all can hit a port collision error! It looks like this bug has existed since we implemented group networks, but there are several factors that add up to making the issue rare for many users yet frustratingly frequent for others: * You're more likely to hit this bug the more tightly packed your range for dynamic ports is. With 12000 ports in the range by default, many clusters can avoid this for a long time. * You're more likely to hit case (3) for jobs with lots of allocations or if a scheduler has to iterate over a large number of nodes, such as with system jobs, jobs with `spread` blocks, or (sometimes) jobs using `unique` constraints. For unlucky combinations of these factors, it's possible that case (3) happens repeatedly, preventing scheduling of a given job until a client state change (ex. restarting the agent so all its allocations are rescheduled elsewhere) re-opens the range of dynamic ports available. This changeset: * Fixes the bug by accounting for collisions in dynamic port selection in `AssignPorts`. * Adds test coverage for `AssignPorts`, expands coverage of this case for the deprecated `AssignTaskNetwork`, and tightens the dynamic port range in a scheduler test for spread scheduling to more easily detect this kind of problem in the future. * Adds a `String()` method to `Bitmap` so that any future "screaming" log lines have a human-readable list of used ports.	2023-03-09 10:09:54 -05:00
Lance Haig	e89c3d3b36	Update ioutil library references to os and io respectively for e2e helper nomad (#16332 ) No user facing changes so I assume no change log is required	2023-03-08 09:39:03 -06:00
Tim Gross	a2ceab3d8c	scheduler: correctly detect inplace update with wildcard datacenters (#16362 ) Wildcard datacenters introduced a bug where a job with any wildcard datacenters will always be treated as a destructive update when we check whether a datacenter has been removed from the jobspec. Includes updating the helper so that callers don't have to loop over the job's datacenters.	2023-03-07 10:05:59 -05:00
Luiz Aoqui	2a1a790820	client: don't emit task shutdown delay event if not waiting (#16281 )	2023-03-03 18:22:06 -05:00
Tim Gross	3c0eaba9db	remove backcompat support for non-atomic job registration (#16305 ) In Nomad 0.12.1 we introduced atomic job registration/deregistration, where the new eval was written in the same raft entry. Backwards-compatibility checks were supposed to have been removed in Nomad 1.1.0, but we missed that. This is long safe to remove.	2023-03-03 15:52:22 -05:00
Tim Gross	8747059b86	service: fix regression in task access to list/read endpoint (#16316 ) When native service discovery was added, we used the node secret as the auth token. Once Workload Identity was added in Nomad 1.4.x we needed to use the claim token for `template` blocks, and so we allowed valid claims to bypass the ACL policy check to preserve the existing behavior. (Invalid claims are still rejected, so this didn't widen any security boundary.) In reworking authentication for 1.5.0, we unintentionally removed this bypass. For WIs without a policy attached to their job, everything works as expected because the resulting `acl.ACL` is nil. But once a policy is attached to the job the `acl.ACL` is no longer nil and this causes permissions errors. Fix the regression by adding back the bypass for valid claims. In future work, we should strongly consider getting turning the implicit policies into real `ACLPolicy` objects (even if not stored in state) so that we don't have these kind of brittle exceptions to the auth code.	2023-03-03 11:41:19 -05:00
Tim Gross	0e1b554299	handle `FSM.Apply` errors in `raftApply` (#16287 ) The signature of the `raftApply` function requires that the caller unwrap the first returned value (the response from `FSM.Apply`) to see if it's an error. This puts the burden on the caller to remember to check two different places for errors, and we've done so inconsistently. Update `raftApply` to do the unwrapping for us and return any `FSM.Apply` error as the error value. Similar work was done in Consul in https://github.com/hashicorp/consul/pull/9991. This eliminates some boilerplate and surfaces a few minor bugs in the process: * job deregistrations of already-GC'd jobs were still emitting evals * reconcile job summaries does not return scheduler errors * node updates did not report errors associated with inconsistent service discovery or CSI plugin states Note that although _most_ of the `FSM.Apply` functions return only errors (which makes it tempting to remove the first return value entirely), there are few that return `bool` for some reason and Variables relies on the response value for proper CAS checking.	2023-03-02 13:51:09 -05:00
Michael Schurter	bd7b60712e	Accept Workload Identities for Client RPCs (#16254 ) This change resolves policies for workload identities when calling Client RPCs. Previously only ACL tokens could be used for Client RPCs. Since the same cache is used for both bearer tokens (ACL and Workload ID), the token cache size was doubled. --------- Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-02-27 10:17:47 -08:00
Daniel Bennett	39e3a1ac3e	build/cli: Add BuildDate (#16216 ) * build: add BuildDate to version info will be used in enterprise to compare to license expiration time * cli: multi-line version output, add BuildDate before: $ nomad version Nomad v1.4.3 (coolfakecommithashomgoshsuchacoolonewoww) after: $ nomad version Nomad v1.5.0-dev BuildDate 2023-02-17T19:29:26Z Revision coolfakecommithashomgoshsuchacoolonewoww compare consul: $ consul version Consul v1.14.4 Revision dae670fe Build Date 2023-01-26T15:47:10Z Protocol 2 spoken by default, blah blah blah... and vault: $ vault version Vault v1.12.3 (209b3dd99fe8ca320340d08c70cff5f620261f9b), built 2023-02-02T09:07:27Z * docs: update version command output	2023-02-27 11:27:40 -06:00
Tim Gross	4c9688271a	CSI: fix potential state store corruptions (#16256 ) The `CSIVolume` struct has references to allocations that are "denormalized"; we don't store them on the `CSIVolume` struct but hydrate them on read. Tests detecting potential state store corruptions found two locations where we're not copying the volume before denormalizing: * When garbage collecting CSI volume claims. * When checking if it's safe to force-deregister the volume. There are no known user-visible problems associated with these bugs but both have the potential of mutating volume claims outside of a FSM transaction. This changeset also cleans up state mutations in some CSI tests so as to avoid having working tests cover up potential future bugs.	2023-02-27 08:47:08 -05:00
James Rasell	8295d0e516	acl: add validation to binding rule selector on upsert. (#16210 ) * acl: add validation to binding rule selector on upsert. * docs: add more information on binding rule selector escaping.	2023-02-17 15:38:55 +01:00
Alessio Perugini	4e9ec24b22	Allow configurable range of Job priorities (#16084 )	2023-02-17 09:23:13 -05:00
Michael Schurter	671d9f64ec	Minor post-1.5-beta1 API, code, and docs cleanups (#16193 ) * api: return error on parse failure * docs: clarify anonymous policy with task api	2023-02-16 10:32:21 -08:00
Tim Gross	27cc6a2ff9	fix test flake for RPC TLS enforcement test (#16199 ) The RPC TLS enforcement test was frequently failing with broken connections. The most likely cause was that the tests started to run before the server had started its RPC server. Wait until it self-elects to ensure that the RPC server is up. This seems to have corrected the error; I ran this 3 times without a failure (even accounting for `gotestsum` retries). Also, fix a minor test bug that didn't impact the test but showed an incorrect usage for `Status.Ping.`	2023-02-16 11:50:40 -05:00
Will Nicholson	4dc83757a6	eventstream: Handle missing policy documents in event streams (#15495 ) Fixes https://github.com/hashicorp/nomad/issues/15493 Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-02-14 11:27:39 -05:00
Seth Hoenig	165791dd89	artifact: protect against unbounded artifact decompression (1.5.0) (#16151 ) * artifact: protect against unbounded artifact decompression Starting with 1.5.0, set defaut values for artifact decompression limits. artifact.decompression_size_limit (default "100GB") - the maximum amount of data that will be decompressed before triggering an error and cancelling the operation artifact.decompression_file_count_limit (default 4096) - the maximum number of files that will be decompressed before triggering an error and cancelling the operation. * artifact: assert limits cannot be nil in validation	2023-02-14 09:28:39 -06:00
Charlie Voiselle	d93ba0cf32	Add warnings to `var put` for non-alphanumeric keys. (#15933 ) * Warn when Items key isn't directly accessible Go template requires that map keys are alphanumeric for direct access using the dotted reference syntax. This warns users when they create keys that run afoul of this requirement. - cli: use regex to detect invalid indentifiers in var keys - test: fix slash in escape test case - api: share warning formatting function between API and CLI - ui: warn if var key has characters other than _, letter, or number --------- Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com> Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-02-13 16:14:59 -05:00
Seth Hoenig	df3e8f82da	deps: update go-set, go-landlock (#16146 ) Made a breaking change in go-set (String() signature), need to update both these dependencies together and also fix a thing in structs.go	2023-02-13 08:26:30 -06:00
Charlie Voiselle	65ce3ec8de	[core] Do not start the plugin loader on non-clients (#16111 ) The plugin loader loads task and device driver plugins which are not used on server nodes.	2023-02-10 15:33:16 -05:00
Tim Gross	65c7e149d3	eval broker: use write lock when reaping cancelable evals (#16112 ) The eval broker's `Cancelable` method used by the cancelable eval reaper mutates the slice of cancelable evals by removing a batch at a time from the slice. But this method unsafely uses a read lock despite this mutation. Under normal workloads this is likely to be safe but when the eval broker is under the heavy load this feature is intended to fix, we're likely to have a race condition. Switch this to a write lock, like the other locks that mutate the eval broker state. This changeset also adjusts the timeout to allow poorly-sized Actions runners more time to schedule the appropriate goroutines. The test has also been updated to use `shoenig/test/wait` so we can have sensible reporting of the results rather than just a timeout error when things go wrong.	2023-02-10 10:40:41 -05:00
Tim Gross	c2bd829fe2	tests: don't mutate global structs in core scheduler tests (#16120 ) Some of the core scheduler tests need the maximum batch size for writes to be smaller than the usual `structs.MaxUUIDsPerWriteRequest`. But they do so by unsafely modifying the global struct, which creates test flakes in other tests. Modify the functions under test to take a batch size parameter. Production code will pass the global while the tests can inject smaller values. Turn the `structs.MaxUUIDsPerWriteRequest` into a constant, and add a semgrep rule for avoiding this kind of thing in the future.	2023-02-10 09:26:00 -05:00
Tim Gross	69a2040e82	acl: never return auth errors for `ACL.Bootstrap` RPC (#16108 ) In #15901 we introduced pre-forwarding authentication for RPCs so that we can grab the identity for rate metrics. The `ACL.Bootstrap` RPC is an unauthenticated endpoint, so any error message from authentication is not particularly useful. This would be harmless, but if you try to bootstrap with your `NOMAD_TOKEN` already set (perhaps because you were talking to another cluster previously from the same shell session), you'll get an authentication error instead of just having the token be ignored. This is a regression from the existing behavior, so have this endpoint ignore auth errors the same way we do for every other unauthenticated endpoint (ex `Status.Peers`)	2023-02-09 10:02:56 -05:00
Seth Hoenig	0e7bf87ee1	deps: upgrade to hashicorp/golang-lru/v2 (#16085 )	2023-02-08 15:20:33 -06:00
James Rasell	53e0f424e9	rcp: bump SSO feature gate version. (#16080 )	2023-02-07 15:45:07 -08:00
Michael Schurter	35d65c7c7e	Dynamic Node Metadata (#15844 ) Fixes #14617 Dynamic Node Metadata allows Nomad users, and their jobs, to update Node metadata through an API. Currently Node metadata is only reloaded when a Client agent is restarted. Includes new UI for editing metadata as well. --------- Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>	2023-02-07 14:42:25 -08:00
Seth Hoenig	590ae08752	main: remove deprecated uses of rand.Seed (#16074 ) * main: remove deprecated uses of rand.Seed go1.20 deprecates rand.Seed, and seeds the rand package automatically. Remove cases where we seed the random package, and cleanup the one case where we intentionally create a known random source. * cl: update cl * mod: update go mod	2023-02-07 09:19:38 -06:00
Michael Schurter	0a496c845e	Task API via Unix Domain Socket (#15864 ) This change introduces the Task API: a portable way for tasks to access Nomad's HTTP API. This particular implementation uses a Unix Domain Socket and, unlike the agent's HTTP API, always requires authentication even if ACLs are disabled. This PR contains the core feature and tests but followup work is required for the following TODO items: - Docs - might do in a followup since dynamic node metadata / task api / workload id all need to interlink - Unit tests for auth middleware - Caching for auth middleware - Rate limiting on negative lookups for auth middleware --------- Co-authored-by: Seth Hoenig <shoenig@duck.com>	2023-02-06 11:31:22 -08:00
Phil Renaud	d3c351d2d2	Label for the Web UI (#16006 ) * Demoable state * Demo mirage color * Label as a block with foreground and background colours * Test mock updates * Go test updated * Documentation update for label support	2023-02-02 16:29:04 -05:00
Tim Gross	19a2c065f4	System and sysbatch jobs always have zero index (#16030 ) Service jobs should have unique allocation Names, derived from the Job.ID. System jobs do not have unique allocation Names because the index is intended to indicated the instance out of a desired count size. Because system jobs do not have an explicit count but the results are based on the targeted nodes, the index is less informative and this was intentionally omitted from the original design. Update docs to make it clear that NOMAD_ALLOC_INDEX is always zero for system/sysbatch jobs Validate that `volume.per_alloc` is incompatible with system/sysbatch jobs. System and sysbatch jobs always have a `NOMAD_ALLOC_INDEX` of 0. So interpolation via `per_alloc` will not work as soon as there's more than one allocation placed. Validate against this on job submission.	2023-02-02 16:18:01 -05:00
Daniel Bennett	335f0a5371	docs: how to troubleshoot consul connect envoy (#15908 ) * largely a doc-ification of this commit message: d47678074bf8ae9ff2da3c91d0729bf03aee8446 this doesn't spell out all the possible failure modes, but should be a good starting point for folks. * connect: add doc link to envoy bootstrap error * add Unwrap() to RecoverableError mainly for easier testing	2023-02-02 14:20:26 -06:00
Charlie Voiselle	cc6f4719f1	Add option to expose workload token to task (#15755 ) Add `identity` jobspec block to expose workload identity tokens to tasks. --------- Co-authored-by: Anders <mail@anars.dk> Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2023-02-02 10:59:14 -08:00
Phil Renaud	3db9f11c37	[feat] Nomad Job Templates (#15746 ) * Extend variables under the nomad path prefix to allow for job-templates (#15570) * Extend variables under the nomad path prefix to allow for job-templates * Add job-templates to error message hinting * RadioCard component for Job Templates (#15582) * chore: add * test: component API * ui: component template * refact: remove bc naming collission * styles: remove SASS var causing conflicts * Disallow specific variable at nomad/job-templates (#15681) * Disallows variables at exactly nomad/job-templates * idiomatic refactor * Expanding nomad job init to accept a template flag (#15571) * Adding a string flag for templates on job init * data-down actions-up version of a custom template editor within variable * Dont force grid on job template editor * list-templates flag started * Correctly slice from end of path name * Pre-review cleanup * Variable form acceptance test for job template editing * Some review cleanup * List Job templates test * Example from template test * Using must.assertions instead of require etc * ui: add choose template button (#15596) * ui: add new routes * chore: update file directory * ui: add choose template button * test: button and page navigation * refact: update var name * ui: use `Button` component from `HDS` (#15607) * ui: integrate buttons * refact: remove helper * ui: remove icons on non-tertiary buttons * refact: update normalize method for key/value pairs (#15612) * `revert`: `onCancel` for `JobDefinition` The `onCancel` method isn't included in the component API for `JobEditor` and the primary cancel behavior exists outside of the component. With the exception of the `JobDefinition` page where we include this button in the top right of the component instead of next to the `Plan` button. * style: increase button size * style: keep lime green * ui: select template (#15613) * ui: deprecate unused component * ui: deprecate tests * ui: jobs.run.templates.index * ui: update logic to handle templates * refact: revert key/value changes * style: padding for cards + buttons * temp: fixtures for mirage testing * Revert "refact: revert key/value changes" This reverts commit 124e95d12140be38fc921f7e15243034092c4063. * ui: guard template for unsaved job * ui: handle reading template variable * Revert "refact: update normalize method for key/value pairs (#15612)" This reverts commit 6f5ffc9b610702aee7c47fbff742cc81f819ab74. * revert: remove test fixtures * revert: prettier problems * refact: test doesnt need filter expression * styling: button sizes and responsive cards * refact: remove route guarding * ui: update variable adapter * refact: remove model editing behavior * refact: model should query variables to populate editor * ui: clear qp on exit * refact: cleanup deprecated API * refact: query all namespaces * refact: deprecate action * ui: rely on collection * refact: patch deprecate transition API * refact: patch test to expect namespace qp * styling: padding, conditionals * ui: flashMessage on 404 * test: update for o(n+1) query * ui: create new job template (#15744) * refact: remove unused code * refact: add type safety * test: select template flow * test: add data-test attrs * chore: remove dead code * test: create new job flow * ui: add create button * ui: create job template * refact: no need for wildcard * refact: record instead of delete * styling: spacing * ui: add error handling and form validation to job create template (#15767) * ui: handle server side errors * ui: show error to prevent duplicate * refact: conditional namespace * ui: save as template flow (#15787) * bug: patches failing tests associated with `pretender` (#15812) * refact: update assertion * refact: test set-up * ui: job templates manager view (#15815) * ui: manager list view * test: edit flow * refact: deprecate column-helper * ui: template edit and delete flow (#15823) * ui: manager list view * refact: update title * refact: update permissions * ui: template edit page * bug: typo * refact: update toast messages * bug: clear selections on exit (#15827) * bug: clear controllers on exit * test: mirage config changes (#15828) * refact: deprecate column-helper * style: update z-index for HDS * Revert "style: update z-index for HDS" This reverts commit d3d87ceab6d083f7164941587448607838944fc1. * refact: update delete button * refact: edit redirect * refact: patch reactivity issues * styling: fixed width * refact: override defaults * styling: edit text causing overflow * styling: add inline text Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com> * bug: edit `text` to `template` Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com> Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com> * test: delete flow job templates (#15896) * refact: edit names * bug: set correct ref to store * chore: trim whitespace: * test: delete flow * bug: reactively update view (#15904) * Initialized default jobs (#15856) * Initialized default jobs * More jobs scaffolded * Better commenting on a couple example job specs * Adapter doing the work * fall back to epic config * Label format helper and custom serialization logic * Test updates to account for a never-empty state * Test suite uses settled and maintain RecordArray in adapter return * Updates to hello-world and variables example jobspecs * Parameterized job gets optional payload output * Formatting changes for param and service discovery job templates * Multi-group service discovery job * Basic test for default templates (#15965) * Basic test for default templates * Percy snapshot for manage page * Some late-breaking design changes * Some copy edits to the header paragraphs for job templates (#15967) * Added some init options for job templates (#15994) * Async method for populating default job templates from the variable adapter --------- Co-authored-by: Jai <41024828+ChaiWithJai@users.noreply.github.com>	2023-02-02 10:37:40 -05:00
jmwilkinson	37834dffda	Allow wildcard datacenters to be specified in job file (#11170 ) Also allows for default value of `datacenters = ["*"]`	2023-02-02 09:57:45 -05:00
Seth Hoenig	ca7ead191e	consul: restore consul token when reverting a job (#15996 ) * consul: reset consul token on job during registration of a reversion * e2e: add test for reverting a job with a consul service * cl: fixup cl entry	2023-02-01 14:02:45 -06:00
James Rasell	9e8325d63c	acl: fix a bug in token creation when parsing expiration TTLs. (#15999 ) The ACL token decoding was not correctly handling time duration syntax such as "1h" which forced people to use the nanosecond representation via the HTTP API. The change adds an unmarshal function which allows this syntax to be used, along with other styles correctly.	2023-02-01 17:43:41 +01:00
James Rasell	67acfd9f6b	acl: return 400 not 404 code when creating an invalid policy. (#16000 )	2023-02-01 17:40:15 +01:00
Mike Nomitch	80848b202e	Increases max variable size to 64KiB from 16KiB (#15983 )	2023-01-31 13:32:36 -05:00
stswidwinski	16eefbbf4d	GC: ensure no leakage of evaluations for batch jobs. (#15097 ) Prior to 2409f72 the code compared the modification index of a job to itself. Afterwards, the code compared the creation index of the job to itself. In either case there should never be a case of re-parenting of allocs causing the evaluation to trivially always result in false, which leads to unreclaimable memory. Prior to this change allocations and evaluations for batch jobs were never garbage collected until the batch job was explicitly stopped. The new `batch_eval_gc_threshold` server configuration controls how often they are collected. The default threshold is `24h`.	2023-01-31 13:32:14 -05:00
Jorge Marey	d1c9aad762	Rename fields on proxyConfig (#15541 ) * Change api Fields for expose and paths * Add changelog entry * changelog: add deprecation notes about connect fields * api: minor style tweaks --------- Co-authored-by: Seth Hoenig <shoenig@duck.com>	2023-01-30 09:31:16 -06:00
Piotr Kazmierczak	14b53df3b6	renamed stanza to block for consistency with other projects (#15941 )	2023-01-30 15:48:43 +01:00
Seth Hoenig	074b76e3bf	consul: check for acceptable service identity on consul tokens (#15928 ) When registering a job with a service and 'consul.allow_unauthenticated=false', we scan the given Consul token for an acceptable policy or role with an acceptable policy, but did not scan for an acceptable service identity (which is backed by an acceptable virtual policy). This PR updates our consul token validation to also accept a matching service identity when registering a service into Consul. Fixes #15902	2023-01-27 18:15:51 -06:00
Tim Gross	881a4cfaff	metrics: Add remaining server RPC rate metrics (#15901 )	2023-01-27 08:29:53 -05:00
Tim Gross	ce3eef8037	metrics: Add rate metrics to Client CSI endpoints (#15905 ) Also tightens up authentication for these endpoints by enforcing the server certificate name is valid. We protect these endpoints currently by mTLS and can't use an auth token because these endpoints are (uniquely) called by the leader and followers for a given node won't have the leader's ephemeral ACL token. Add a certificate name check that requests come from a server and not a client, because no client should ever send these RPCs directly.	2023-01-26 16:40:58 -05:00
Tim Gross	bed8716e44	metrics: Add metrics to unauthenticated endpoints (#15899 )	2023-01-26 15:05:51 -05:00
Tim Gross	5e75ea9fb3	metrics: Add RPC rate metrics to endpoints that validate TLS names (#15900 )	2023-01-26 15:04:25 -05:00
Yorick Gersie	2a5c423ae0	Allow per_alloc to be used with host volumes (#15780 ) Disallowing per_alloc for host volumes in some cases makes life of a nomad user much harder. When we rely on the NOMAD_ALLOC_INDEX for any configuration that needs to be re-used across restarts we need to make sure allocation placement is consistent. With CSI volumes we can use the `per_alloc` feature but for some reason this is explicitly disabled for host volumes. Ensure host volumes understand the concept of per_alloc	2023-01-26 09:14:47 -05:00
Piotr Kazmierczak	f4d6efe69f	acl: make auth method default across all types (#15869 )	2023-01-26 14:17:11 +01:00
James Rasell	5d33891910	sso: allow binding rules to create management ACL tokens. (#15860 ) * sso: allow binding rules to create management ACL tokens. * docs: update binding rule docs to detail management type addition.	2023-01-26 09:57:44 +01:00
Tim Gross	6677a103c2	metrics: measure rate of RPC requests that serve API (#15876 ) This changeset configures the RPC rate metrics that were added in #15515 to all the RPCs that support authenticated HTTP API requests. These endpoints already configured with pre-forwarding authentication in #15870, and a handful of others were done already as part of the proof-of-concept work. So this changeset is entirely copy-and-pasting one method call into a whole mess of handlers. Upcoming PRs will wire up pre-forwarding auth and rate metrics for the remaining set of RPCs that have no API consumers or aren't authenticated, in smaller chunks that can be more thoughtfully reviewed.	2023-01-25 16:37:24 -05:00
Luiz Aoqui	3479e2231f	core: enforce strict steps for clients reconnect (#15808 ) When a Nomad client that is running an allocation with `max_client_disconnect` set misses a heartbeat the Nomad server will update its status to `disconnected`. Upon reconnecting, the client will make three main RPC calls: - `Node.UpdateStatus` is used to set the client status to `ready`. - `Node.UpdateAlloc` is used to update the client-side information about allocations, such as their `ClientStatus`, task states etc. - `Node.Register` is used to upsert the entire node information, including its status. These calls are made concurrently and are also running in parallel with the scheduler. Depending on the order they run the scheduler may end up with incomplete data when reconciling allocations. For example, a client disconnects and its replacement allocation cannot be placed anywhere else, so there's a pending eval waiting for resources. When this client comes back the order of events may be: 1. Client calls `Node.UpdateStatus` and is now `ready`. 2. Scheduler reconciles allocations and places the replacement alloc to the client. The client is now assigned two allocations: the original alloc that is still `unknown` and the replacement that is `pending`. 3. Client calls `Node.UpdateAlloc` and updates the original alloc to `running`. 4. Scheduler notices too many allocs and stops the replacement. This creates unnecessary placements or, in a different order of events, may leave the job without any allocations running until the whole state is updated and reconciled. To avoid problems like this clients must update _all_ of its relevant information before they can be considered `ready` and available for scheduling. To achieve this goal the RPC endpoints mentioned above have been modified to enforce strict steps for nodes reconnecting: - `Node.Register` does not set the client status anymore. - `Node.UpdateStatus` sets the reconnecting client to the `initializing` status until it successfully calls `Node.UpdateAlloc`. These changes are done server-side to avoid the need of additional coordination between clients and servers. Clients are kept oblivious of these changes and will keep making these calls as they normally would. The verification of whether allocations have been updates is done by storing and comparing the Raft index of the last time the client missed a heartbeat and the last time it updated its allocations.	2023-01-25 15:53:59 -05:00
Tim Gross	f3f64af821	WI: allow workloads to use RPCs associated with HTTP API (#15870 ) This changeset allows Workload Identities to authenticate to all the RPCs that support HTTP API endpoints, for use with PR #15864. * Extends the work done for pre-forwarding authentication to all RPCs that support a HTTP API endpoint. * Consolidates the auth helpers used by the CSI, Service Registration, and Node endpoints that are currently used to support both tokens and client secrets. Intentionally excluded from this changeset: * The Variables endpoint still has custom handling because of the implicit policies. Ideally we'll figure out an efficient way to resolve those into real policies and then we can get rid of that custom handling. * The RPCs that don't currently support auth tokens (i.e. those that don't support HTTP endpoints) have not been updated with the new pre-forwarding auth We'll be doing this under a separate PR to support RPC rate metrics.	2023-01-25 14:33:06 -05:00
Tim Gross	cf9e5f3327	acl: Fix panic when bogus token is passed (#15863 ) If a consumer of the new `Authenticate` method gets passed a bogus token that's a correctly-shaped UUID, it will correctly get an identity without a ACL token. But most consumers will then panic when they consume this nil `ACLToken` for authorization. Because no API client should ever send a bogus auth token, update the `Authenticate` method to create the identity with remote IP (for metrics tracking) but also return an `ErrPermissionDenied`.	2023-01-25 10:03:17 -05:00
Tim Gross	055434cca9	add metric for count of RPC requests (#15515 ) Implement a metric for RPC requests with labels on the identity, so that administrators can monitor the source of requests within the cluster. This changeset demonstrates the change with the new `ACL.WhoAmI` RPC, and we'll wire up the remaining RPCs once we've threaded the new pre-forwarding authentication through the all. Note that metrics are measured after we forward but before we return any authentication error. This ensures that we only emit metrics on the server that actually serves the request. We'll perform rate limiting at the same place. Includes telemetry configuration to omit identity labels.	2023-01-24 11:54:20 -05:00
Tim Gross	2030d62920	implement pre-forwarding auth on select RPCs (#15513 ) In #15417 we added a new `Authenticate` method to the server that returns an `AuthenticatedIdentity` struct. This changeset implements this method for a small number of RPC endpoints that together represent all the various ways in which RPCs are sent, so that we can validate that we're happy with this approach.	2023-01-24 10:52:07 -05:00
Michael Schurter	ace5faf948	core: backoff considerably when worker is behind raft (#15523 ) Upon dequeuing an evaluation workers snapshot their state store at the eval's wait index or later. This ensures we process an eval at a point in time after it was created or updated. Processing an eval on an old snapshot could cause any number of problems such as: 1. Since job registration atomically updates an eval and job in a single raft entry, scheduling against indexes before that may not have the eval's job or may have an older version. 2. The older the scheduler's snapshot, the higher the likelihood something has changed in the cluster state which will cause the plan applier to reject the scheduler's plan. This could waste work or even cause eval's to be failed needlessly. However, the workers run in parallel with a new server pulling the cluster state from a peer. During this time, which may be many minutes long, the state store is likely far behind the minimum index required to process evaluations. This PR addresses this by adding an additional long backoff period after an eval is nacked. If the scheduler's indexes catches up within the additional backoff, it will unblock early to dequeue the next eval. When the server shuts down we'll get a `context.Canceled` error from the state store method. We need to bubble this error up so that other callers can detect it. Handle this case separately when waiting after dequeue so that we can warn on shutdown instead of throwing an ambiguous error message with just the text "canceled." While there may be more precise ways to block scheduling until the server catches up, this approach adds little risk and covers additional cases where a server may be temporarily behind due to a spike in load or a saturated network. For testing, we make the `raftSyncLimit` into a parameter on the worker's `run` method so that we can run backoff tests without waiting 30+ seconds. We haven't followed thru and made all the worker globals into worker parameters, because there isn't much use outside of testing, but we can consider that in the future. Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-01-24 08:56:35 -05:00
Tim Gross	a51149736d	Rename `nomad.broker.total_blocked` metric (#15835 ) This changeset fixes a long-standing point of confusion in metrics emitted by the eval broker. The eval broker has a queue of "blocked" evals that are waiting for an in-flight ("unacked") eval of the same job to be completed. But this "blocked" state is not the same as the `blocked` status that we write to raft and expose in the Nomad API to end users. There's a second metric `nomad.blocked_eval.total_blocked` that refers to evaluations in that state. This has caused ongoing confusion in major customer incidents and even in our own documentation! (Fixed in this PR.) There's little functional change in this PR aside from the name of the metric emitted, but there's a bit refactoring to clean up the names in `eval_broker.go` so that there aren't name collisions and multiple names for the same state. Changes included are: * Everything that was previously called "pending" referred to entities that were associated witht he "ready" metric. These are all now called "ready" to match the metric. * Everything named "blocked" in `eval_broker.go` is now named "pending", except for a couple of comments that actually refer to blocked RPCs. * Added a note to the upgrade guide docs for 1.5.0. * Fixed the scheduling performance metrics docs because the description for `nomad.broker.total_blocked` was actually the description for `nomad.blocked_eval.total_blocked`.	2023-01-20 14:23:56 -05:00
Charlie Voiselle	5ea1d8a970	Add raft snapshot configuration options (#15522 ) * Add config elements * Wire in snapshot configuration to raft * Add hot reload of raft config * Add documentation for new raft settings * Add changelog	2023-01-20 14:21:51 -05:00
Seth Hoenig	d2d8ebbeba	consul: correctly interpret missing consul checks as unhealthy (#15822 ) * consul: correctly understand missing consul checks as unhealthy This PR fixes a bug where Nomad assumed any registered Checks would exist in the service registration coming back from Consul. In some cases, the Consul may be slow in processing the check registration, and the response object would not contain checks. Nomad would then scan the empty response looking for Checks with failing health status, finding none, and then marking a task/alloc as healthy. In reality, we must always use Nomad's view of what checks should exist as the source of truth, and compare that with the response Consul gives us, making sure they match, before scanning the Consul response for failing check statuses. Fixes #15536 * consul: minor CR refactor using maps not sets * consul: observe transition from healthy to unhealthy checks * consul: spell healthy correctly	2023-01-19 14:01:12 -06:00
James Rasell	fad9b40e53	Merge branch 'main' into sso/gh-13120-oidc-login	2023-01-18 10:05:31 +00:00
Phil Renaud	98c5259f3e	[sso] OIDC Updates for the UI (#15804 ) * Updated UI to handle OIDC method changes * Remove redundant store unload call	2023-01-17 17:01:47 -05:00
James Rasell	abe8e1cf29	updates based on code review from @tgross.	2023-01-17 08:45:17 +00:00
James Rasell	d29d3412d8	rpc: add OIDC login related endpoints. This adds new OIDC endpoints on the RPC endpoint. These two RPCs handle generating the OIDC provider URL and then completing the login by exchanging the provider token with an internal Nomad token. The RPC endpoints both do double forwarding. The initial forward is to ensure we are talking to the regional leader; the second then takes into account whether the auth method generates local or global tokens. If it creates global tokens, we must then forward onto the federated regional leader.	2023-01-13 13:14:29 +00:00
Seth Hoenig	fe7795ce16	consul/connect: support for proxy upstreams opaque config (#15761 ) This PR adds support for configuring `proxy.upstreams[].config` for Consul Connect upstreams. This is an opaque config value to Nomad - the data is passed directly to Consul and is unknown to Nomad.	2023-01-12 08:20:54 -06:00
Anthony Davis	1c32471805	Fix rejoin_after_leave behavior (#15552 )	2023-01-11 16:39:24 -05:00
Daniel Bennett	7d1059b5ae	connect: ingress gateway validation for http hosts and wildcards (#15749 ) * connect: fix non-"tcp" ingress gateway validation changes apply to http, http2, and grpc: * if "hosts" is excluded, consul will use its default domain e.g. <service-name>.ingress.dc1.consul * can't set hosts with "" service name test http2 and grpc too	2023-01-11 11:52:32 -06:00
Seth Hoenig	719eee8112	consul: add client configuration for grpc_ca_file (#15701 ) * [no ci] first pass at plumbing grpc_ca_file * consul: add support for grpc_ca_file for tls grpc connections in consul 1.14+ This PR adds client config to Nomad for specifying consul.grpc_ca_file These changes combined with https://github.com/hashicorp/consul/pull/15913 should finally enable Nomad users to upgrade to Consul 1.14+ and use tls grpc connections. * consul: add cl entgry for grpc_ca_file * docs: mention grpc_tls changes due to Consul 1.14	2023-01-11 09:34:28 -06:00
Seth Hoenig	83450c8762	vault: configure user agent on Nomad vault clients (#15745 ) * vault: configure user agent on Nomad vault clients This PR attempts to set the User-Agent header on each Vault API client created by Nomad. Still need to figure a way to set User-Agent on the Vault client created internally by consul-template. * vault: fixup find-and-replace gone awry	2023-01-10 10:39:45 -06:00
Piotr Kazmierczak	be36a1924f	acl: binding rules evaluation (#15697 ) Binder provides an interface for binding claims and ACL roles/policies of Nomad.	2023-01-10 16:08:08 +01:00
Tim Gross	32f6ce1c54	Authenticate method improvements (#15734 ) This changeset covers a sidebar discussion that @schmichael and I had around the design for pre-forwarding auth. This includes some changes extracted out of #15513 to make it easier to review both and leave a clean history. * Remove fast path for NodeID. Previously-connected clients will have a NodeID set on the context, and because this is a large portion of the RPCs sent we fast-pathed it at the top of the `Authenticate` method. But the context is shared for all yamux streams over the same yamux session (and TCP connection). This lets an authenticated HTTP request to a client use the NodeID for authentication, which is a privilege escalation. Remove the fast path and annotate it so that we don't break it again. * Add context to decisions around AuthenticatedIdentity. The `Authenticate` method taken on its own looks like it wants to return an `acl.ACL` that folds over all the various identity types (creating an ephemeral ACL on the fly if neccessary). But keeping these fields idependent allows RPC handlers to differentiate between internal and external origins so we most likely want to avoid this. Leave some docstrings as a warning as to why this is built the way it is. * Mutate the request rather than returning. When reviewing #15513 we decided that forcing the request handler to call `SetIdentity` was repetitive and error prone. Instead, the `Authenticate` method mutates the request by setting its `AuthenticatedIdentity`.	2023-01-10 09:46:38 -05:00
Seth Hoenig	84cb5fb03d	deps: update shoenig/test to v0.6.0 (#15715 ) Adds support for custom cmp.Options; need to fix one minor thing causing api breakage.	2023-01-09 09:37:08 -06:00
Seth Hoenig	7214e21402	ci: swap freeport for portal in packages (#15661 )	2023-01-03 11:25:20 -06:00
Seth Hoenig	266ca25a81	cleanup: remove usage of consul/sdk/testutil/retry (#15609 ) This PR removes usages of `consul/sdk/testutil/retry`, as part of the ongoing effort to remove use of any non-API module from Consul. There is one remanining usage in the helper/freeport package, but that will get removed as part of #15589	2023-01-02 08:06:20 -06:00
Piotr Kazmierczak	3fd9fbeece	fix failing UpsertBindingRules unit test (#15604 ) UpsertBindingRules RPC changed in eacecb8, validation happens after the ID check now, because we don't want validation to fail for update payloads which may contain incomplete objects.	2022-12-21 15:19:09 +01:00
Piotr Kazmierczak	b500dfa969	bugfix: unit test for GetACLBindingRules (#15583 ) Unit test for GetACLBindingRules state store method would fail because we'd expect order of returned items.	2022-12-20 15:06:09 +01:00
Piotr Kazmierczak	20a01a0bba	acl: modify update endpoints behavior (#15580 ) API and RPC endpoints for ACLAuthMethods and ACLBindingRules should allow users to send incomplete objects in order to, e.g., update single fields. This PR provides "merging" functionality for these endpoints.	2022-12-20 11:22:19 +01:00
James Rasell	b8aa53d09f	core: add ACL binding rule to replication system. (#15555 ) ACL binding rule create and deletes are always forwarded to the authoritative region. In order to make these available in federated regions, the leaders in these regions need to replicate from the authoritative.	2022-12-16 09:08:00 +01:00
James Rasell	95c9ffa505	ACL: add ACL binding rule RPC and HTTP API handlers. (#15529 ) This change add the RPC ACL binding rule handlers. These handlers are responsible for the creation, updating, reading, and deletion of binding rules. The write handlers are feature gated so that they can only be used when all federated servers are running the required version. The HTTP API handlers and API SDK have also been added where required. This allows the endpoints to be called from the API by users and clients.	2022-12-15 09:18:55 +01:00
James Rasell	13f207ea78	events: add ACL binding rules to core events stream topics. (#15544 )	2022-12-14 14:49:49 +01:00
James Rasell	3c941c6bc3	acl: add binding rule object state schema and functionality. (#15511 ) This change adds a new table that will store ACL binding rule objects. The two indexes allow fast lookups by their ID, or by which auth method they are linked to. Snapshot persist and restore functionality ensures this table can be saved and restored from snapshots. In order to write and delete the object to state, new Raft messages have been added. All RPC request and response structs, along with object functions such as diff and canonicalize have been included within this work as it is nicely separated from the other areas of work.	2022-12-14 08:48:18 +01:00
Seth Hoenig	be3f89b5f9	artifact: enable inheriting environment variables from client (#15514 ) * artifact: enable inheriting environment variables from client This PR adds client configuration for specifying environment variables that should be inherited by the artifact sandbox process from the Nomad Client agent. Most users should not need to set these values but the configuration is provided to ensure backwards compatability. Configuration of go-getter should ideally be done through the artifact block in a jobspec task. e.g. ```hcl client { artifact { set_environment_variables = "TMPDIR,GIT_SSH_OPTS" } } ``` Closes #15498 * website: update set_environment_variables text to mention PATH	2022-12-09 15:46:07 -06:00
Piotr Kazmierczak	db98e26375	bugfix: acl sso auth methods test failures (#15512 ) This PR fixes unit test failures introduced in f4e89e2	2022-12-09 18:47:32 +01:00
Piotr Kazmierczak	08f50f7dbf	acl: make sure there is only one default Auth Method per type (#15504 ) This PR adds a check that makes sure we don't insert a duplicate default ACL auth method for a given type.	2022-12-09 14:46:54 +01:00
Seth Hoenig	825c5cc65e	artifact: add client toggle to disable filesystem isolation (#15503 ) This PR adds the client config option for turning off filesystem isolation, applicable on Linux systems where filesystem isolation is possible and enabled by default. ```hcl client{ artifact { disable_filesystem_isolation = <bool:false> } } ``` Closes #15496	2022-12-08 12:29:23 -06:00
Piotr Kazmierczak	1cb45630f0	acl: canonicalize ACL Auth Method object (#15492 )	2022-12-08 14:05:46 +01:00
Piotr Kazmierczak	10f80f7d9a	bugfix: make sure streaming endpoints are only registered once (#15484 ) Streaming RPCs should only be registered once, not on every RPC call, because they set keys in StreamingRpcRegistry.registry map. This PR fixes it by checking whether endpoints are already registered before calling .register() method. Fixes #15474 Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-12-07 17:01:45 +01:00
Tim Gross	e0fddee386	Pre forwarding authentication (#15417 ) Upcoming work to instrument the rate of RPC requests by consumer (and eventually rate limit) require that we authenticate a RPC request before forwarding. Add a new top-level `Authenticate` method to the server and have it return an `AuthenticatedIdentity` struct. RPC handlers will use the relevant fields of this identity for performing authorization. This changeset includes: * The main implementation of `Authenticate` * Provide a new RPC `ACL.WhoAmI` for debugging authentication. This endpoint returns the same `AuthenticatedIdentity` that will be used by RPC handlers. At some point we might want to give this an equivalent HTTP endpoint but I didn't want to add that to our public API until some of the other Workload Identity work is solidified, especially if we don't need it yet. * A full coverage test of the `Authenticate` method. This sets up two server nodes with mTLS and ACLs, some tokens, and some allocations with workload identities. * Wire up an example of using `Authenticate` in the `Namespace.Upsert` RPC and see how authorization happens after forwarding. * A new semgrep rule for `Authenticate`, which we'll need to update once we're ready to wire up more RPC endpoints with authorization steps.	2022-12-06 14:44:03 -05:00
Piotr Kazmierczak	d3aac1fe08	acl: sso auth method snapshot restore test (#15482 )	2022-12-06 15:15:29 +01:00
Piotr Kazmierczak	777173e8da	acl: added type to ACL Auth Method stub (#15480 )	2022-12-06 14:47:05 +01:00
Tim Gross	dfed1ba5bc	remove most static RPC handlers (#15451 ) Nomad server components that aren't in the `nomad` package like the deployment watcher and volume watcher need to make RPC calls but can't import the Server struct to do so because it creates a circular reference. These components have a "shim" object that gets populated to pass a "static" handler that has no RPC context. Most RPC handlers are never used in this way, but during server setup we were constructing a set of static handlers for most RPC endpoints anyways. This is slightly wasteful but also confusing to developers who end up being encouraged to just copy what was being done for previous RPCs. This changeset includes the following refactorings: * Remove the static handlers field on the server * Instead construct just the specific static handlers we need to pass into the deployment watcher and volume watcher. * Remove the unnecessary static handler from heartbeater * Update various tests to avoid needing the static endpoints and have them use a endpoint constructed on the spot. Follow-up work will examine whether we can remove the RPCs from deployment watcher and volume watcher entirely, falling back to raft applies like node drainer does currently.	2022-12-02 10:12:05 -05:00
Tim Gross	33f32d526e	fix enterprise endpoint registration (#15446 ) In #15430 we refactored the RPC endpoint configuration to make adding the RPC context easier. But when implementing the change on the Enterprise side, I discovered that the registration of enterprise endpoints was being done incorrectly -- this doesn't show up on OSS because the registration is always a no-op here.	2022-12-01 10:44:33 -05:00
Tim Gross	f61f801e77	provide `RPCContext` to all RPC handlers (#15430 ) Upcoming work to instrument the rate of RPC requests by consumer (and eventually rate limit) requires that we thread the `RPCContext` through all RPC handlers so that we can access the underlying connection. This changeset adds the context to everywhere we intend to initially support it and intentionally excludes streaming RPCs and client RPCs. To improve the ergonomics of adding the context everywhere its needed and to clarify the requirements of dynamic vs static handlers, I've also done a good bit of refactoring here: * canonicalized the RPC handler fields so they're as close to identical as possible without introducing unused fields (i.e. I didn't add loggers if the handler doesn't use them already). * canonicalized the imports in the handler files. * added a `NewExampleEndpoint` function for each handler that ensures we're constructing the handlers with the required arguments. * reordered the registration in server.go to match the order of the files (to make it easier to see if we've missed one), and added a bunch of commentary there as to what the difference between static and dynamic handlers is.	2022-12-01 10:05:15 -05:00

1 2 3 4 5 ...

4437 Commits