open-nomad

Commit Graph

Author	SHA1	Message	Date
Michael Schurter	bd7b60712e	Accept Workload Identities for Client RPCs (#16254 ) This change resolves policies for workload identities when calling Client RPCs. Previously only ACL tokens could be used for Client RPCs. Since the same cache is used for both bearer tokens (ACL and Workload ID), the token cache size was doubled. --------- Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-02-27 10:17:47 -08:00
Daniel Bennett	39e3a1ac3e	build/cli: Add BuildDate (#16216 ) * build: add BuildDate to version info will be used in enterprise to compare to license expiration time * cli: multi-line version output, add BuildDate before: $ nomad version Nomad v1.4.3 (coolfakecommithashomgoshsuchacoolonewoww) after: $ nomad version Nomad v1.5.0-dev BuildDate 2023-02-17T19:29:26Z Revision coolfakecommithashomgoshsuchacoolonewoww compare consul: $ consul version Consul v1.14.4 Revision dae670fe Build Date 2023-01-26T15:47:10Z Protocol 2 spoken by default, blah blah blah... and vault: $ vault version Vault v1.12.3 (209b3dd99fe8ca320340d08c70cff5f620261f9b), built 2023-02-02T09:07:27Z * docs: update version command output	2023-02-27 11:27:40 -06:00
Tim Gross	4c9688271a	CSI: fix potential state store corruptions (#16256 ) The `CSIVolume` struct has references to allocations that are "denormalized"; we don't store them on the `CSIVolume` struct but hydrate them on read. Tests detecting potential state store corruptions found two locations where we're not copying the volume before denormalizing: * When garbage collecting CSI volume claims. * When checking if it's safe to force-deregister the volume. There are no known user-visible problems associated with these bugs but both have the potential of mutating volume claims outside of a FSM transaction. This changeset also cleans up state mutations in some CSI tests so as to avoid having working tests cover up potential future bugs.	2023-02-27 08:47:08 -05:00
James Rasell	8295d0e516	acl: add validation to binding rule selector on upsert. (#16210 ) * acl: add validation to binding rule selector on upsert. * docs: add more information on binding rule selector escaping.	2023-02-17 15:38:55 +01:00
Alessio Perugini	4e9ec24b22	Allow configurable range of Job priorities (#16084 )	2023-02-17 09:23:13 -05:00
Michael Schurter	671d9f64ec	Minor post-1.5-beta1 API, code, and docs cleanups (#16193 ) * api: return error on parse failure * docs: clarify anonymous policy with task api	2023-02-16 10:32:21 -08:00
Tim Gross	27cc6a2ff9	fix test flake for RPC TLS enforcement test (#16199 ) The RPC TLS enforcement test was frequently failing with broken connections. The most likely cause was that the tests started to run before the server had started its RPC server. Wait until it self-elects to ensure that the RPC server is up. This seems to have corrected the error; I ran this 3 times without a failure (even accounting for `gotestsum` retries). Also, fix a minor test bug that didn't impact the test but showed an incorrect usage for `Status.Ping.`	2023-02-16 11:50:40 -05:00
Will Nicholson	4dc83757a6	eventstream: Handle missing policy documents in event streams (#15495 ) Fixes https://github.com/hashicorp/nomad/issues/15493 Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-02-14 11:27:39 -05:00
Seth Hoenig	165791dd89	artifact: protect against unbounded artifact decompression (1.5.0) (#16151 ) * artifact: protect against unbounded artifact decompression Starting with 1.5.0, set defaut values for artifact decompression limits. artifact.decompression_size_limit (default "100GB") - the maximum amount of data that will be decompressed before triggering an error and cancelling the operation artifact.decompression_file_count_limit (default 4096) - the maximum number of files that will be decompressed before triggering an error and cancelling the operation. * artifact: assert limits cannot be nil in validation	2023-02-14 09:28:39 -06:00
Charlie Voiselle	d93ba0cf32	Add warnings to `var put` for non-alphanumeric keys. (#15933 ) * Warn when Items key isn't directly accessible Go template requires that map keys are alphanumeric for direct access using the dotted reference syntax. This warns users when they create keys that run afoul of this requirement. - cli: use regex to detect invalid indentifiers in var keys - test: fix slash in escape test case - api: share warning formatting function between API and CLI - ui: warn if var key has characters other than _, letter, or number --------- Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com> Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-02-13 16:14:59 -05:00
Seth Hoenig	df3e8f82da	deps: update go-set, go-landlock (#16146 ) Made a breaking change in go-set (String() signature), need to update both these dependencies together and also fix a thing in structs.go	2023-02-13 08:26:30 -06:00
Charlie Voiselle	65ce3ec8de	[core] Do not start the plugin loader on non-clients (#16111 ) The plugin loader loads task and device driver plugins which are not used on server nodes.	2023-02-10 15:33:16 -05:00
Tim Gross	65c7e149d3	eval broker: use write lock when reaping cancelable evals (#16112 ) The eval broker's `Cancelable` method used by the cancelable eval reaper mutates the slice of cancelable evals by removing a batch at a time from the slice. But this method unsafely uses a read lock despite this mutation. Under normal workloads this is likely to be safe but when the eval broker is under the heavy load this feature is intended to fix, we're likely to have a race condition. Switch this to a write lock, like the other locks that mutate the eval broker state. This changeset also adjusts the timeout to allow poorly-sized Actions runners more time to schedule the appropriate goroutines. The test has also been updated to use `shoenig/test/wait` so we can have sensible reporting of the results rather than just a timeout error when things go wrong.	2023-02-10 10:40:41 -05:00
Tim Gross	c2bd829fe2	tests: don't mutate global structs in core scheduler tests (#16120 ) Some of the core scheduler tests need the maximum batch size for writes to be smaller than the usual `structs.MaxUUIDsPerWriteRequest`. But they do so by unsafely modifying the global struct, which creates test flakes in other tests. Modify the functions under test to take a batch size parameter. Production code will pass the global while the tests can inject smaller values. Turn the `structs.MaxUUIDsPerWriteRequest` into a constant, and add a semgrep rule for avoiding this kind of thing in the future.	2023-02-10 09:26:00 -05:00
Tim Gross	69a2040e82	acl: never return auth errors for `ACL.Bootstrap` RPC (#16108 ) In #15901 we introduced pre-forwarding authentication for RPCs so that we can grab the identity for rate metrics. The `ACL.Bootstrap` RPC is an unauthenticated endpoint, so any error message from authentication is not particularly useful. This would be harmless, but if you try to bootstrap with your `NOMAD_TOKEN` already set (perhaps because you were talking to another cluster previously from the same shell session), you'll get an authentication error instead of just having the token be ignored. This is a regression from the existing behavior, so have this endpoint ignore auth errors the same way we do for every other unauthenticated endpoint (ex `Status.Peers`)	2023-02-09 10:02:56 -05:00
Seth Hoenig	0e7bf87ee1	deps: upgrade to hashicorp/golang-lru/v2 (#16085 )	2023-02-08 15:20:33 -06:00
James Rasell	53e0f424e9	rcp: bump SSO feature gate version. (#16080 )	2023-02-07 15:45:07 -08:00
Michael Schurter	35d65c7c7e	Dynamic Node Metadata (#15844 ) Fixes #14617 Dynamic Node Metadata allows Nomad users, and their jobs, to update Node metadata through an API. Currently Node metadata is only reloaded when a Client agent is restarted. Includes new UI for editing metadata as well. --------- Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>	2023-02-07 14:42:25 -08:00
Seth Hoenig	590ae08752	main: remove deprecated uses of rand.Seed (#16074 ) * main: remove deprecated uses of rand.Seed go1.20 deprecates rand.Seed, and seeds the rand package automatically. Remove cases where we seed the random package, and cleanup the one case where we intentionally create a known random source. * cl: update cl * mod: update go mod	2023-02-07 09:19:38 -06:00
Michael Schurter	0a496c845e	Task API via Unix Domain Socket (#15864 ) This change introduces the Task API: a portable way for tasks to access Nomad's HTTP API. This particular implementation uses a Unix Domain Socket and, unlike the agent's HTTP API, always requires authentication even if ACLs are disabled. This PR contains the core feature and tests but followup work is required for the following TODO items: - Docs - might do in a followup since dynamic node metadata / task api / workload id all need to interlink - Unit tests for auth middleware - Caching for auth middleware - Rate limiting on negative lookups for auth middleware --------- Co-authored-by: Seth Hoenig <shoenig@duck.com>	2023-02-06 11:31:22 -08:00
Phil Renaud	d3c351d2d2	Label for the Web UI (#16006 ) * Demoable state * Demo mirage color * Label as a block with foreground and background colours * Test mock updates * Go test updated * Documentation update for label support	2023-02-02 16:29:04 -05:00
Tim Gross	19a2c065f4	System and sysbatch jobs always have zero index (#16030 ) Service jobs should have unique allocation Names, derived from the Job.ID. System jobs do not have unique allocation Names because the index is intended to indicated the instance out of a desired count size. Because system jobs do not have an explicit count but the results are based on the targeted nodes, the index is less informative and this was intentionally omitted from the original design. Update docs to make it clear that NOMAD_ALLOC_INDEX is always zero for system/sysbatch jobs Validate that `volume.per_alloc` is incompatible with system/sysbatch jobs. System and sysbatch jobs always have a `NOMAD_ALLOC_INDEX` of 0. So interpolation via `per_alloc` will not work as soon as there's more than one allocation placed. Validate against this on job submission.	2023-02-02 16:18:01 -05:00
Daniel Bennett	335f0a5371	docs: how to troubleshoot consul connect envoy (#15908 ) * largely a doc-ification of this commit message: d47678074bf8ae9ff2da3c91d0729bf03aee8446 this doesn't spell out all the possible failure modes, but should be a good starting point for folks. * connect: add doc link to envoy bootstrap error * add Unwrap() to RecoverableError mainly for easier testing	2023-02-02 14:20:26 -06:00
Charlie Voiselle	cc6f4719f1	Add option to expose workload token to task (#15755 ) Add `identity` jobspec block to expose workload identity tokens to tasks. --------- Co-authored-by: Anders <mail@anars.dk> Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2023-02-02 10:59:14 -08:00
Phil Renaud	3db9f11c37	[feat] Nomad Job Templates (#15746 ) * Extend variables under the nomad path prefix to allow for job-templates (#15570) * Extend variables under the nomad path prefix to allow for job-templates * Add job-templates to error message hinting * RadioCard component for Job Templates (#15582) * chore: add * test: component API * ui: component template * refact: remove bc naming collission * styles: remove SASS var causing conflicts * Disallow specific variable at nomad/job-templates (#15681) * Disallows variables at exactly nomad/job-templates * idiomatic refactor * Expanding nomad job init to accept a template flag (#15571) * Adding a string flag for templates on job init * data-down actions-up version of a custom template editor within variable * Dont force grid on job template editor * list-templates flag started * Correctly slice from end of path name * Pre-review cleanup * Variable form acceptance test for job template editing * Some review cleanup * List Job templates test * Example from template test * Using must.assertions instead of require etc * ui: add choose template button (#15596) * ui: add new routes * chore: update file directory * ui: add choose template button * test: button and page navigation * refact: update var name * ui: use `Button` component from `HDS` (#15607) * ui: integrate buttons * refact: remove helper * ui: remove icons on non-tertiary buttons * refact: update normalize method for key/value pairs (#15612) * `revert`: `onCancel` for `JobDefinition` The `onCancel` method isn't included in the component API for `JobEditor` and the primary cancel behavior exists outside of the component. With the exception of the `JobDefinition` page where we include this button in the top right of the component instead of next to the `Plan` button. * style: increase button size * style: keep lime green * ui: select template (#15613) * ui: deprecate unused component * ui: deprecate tests * ui: jobs.run.templates.index * ui: update logic to handle templates * refact: revert key/value changes * style: padding for cards + buttons * temp: fixtures for mirage testing * Revert "refact: revert key/value changes" This reverts commit 124e95d12140be38fc921f7e15243034092c4063. * ui: guard template for unsaved job * ui: handle reading template variable * Revert "refact: update normalize method for key/value pairs (#15612)" This reverts commit 6f5ffc9b610702aee7c47fbff742cc81f819ab74. * revert: remove test fixtures * revert: prettier problems * refact: test doesnt need filter expression * styling: button sizes and responsive cards * refact: remove route guarding * ui: update variable adapter * refact: remove model editing behavior * refact: model should query variables to populate editor * ui: clear qp on exit * refact: cleanup deprecated API * refact: query all namespaces * refact: deprecate action * ui: rely on collection * refact: patch deprecate transition API * refact: patch test to expect namespace qp * styling: padding, conditionals * ui: flashMessage on 404 * test: update for o(n+1) query * ui: create new job template (#15744) * refact: remove unused code * refact: add type safety * test: select template flow * test: add data-test attrs * chore: remove dead code * test: create new job flow * ui: add create button * ui: create job template * refact: no need for wildcard * refact: record instead of delete * styling: spacing * ui: add error handling and form validation to job create template (#15767) * ui: handle server side errors * ui: show error to prevent duplicate * refact: conditional namespace * ui: save as template flow (#15787) * bug: patches failing tests associated with `pretender` (#15812) * refact: update assertion * refact: test set-up * ui: job templates manager view (#15815) * ui: manager list view * test: edit flow * refact: deprecate column-helper * ui: template edit and delete flow (#15823) * ui: manager list view * refact: update title * refact: update permissions * ui: template edit page * bug: typo * refact: update toast messages * bug: clear selections on exit (#15827) * bug: clear controllers on exit * test: mirage config changes (#15828) * refact: deprecate column-helper * style: update z-index for HDS * Revert "style: update z-index for HDS" This reverts commit d3d87ceab6d083f7164941587448607838944fc1. * refact: update delete button * refact: edit redirect * refact: patch reactivity issues * styling: fixed width * refact: override defaults * styling: edit text causing overflow * styling: add inline text Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com> * bug: edit `text` to `template` Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com> Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com> * test: delete flow job templates (#15896) * refact: edit names * bug: set correct ref to store * chore: trim whitespace: * test: delete flow * bug: reactively update view (#15904) * Initialized default jobs (#15856) * Initialized default jobs * More jobs scaffolded * Better commenting on a couple example job specs * Adapter doing the work * fall back to epic config * Label format helper and custom serialization logic * Test updates to account for a never-empty state * Test suite uses settled and maintain RecordArray in adapter return * Updates to hello-world and variables example jobspecs * Parameterized job gets optional payload output * Formatting changes for param and service discovery job templates * Multi-group service discovery job * Basic test for default templates (#15965) * Basic test for default templates * Percy snapshot for manage page * Some late-breaking design changes * Some copy edits to the header paragraphs for job templates (#15967) * Added some init options for job templates (#15994) * Async method for populating default job templates from the variable adapter --------- Co-authored-by: Jai <41024828+ChaiWithJai@users.noreply.github.com>	2023-02-02 10:37:40 -05:00
jmwilkinson	37834dffda	Allow wildcard datacenters to be specified in job file (#11170 ) Also allows for default value of `datacenters = ["*"]`	2023-02-02 09:57:45 -05:00
Seth Hoenig	ca7ead191e	consul: restore consul token when reverting a job (#15996 ) * consul: reset consul token on job during registration of a reversion * e2e: add test for reverting a job with a consul service * cl: fixup cl entry	2023-02-01 14:02:45 -06:00
James Rasell	9e8325d63c	acl: fix a bug in token creation when parsing expiration TTLs. (#15999 ) The ACL token decoding was not correctly handling time duration syntax such as "1h" which forced people to use the nanosecond representation via the HTTP API. The change adds an unmarshal function which allows this syntax to be used, along with other styles correctly.	2023-02-01 17:43:41 +01:00
James Rasell	67acfd9f6b	acl: return 400 not 404 code when creating an invalid policy. (#16000 )	2023-02-01 17:40:15 +01:00
Mike Nomitch	80848b202e	Increases max variable size to 64KiB from 16KiB (#15983 )	2023-01-31 13:32:36 -05:00
stswidwinski	16eefbbf4d	GC: ensure no leakage of evaluations for batch jobs. (#15097 ) Prior to 2409f72 the code compared the modification index of a job to itself. Afterwards, the code compared the creation index of the job to itself. In either case there should never be a case of re-parenting of allocs causing the evaluation to trivially always result in false, which leads to unreclaimable memory. Prior to this change allocations and evaluations for batch jobs were never garbage collected until the batch job was explicitly stopped. The new `batch_eval_gc_threshold` server configuration controls how often they are collected. The default threshold is `24h`.	2023-01-31 13:32:14 -05:00
Jorge Marey	d1c9aad762	Rename fields on proxyConfig (#15541 ) * Change api Fields for expose and paths * Add changelog entry * changelog: add deprecation notes about connect fields * api: minor style tweaks --------- Co-authored-by: Seth Hoenig <shoenig@duck.com>	2023-01-30 09:31:16 -06:00
Piotr Kazmierczak	14b53df3b6	renamed stanza to block for consistency with other projects (#15941 )	2023-01-30 15:48:43 +01:00
Seth Hoenig	074b76e3bf	consul: check for acceptable service identity on consul tokens (#15928 ) When registering a job with a service and 'consul.allow_unauthenticated=false', we scan the given Consul token for an acceptable policy or role with an acceptable policy, but did not scan for an acceptable service identity (which is backed by an acceptable virtual policy). This PR updates our consul token validation to also accept a matching service identity when registering a service into Consul. Fixes #15902	2023-01-27 18:15:51 -06:00
Tim Gross	881a4cfaff	metrics: Add remaining server RPC rate metrics (#15901 )	2023-01-27 08:29:53 -05:00
Tim Gross	ce3eef8037	metrics: Add rate metrics to Client CSI endpoints (#15905 ) Also tightens up authentication for these endpoints by enforcing the server certificate name is valid. We protect these endpoints currently by mTLS and can't use an auth token because these endpoints are (uniquely) called by the leader and followers for a given node won't have the leader's ephemeral ACL token. Add a certificate name check that requests come from a server and not a client, because no client should ever send these RPCs directly.	2023-01-26 16:40:58 -05:00
Tim Gross	bed8716e44	metrics: Add metrics to unauthenticated endpoints (#15899 )	2023-01-26 15:05:51 -05:00
Tim Gross	5e75ea9fb3	metrics: Add RPC rate metrics to endpoints that validate TLS names (#15900 )	2023-01-26 15:04:25 -05:00
Yorick Gersie	2a5c423ae0	Allow per_alloc to be used with host volumes (#15780 ) Disallowing per_alloc for host volumes in some cases makes life of a nomad user much harder. When we rely on the NOMAD_ALLOC_INDEX for any configuration that needs to be re-used across restarts we need to make sure allocation placement is consistent. With CSI volumes we can use the `per_alloc` feature but for some reason this is explicitly disabled for host volumes. Ensure host volumes understand the concept of per_alloc	2023-01-26 09:14:47 -05:00
Piotr Kazmierczak	f4d6efe69f	acl: make auth method default across all types (#15869 )	2023-01-26 14:17:11 +01:00
James Rasell	5d33891910	sso: allow binding rules to create management ACL tokens. (#15860 ) * sso: allow binding rules to create management ACL tokens. * docs: update binding rule docs to detail management type addition.	2023-01-26 09:57:44 +01:00
Tim Gross	6677a103c2	metrics: measure rate of RPC requests that serve API (#15876 ) This changeset configures the RPC rate metrics that were added in #15515 to all the RPCs that support authenticated HTTP API requests. These endpoints already configured with pre-forwarding authentication in #15870, and a handful of others were done already as part of the proof-of-concept work. So this changeset is entirely copy-and-pasting one method call into a whole mess of handlers. Upcoming PRs will wire up pre-forwarding auth and rate metrics for the remaining set of RPCs that have no API consumers or aren't authenticated, in smaller chunks that can be more thoughtfully reviewed.	2023-01-25 16:37:24 -05:00
Luiz Aoqui	3479e2231f	core: enforce strict steps for clients reconnect (#15808 ) When a Nomad client that is running an allocation with `max_client_disconnect` set misses a heartbeat the Nomad server will update its status to `disconnected`. Upon reconnecting, the client will make three main RPC calls: - `Node.UpdateStatus` is used to set the client status to `ready`. - `Node.UpdateAlloc` is used to update the client-side information about allocations, such as their `ClientStatus`, task states etc. - `Node.Register` is used to upsert the entire node information, including its status. These calls are made concurrently and are also running in parallel with the scheduler. Depending on the order they run the scheduler may end up with incomplete data when reconciling allocations. For example, a client disconnects and its replacement allocation cannot be placed anywhere else, so there's a pending eval waiting for resources. When this client comes back the order of events may be: 1. Client calls `Node.UpdateStatus` and is now `ready`. 2. Scheduler reconciles allocations and places the replacement alloc to the client. The client is now assigned two allocations: the original alloc that is still `unknown` and the replacement that is `pending`. 3. Client calls `Node.UpdateAlloc` and updates the original alloc to `running`. 4. Scheduler notices too many allocs and stops the replacement. This creates unnecessary placements or, in a different order of events, may leave the job without any allocations running until the whole state is updated and reconciled. To avoid problems like this clients must update _all_ of its relevant information before they can be considered `ready` and available for scheduling. To achieve this goal the RPC endpoints mentioned above have been modified to enforce strict steps for nodes reconnecting: - `Node.Register` does not set the client status anymore. - `Node.UpdateStatus` sets the reconnecting client to the `initializing` status until it successfully calls `Node.UpdateAlloc`. These changes are done server-side to avoid the need of additional coordination between clients and servers. Clients are kept oblivious of these changes and will keep making these calls as they normally would. The verification of whether allocations have been updates is done by storing and comparing the Raft index of the last time the client missed a heartbeat and the last time it updated its allocations.	2023-01-25 15:53:59 -05:00
Tim Gross	f3f64af821	WI: allow workloads to use RPCs associated with HTTP API (#15870 ) This changeset allows Workload Identities to authenticate to all the RPCs that support HTTP API endpoints, for use with PR #15864. * Extends the work done for pre-forwarding authentication to all RPCs that support a HTTP API endpoint. * Consolidates the auth helpers used by the CSI, Service Registration, and Node endpoints that are currently used to support both tokens and client secrets. Intentionally excluded from this changeset: * The Variables endpoint still has custom handling because of the implicit policies. Ideally we'll figure out an efficient way to resolve those into real policies and then we can get rid of that custom handling. * The RPCs that don't currently support auth tokens (i.e. those that don't support HTTP endpoints) have not been updated with the new pre-forwarding auth We'll be doing this under a separate PR to support RPC rate metrics.	2023-01-25 14:33:06 -05:00
Tim Gross	cf9e5f3327	acl: Fix panic when bogus token is passed (#15863 ) If a consumer of the new `Authenticate` method gets passed a bogus token that's a correctly-shaped UUID, it will correctly get an identity without a ACL token. But most consumers will then panic when they consume this nil `ACLToken` for authorization. Because no API client should ever send a bogus auth token, update the `Authenticate` method to create the identity with remote IP (for metrics tracking) but also return an `ErrPermissionDenied`.	2023-01-25 10:03:17 -05:00
Tim Gross	055434cca9	add metric for count of RPC requests (#15515 ) Implement a metric for RPC requests with labels on the identity, so that administrators can monitor the source of requests within the cluster. This changeset demonstrates the change with the new `ACL.WhoAmI` RPC, and we'll wire up the remaining RPCs once we've threaded the new pre-forwarding authentication through the all. Note that metrics are measured after we forward but before we return any authentication error. This ensures that we only emit metrics on the server that actually serves the request. We'll perform rate limiting at the same place. Includes telemetry configuration to omit identity labels.	2023-01-24 11:54:20 -05:00
Tim Gross	2030d62920	implement pre-forwarding auth on select RPCs (#15513 ) In #15417 we added a new `Authenticate` method to the server that returns an `AuthenticatedIdentity` struct. This changeset implements this method for a small number of RPC endpoints that together represent all the various ways in which RPCs are sent, so that we can validate that we're happy with this approach.	2023-01-24 10:52:07 -05:00
Michael Schurter	ace5faf948	core: backoff considerably when worker is behind raft (#15523 ) Upon dequeuing an evaluation workers snapshot their state store at the eval's wait index or later. This ensures we process an eval at a point in time after it was created or updated. Processing an eval on an old snapshot could cause any number of problems such as: 1. Since job registration atomically updates an eval and job in a single raft entry, scheduling against indexes before that may not have the eval's job or may have an older version. 2. The older the scheduler's snapshot, the higher the likelihood something has changed in the cluster state which will cause the plan applier to reject the scheduler's plan. This could waste work or even cause eval's to be failed needlessly. However, the workers run in parallel with a new server pulling the cluster state from a peer. During this time, which may be many minutes long, the state store is likely far behind the minimum index required to process evaluations. This PR addresses this by adding an additional long backoff period after an eval is nacked. If the scheduler's indexes catches up within the additional backoff, it will unblock early to dequeue the next eval. When the server shuts down we'll get a `context.Canceled` error from the state store method. We need to bubble this error up so that other callers can detect it. Handle this case separately when waiting after dequeue so that we can warn on shutdown instead of throwing an ambiguous error message with just the text "canceled." While there may be more precise ways to block scheduling until the server catches up, this approach adds little risk and covers additional cases where a server may be temporarily behind due to a spike in load or a saturated network. For testing, we make the `raftSyncLimit` into a parameter on the worker's `run` method so that we can run backoff tests without waiting 30+ seconds. We haven't followed thru and made all the worker globals into worker parameters, because there isn't much use outside of testing, but we can consider that in the future. Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-01-24 08:56:35 -05:00
Tim Gross	a51149736d	Rename `nomad.broker.total_blocked` metric (#15835 ) This changeset fixes a long-standing point of confusion in metrics emitted by the eval broker. The eval broker has a queue of "blocked" evals that are waiting for an in-flight ("unacked") eval of the same job to be completed. But this "blocked" state is not the same as the `blocked` status that we write to raft and expose in the Nomad API to end users. There's a second metric `nomad.blocked_eval.total_blocked` that refers to evaluations in that state. This has caused ongoing confusion in major customer incidents and even in our own documentation! (Fixed in this PR.) There's little functional change in this PR aside from the name of the metric emitted, but there's a bit refactoring to clean up the names in `eval_broker.go` so that there aren't name collisions and multiple names for the same state. Changes included are: * Everything that was previously called "pending" referred to entities that were associated witht he "ready" metric. These are all now called "ready" to match the metric. * Everything named "blocked" in `eval_broker.go` is now named "pending", except for a couple of comments that actually refer to blocked RPCs. * Added a note to the upgrade guide docs for 1.5.0. * Fixed the scheduling performance metrics docs because the description for `nomad.broker.total_blocked` was actually the description for `nomad.blocked_eval.total_blocked`.	2023-01-20 14:23:56 -05:00
Charlie Voiselle	5ea1d8a970	Add raft snapshot configuration options (#15522 ) * Add config elements * Wire in snapshot configuration to raft * Add hot reload of raft config * Add documentation for new raft settings * Add changelog	2023-01-20 14:21:51 -05:00

1 2 3 4 5 ...

4271 Commits