open-nomad

Author	SHA1	Message	Date
Tim Gross	577d96034d	test: deflake job endpoint registration test (#18170 ) We've seen test flakiness in the `TestJobEndpoint_Register_NonOverlapping` test, which asserts that we don't try to placed allocations for blocked evals until resources have been actually freed by setting the client status of the previous alloc to complete. The flaky assertion includes sorting the two allocations by CreateIndex and this appears to be a non-stable sort in the context of the test run, which results in failures that shouldn't exist. There's no reason to sort the allocations instead of just examining them by ID. This changeset does so.	2023-08-14 16:18:53 -04:00
Charlie Voiselle	bac4d112d1	[dep] bump golang.org/x/exp (#18102 ) There are some refactorings that have to be made in the getter and state where the api changed in `slices` * Bump golang.org/x/exp * Bump golang.org/x/exp in api * Update job_endpoint_test * [feedback] unexport sort function	2023-08-03 15:14:39 -04:00
Luiz Aoqui	ac08fc751b	node pools: apply node pool scheduler configuration (#17598 )	2023-06-21 20:31:50 -04:00
James Rasell	6ec4a69f47	scale: fixed a bug where evals could be created with wrong type. (#17092 ) The job scale RPC endpoint hard-coded the eval creation to use the type of service. This meant scaling events triggered on jobs of type batch would create evaluations with the wrong type, which does not seem to cause any problems, just confusion when correlating the two.	2023-05-05 14:46:10 +01:00
Luiz Aoqui	7b5a8f1fb0	Revert "hashicorp/go-msgpack v2 (#16810 )" (#17047 ) This reverts commit 8a98520d56eed3848096734487d8bd3eb9162a65.	2023-05-01 17:18:34 -04:00
James Rasell	4d2c1403c2	scale: do not allow scaling of jobs with type system. (#16969 )	2023-04-25 15:47:44 +01:00
Ian Fijolek	619f49afcf	hashicorp/go-msgpack v2 (#16810 ) * Upgrade from hashicorp/go-msgpack v1.1.5 to v2.1.0 Fixes #16808 * Update hashicorp/net-rpc-msgpackrpc to v2 to match go-msgpack * deps: use go-msgpack v2.0.0 go-msgpack v2.1.0 includes some code changes that we will need to investigate furthere to assess its impact on Nomad, so keeping this dependency on v2.0.0 for now since it's no-op. --------- Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-04-17 17:02:05 -04:00
Seth Hoenig	ba728f8f97	api: enable support for setting original job source (#16763 ) * api: enable support for setting original source alongside job This PR adds support for setting job source material along with the registration of a job. This includes a new HTTP endpoint and a new RPC endpoint for making queries for the original source of a job. The HTTP endpoint is /v1/job/<id>/submission?version=<version> and the RPC method is Job.GetJobSubmission. The job source (if submitted, and doing so is always optional), is stored in the job_submission memdb table, separately from the actual job. This way we do not incur overhead of reading the large string field throughout normal job operations. The server config now includes job_max_source_size for configuring the maximum size the job source may be, before the server simply drops the source material. This should help prevent Bad Things from happening when huge jobs are submitted. If the value is set to 0, all job source material will be dropped. * api: avoid writing var content to disk for parsing * api: move submission validation into RPC layer * api: return an error if updating a job submission without namespace or job id * api: be exact about the job index we associate a submission with (modify) * api: reword api docs scheduling * api: prune all but the last 6 job submissions * api: protect against nil job submission in job validation * api: set max job source size in test server * api: fixups from pr	2023-04-11 08:45:08 -05:00
hashicorp-copywrite[bot]	005636afa0	[COMPLIANCE] Add Copyright and License Headers	2023-04-10 15:36:59 +00:00
Luiz Aoqui	adf147cb36	acl: update job eval requirement to `submit-job` (#16463 ) The job evaluate endpoint creates a new evaluation for the job which is a write operation. This change modifies the necessary capability from `read-job` to `submit-job` to better reflect this.	2023-03-13 17:13:54 -04:00
Tim Gross	3c0eaba9db	remove backcompat support for non-atomic job registration (#16305 ) In Nomad 0.12.1 we introduced atomic job registration/deregistration, where the new eval was written in the same raft entry. Backwards-compatibility checks were supposed to have been removed in Nomad 1.1.0, but we missed that. This is long safe to remove.	2023-03-03 15:52:22 -05:00
Tim Gross	0e1b554299	handle `FSM.Apply` errors in `raftApply` (#16287 ) The signature of the `raftApply` function requires that the caller unwrap the first returned value (the response from `FSM.Apply`) to see if it's an error. This puts the burden on the caller to remember to check two different places for errors, and we've done so inconsistently. Update `raftApply` to do the unwrapping for us and return any `FSM.Apply` error as the error value. Similar work was done in Consul in https://github.com/hashicorp/consul/pull/9991. This eliminates some boilerplate and surfaces a few minor bugs in the process: * job deregistrations of already-GC'd jobs were still emitting evals * reconcile job summaries does not return scheduler errors * node updates did not report errors associated with inconsistent service discovery or CSI plugin states Note that although _most_ of the `FSM.Apply` functions return only errors (which makes it tempting to remove the first return value entirely), there are few that return `bool` for some reason and Variables relies on the response value for proper CAS checking.	2023-03-02 13:51:09 -05:00
Alessio Perugini	4e9ec24b22	Allow configurable range of Job priorities (#16084 )	2023-02-17 09:23:13 -05:00
Piotr Kazmierczak	14b53df3b6	renamed stanza to block for consistency with other projects (#15941 )	2023-01-30 15:48:43 +01:00
Phil Renaud	ffb4c63af7	[ui] Adds meta to job list stub and displays a pack logo on the jobs index (#14833 ) * Adds meta to job list stub and displays a pack logo on the jobs index * Changelog * Modifying struct for optional meta param * Explicitly ask for meta anytime I look up a job from index or job page * Test case for the endpoint * adding meta field to API struct and ommitting from response if empty * passthru method added to api/jobs.list * Meta param listed in docs for jobs list * Update api/jobs.go Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-11-02 16:58:24 -04:00
Seth Hoenig	5e38a0e82c	cleanup: rename Equals to Equal for consistency (#14759 )	2022-10-10 09:28:46 -05:00
Michael Schurter	0df5c7d5ae	test: fix flaky test (#14713 ) Need to wait for Stop evals to be processed before you can expect subsequent RPCs to see the alloc's DesiredStatus=stop.	2022-09-27 10:36:16 -07:00
Mahmood Ali	a9d5e4c510	scheduler: stopped-yet-running allocs are still running (#10446 ) * scheduler: stopped-yet-running allocs are still running * scheduler: test new stopped-but-running logic * test: assert nonoverlapping alloc behavior Also add a simpler Wait test helper to improve line numbers and save few lines of code. * docs: tried my best to describe #10446 it's not concise... feedback welcome * scheduler: fix test that allowed overlapping allocs * devices: only free devices when ClientStatus is terminal * test: output nicer failure message if err==nil Co-authored-by: Mahmood Ali <mahmood@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-09-13 12:52:47 -07:00
Piotr Kazmierczak	b63944b5c1	cleanup: replace TypeToPtr helper methods with pointer.Of (#14151 ) Bumping compile time requirement to go 1.18 allows us to simplify our pointer helper methods.	2022-08-17 18:26:34 +02:00
Seth Hoenig	b3ea68948b	build: run gofmt on all go source files Go 1.19 will forecefully format all your doc strings. To get this out of the way, here is one big commit with all the changes gofmt wants to make.	2022-08-16 11:14:11 -05:00
Luiz Aoqui	a8cc633156	vault: revert support for entity aliases (#12723 ) After a more detailed analysis of this feature, the approach taken in PR #12449 was found to be not ideal due to poor UX (users are responsible for setting the entity alias they would like to use) and issues around jobs potentially masquerading itself as another Vault entity.	2022-04-22 10:46:34 -04:00
James Rasell	010acce59f	job_hooks: add implicit constraint when using Consul for services. (#12602 )	2022-04-20 14:09:13 +02:00
Luiz Aoqui	ab7eb5de6e	Support Vault entity aliases (#12449 ) Move some common Vault API data struct decoding out of the Vault client so it can be reused in other situations. Make Vault job validation its own function so it's easier to expand it. Rename the `Job.VaultPolicies` method to just `Job.Vault` since it returns the full Vault block, not just their policies. Set `ChangeMode` on `Vault.Canonicalize`. Add some missing tests. Allows specifying an entity alias that will be used by Nomad when deriving the task Vault token. An entity alias assigns an indentity to a token, allowing better control and management of Vault clients since all tokens with the same indentity alias will now be considered the same client. This helps track Nomad activity in Vault's audit logs and better control over Vault billing. Add support for a new Nomad server configuration to define a default entity alias to be used when deriving Vault tokens. This default value will be used if the task doesn't have an entity alias defined.	2022-04-05 14:18:10 -04:00
James Rasell	9449e1c3e2	Merge branch 'main' into f-1.3-boogie-nights	2022-03-25 16:40:32 +01:00
James Rasell	96d8512c85	test: move remaining tests to use ci.Parallel.	2022-03-24 08:45:13 +01:00
Tim Gross	b7075f04fd	CSI: enforce single access mode at validation time (#12337 ) A volume that has single-use access mode is feasibility checked during scheduling to ensure that only a single reader or writer claim exists. However, because feasibility checking is done one alloc at a time before the plan is written, a job that's misconfigured to have count > 1 that mounts one of these volumes will pass feasibility checking. Enforce the check at validation time instead to prevent us from even trying to evaluation a job that's misconfigured this way.	2022-03-23 09:21:26 -04:00
James Rasell	a646333263	Merge branch 'main' into f-1.3-boogie-nights	2022-03-23 09:41:25 +01:00
Seth Hoenig	2631659551	ci: swap ci parallelization for unconstrained gomaxprocs	2022-03-15 12:58:52 -05:00
Luiz Aoqui	ab8ce87bba	Add pagination, filtering and sort to more API endpoints (#12186 )	2022-03-08 20:54:17 -05:00
James Rasell	ca6ba2e047	rpc: add job service registration list RPC endpoint.	2022-03-03 11:26:14 +01:00
Conor Evans	8d622797af	replace 'a alloc' with 'an alloc' where appropriate (#11792 )	2022-01-10 11:59:46 -05:00
Tim Gross	a0cf5db797	provide `-no-shutdown-delay` flag for job/alloc stop (#11596 ) Some operators use very long group/task `shutdown_delay` settings to safely drain network connections to their workloads after service deregistration. But during incident response, they may want to cause that drain to be skipped so they can quickly shed load. Provide a `-no-shutdown-delay` flag on the `nomad alloc stop` and `nomad job stop` commands that bypasses the delay. This sets a new desired transition state on the affected allocations that the allocation/task runner will identify during pre-kill on the client. Note (as documented here) that using this flag will almost always result in failed inbound network connections for workloads as the tasks will exit before clients receive updated service discovery information and won't be gracefully drained.	2021-12-13 14:54:53 -05:00
Tim Gross	03e697a69d	scheduler: config option to reject job registration (#11610 ) During incident response, operators may find that automated processes elsewhere in the organization can be generating new workloads on Nomad clusters that are unable to handle the workload. This changeset adds a field to the `SchedulerConfiguration` API that causes all job registration calls to be rejected unless the request has a management ACL token.	2021-12-06 15:20:34 -05:00
Luiz Aoqui	9d6842dd4d	Don't emit scaling event error when a deployment is underway (#11556 )	2021-11-23 10:20:18 -05:00
James Rasell	751c8217d1	core: allow setting and propagation of eval priority on job de/registration (#11532 ) This change modifies the Nomad job register and deregister RPCs to accept an updated option set which includes eval priority. This param is optional and override the use of the job priority to set the eval priority. In order to ensure all evaluations as a result of the request use the same eval priority, the priority is shared to the allocReconciler and deploymentWatcher. This creates a new distinction between eval priority and job priority. The Nomad agent HTTP API has been modified to allow setting the eval priority on job update and delete. To keep consistency with the current v1 API, job update accepts this as a payload param; job delete accepts this as a query param. Any user supplied value is validated within the agent HTTP handler removing the need to pass invalid requests to the server. The register and deregister opts functions now all for setting the eval priority on requests. The change includes a small change to the DeregisterOpts function which handles nil opts. This brings the function inline with the RegisterOpts.	2021-11-23 09:23:31 +01:00
James Rasell	99955eb80f	Merge pull request #11426 from hashicorp/b-set-dereg-eval-priority-correctly rpc: set the deregistration eval priority to the job priority.	2021-11-05 15:53:10 +01:00
James Rasell	ac9268a429	rpc: set the job scale eval priority to the job priority.	2021-11-02 12:57:53 +01:00
James Rasell	afb6913428	rpc: set the deregistration eval priority to the job priority. Previously when creating an eval for job deregistration, the eval priority was set to the default value irregardless of the job priority. In situations where an operator would want to deregister a high priority job so they could re-register; the evaluation may get blocked for some time on a busy cluster because of the deregsiter priority. If a job had a lower than default priority and was deregistered, the deregister eval would get a priority higher than that of the job. If we attempted to register another job with a higher priority than this, but still below the default, the deregister would be actioned before the register. Both situations described above seem incorrect and unexpected from a user prespective. This fix modifies to behaviour to set the deregister eval priority to that of the job, if available. Otherwise the default value is still used.	2021-11-02 09:11:44 +01:00
Alex Munda	557a227de1	Match idempotency key on all child jobs and return existing job when idempotency keys match.	2021-07-02 14:08:46 -05:00
Alex Munda	baba8fe7df	Update tests after moving idempotency token to WriteOptions	2021-07-01 08:48:57 -05:00
Alex Munda	ca86c7ba0c	Add idempotency token to dispatch request instead of special meta key	2021-06-29 15:59:23 -05:00
Alex Munda	122136b657	Always allow idempotency key meta. Tests for idempotent dispatch	2021-06-29 10:30:04 -05:00
Seth Hoenig	87be8c4c4b	consul: correctly check consul acl token namespace when using consul oss This PR fixes the Nomad Object Namespace <-> Consul ACL Token relationship check when using Consul OSS (or Consul ENT without namespace support). Nomad v1.1.0 introduced a regression where Nomad would fail the validation when submitting Connect jobs and allow_unauthenticated set to true, with Consul OSS - because it would do the namespace check against the Consul ACL token assuming the "default" namespace, which does not work because Consul OSS does not have namespaces. Instead of making the bad assumption, expand the namespace check to handle each special case explicitly. Fixes #10718	2021-06-08 13:55:57 -05:00
Tim Gross	c01d661c98	csi: validate `volume` block has `attachment_mode` and `access_mode` The `attachment_mode` and `access_mode` fields are required for CSI volumes. The `mount_options` block is only allowed for CSI volumes.	2021-06-03 16:07:19 -04:00
Mahmood Ali	52d881f567	Allow configuring memory oversubscription (#10466 ) Cluster operators want to have better control over memory oversubscription and may want to enable/disable it based on their experience. This PR adds a scheduler configuration field to control memory oversubscription. It's additional field that can be set in the [API via Scheduler Config](https://www.nomadproject.io/api-docs/operator/scheduler), or [the agent server config](https://www.nomadproject.io/docs/configuration/server#configuring-scheduler-config). I opted to have the memory oversubscription be an opt-in, but happy to change it. To enable it, operators should call the API with: ```json { "MemoryOversubscriptionEnabled": true } ``` If memory oversubscription is disabled, submitting jobs specifying `memory_max` will get a "Memory oversubscription is not enabled" warnings, but the jobs will be accepted without them accessing the additional memory. The warning message is like: ``` $ nomad job run /tmp/j Job Warnings: 1 warning(s): * Memory oversubscription is not enabled; Task cache.redis memory_max value will be ignored ==> Monitoring evaluation "7c444157" Evaluation triggered by job "example" ==> Monitoring evaluation "7c444157" Evaluation within deployment: "9d826f13" Allocation "aa5c3cad" created: node "9272088e", group "cache" Evaluation status changed: "pending" -> "complete" ==> Evaluation "7c444157" finished with status "complete" # then you can examine the Alloc AllocatedResources to validate whether the task is allowed to exceed memory: $ nomad alloc status -json aa5c3cad \| jq '.AllocatedResources.Tasks["redis"].Memory' { "MemoryMB": 256, "MemoryMaxMB": 0 } ```	2021-04-29 22:09:56 -04:00
Mahmood Ali	cf24a9eaaf	api: /v1/jobs always include namespaces (#10434 ) Add Namespace as a top-level field in `/v1/jobs` stub. The `/v1/jobs` endpoint already includes the namespace under `JobSummary`, though the API is odd, as typically the job ID and Namespace are in the same level, and the oddity complicates the UI frontend development. The downside of adding it is redundant field, that makes the response body a bit bigger, specially for clusters with large jobs. Though, it should compress nicely and I expect the overhead to be small to overall response size. The benefit of a cleaner and more consistent API seem worth it. Fixes #10431	2021-04-23 16:36:54 -04:00
Seth Hoenig	f17ba33f61	consul: plubming for specifying consul namespace in job/group This PR adds the common OSS changes for adding support for Consul Namespaces, which is going to be a Nomad Enterprise feature. There is no new functionality provided by this changeset and hopefully no new bugs.	2021-04-05 10:03:19 -06:00
Mahmood Ali	e643742a38	Add a test for parameterized summary counts	2021-03-25 11:27:09 -04:00
Mahmood Ali	1d48433356	server: handle invalid jobs in expose handler hook (#10154 ) The expose handler hook must handle if the submitted job is invalid. Without this validation, the rpc handler panics on invalid input. Co-authored-by: Tim Gross <tgross@hashicorp.com>	2021-03-10 09:12:46 -05:00
Chris Baker	3546469205	nicer error message	2021-01-08 21:13:29 +00:00

1 2 3 4 5 ...

256 commits