This PR exposes the following existing `consul-template` configuration options to Nomad jobspec authors in the `{job.group.task.template}` stanza.
- `wait`
It also exposes the following `consul-template` configuration to Nomad operators in the `{client.template}` stanza.
- `max_stale`
- `block_query_wait`
- `consul_retry`
- `vault_retry`
- `wait`
Finally, it adds the following new Nomad-specific configuration to the `{client.template}` stanza that allows operators to set bounds on what jobspec authors may configure (illustrated after the list).
- `wait_bounds`
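As a rough illustration, here is how a jobspec author might set the new `wait` option through the Go `api` package, assuming it surfaces there as a `WaitConfig` with `Min`/`Max` durations on `Template` (field names inferred from this description); whatever values are set are subject to the operator's `wait_bounds` on the client:

```go
package example

import (
	"time"

	"github.com/hashicorp/nomad/api"
)

func strPtr(s string) *string               { return &s }
func durPtr(d time.Duration) *time.Duration { return &d }

// templateWithWait builds a template block with the newly exposed wait
// setting: rendering is debounced until the data is stable for at least
// Min, and forced once Max elapses.
func templateWithWait() *api.Template {
	return &api.Template{
		SourcePath: strPtr("local/config.ctmpl"),
		DestPath:   strPtr("local/config.yml"),
		Wait: &api.WaitConfig{
			Min: durPtr(5 * time.Second),
			Max: durPtr(30 * time.Second),
		},
	}
}
```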
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
* Fixed name of `nomad.scheduler.allocs.reschedule` metric
* Added new metrics to metrics reference documentation
* Expanded definitions of "waiting" metrics
* Changelog entry for #10236 and #10237
## Development Environment Changes
* Added stringer to build deps
## New HTTP APIs
* Added scheduler worker config API
* Added scheduler worker info API
## New Internals
* (Scheduler)Worker API refactor: Start(), Stop(), Pause(), Resume() (see the sketch after this list)
* Update shutdown to use context
* Add mutex for contended server data
- `workerLock` for the `workers` slice
- `workerConfigLock` for the `Server.Config.NumSchedulers` and
`Server.Config.EnabledSchedulers` values
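A condensed Go sketch of the lifecycle shape described above (identifiers illustrative; the real worker also carries scheduler state): shutdown flows through a context, and pause/resume coordinate through a mutex and condition variable:

```go
package worker

import (
	"context"
	"sync"
)

type Worker struct {
	ctx    context.Context
	cancel context.CancelFunc

	mu     sync.Mutex
	cond   *sync.Cond
	paused bool
}

func NewWorker(parent context.Context) *Worker {
	w := &Worker{}
	w.ctx, w.cancel = context.WithCancel(parent)
	w.cond = sync.NewCond(&w.mu)
	return w
}

func (w *Worker) Start() { go w.run() }

// Stop cancels the context and wakes a paused worker so it can observe it.
func (w *Worker) Stop() {
	w.cancel()
	w.cond.Broadcast()
}

func (w *Worker) Pause() {
	w.mu.Lock()
	w.paused = true
	w.mu.Unlock()
}

func (w *Worker) Resume() {
	w.mu.Lock()
	w.paused = false
	w.mu.Unlock()
	w.cond.Broadcast()
}

func (w *Worker) run() {
	for {
		w.mu.Lock()
		for w.paused && w.ctx.Err() == nil {
			w.cond.Wait()
		}
		w.mu.Unlock()
		if w.ctx.Err() != nil {
			return
		}
		// dequeue and process one evaluation here
	}
}
```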
## Other
* Add docs for the scheduler worker API
* Add changelog message
Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>
When `volumewatcher.Watcher` starts on the leader, it starts a watch
on every volume and triggers a reap of unused claims on any change to
that volume. But if a reaping is in-flight during leadership
transitions, it will fail and the event that triggered the reap will
be dropped. Perform one reap of unused claims at the start of the
watcher so that leadership transitions don't drop this event.
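A sketch of the shape of the fix, with stand-in types (names illustrative, not the actual `volumewatcher` code):

```go
package volumewatcher

import "context"

// Watcher is a minimal stand-in for the real type.
type Watcher struct {
	volumeUpdates chan string // IDs of volumes whose state changed
}

func (w *Watcher) reapUnusedClaims(ctx context.Context)               {} // stubbed
func (w *Watcher) reapUnusedClaimsFor(ctx context.Context, id string) {} // stubbed

func (w *Watcher) run(ctx context.Context) {
	// One unconditional pass at startup covers any reap that was in flight
	// when leadership changed hands and whose triggering event was lost.
	w.reapUnusedClaims(ctx)

	for {
		select {
		case <-ctx.Done():
			return
		case id := <-w.volumeUpdates:
			w.reapUnusedClaimsFor(ctx, id)
		}
	}
}
```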
The task runner prestart hooks take a `joincontext` so they have the
option to exit early if either of two contexts are canceled: from
killing the task or client shutdown. Some tasks exit without being
shut down by the server, so neither of the joined contexts ever gets
canceled and we leak the `joincontext` (48 bytes) and its internal
goroutine. This primarily impacts batch jobs and any task that fails
or completes early such as non-sidecar prestart lifecycle tasks.
Cancel the `joincontext` after the prestart call exits to fix the
leak.
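A minimal sketch of the fix, assuming a `Join(ctx1, ctx2) (context.Context, context.CancelFunc)` shape for the `joincontext` helper (the surrounding task runner types are stubbed):

```go
package taskrunner

import (
	"context"

	"github.com/LK4D4/joincontext"
)

type TaskRunner struct{}

func (tr *TaskRunner) runPrestartHooks(ctx context.Context) error { return nil } // stubbed

// prestart joins the kill and shutdown contexts and, crucially, cancels the
// joined context when the hooks return, releasing its forwarding goroutine
// even when neither parent is ever canceled.
func (tr *TaskRunner) prestart(killCtx, shutdownCtx context.Context) error {
	joined, cancel := joincontext.Join(killCtx, shutdownCtx)
	defer cancel()
	return tr.runPrestartHooks(joined)
}
```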
When the scheduler picks a node for each evaluation, the
`LimitIterator` provides at most 2 eligible nodes for the
`MaxScoreIterator` to choose from. This keeps scheduling fast while
producing acceptable results because the results are binpacked.
Jobs with a `spread` block (or node affinity) remove this limit in
order to produce correct spread scoring. This means that every
allocation within a job with a `spread` block is evaluated against
_all_ eligible nodes. Operators of large clusters have reported that
jobs with `spread` blocks that are eligible on a large number of nodes
can take longer than the nack timeout (60s) to evaluate. Typical
evaluations are processed in milliseconds.
In practice, it's not necessary to evaluate every eligible node for
every allocation on large clusters, because the `RandomIterator` at
the base of the scheduler stack produces enough variation in each pass
that the likelihood of an uneven spread is negligible. Note that
feasibility is checked before the limit, so this only impacts the
number of _eligible_ nodes available for scoring, not the total number
of nodes.
This changeset sets the iterator limit for "large" `spread` block and
node affinity jobs to be equal to the number of desired
allocations. This brings an example problematic job evaluation down
from ~3min to ~10s. The included tests ensure that we have acceptable
spread results across a variety of large cluster topologies.
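An illustrative sketch of the new bound (identifiers assumed, not the actual scheduler code):

```go
package scheduler

// scoreLimit bounds how many eligible nodes feed the MaxScoreIterator:
// spread and node-affinity jobs now get a limit equal to the desired
// allocation count instead of scoring every eligible node.
func scoreLimit(hasSpreadOrAffinity bool, desiredAllocs, defaultLimit int) int {
	if !hasSpreadOrAffinity {
		return defaultLimit // normally at most 2 eligible nodes are scored
	}
	if desiredAllocs > defaultLimit {
		// The RandomIterator at the base of the stack shuffles nodes each
		// pass, so scoring this many still yields an acceptable spread.
		return desiredAllocs
	}
	return defaultLimit
}
```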
The `nomad operator raft` and `nomad operator snapshot state`
subcommands for inspecting on-disk raft state were hidden and
undocumented. Expose and document these so that advanced operators
have support for these tools.
Use the new filtering and pagination capabilities of the `Eval.List`
RPC to provide filtering and pagination at the command line.
Also includes a note that `nomad eval status -json` is deprecated and
will be replaced with a single evaluation view in a future version of
Nomad.
When a cluster doesn't have a leader, the `nomad operator debug`
command can safely use stale queries to gracefully degrade the
consistency of almost all its queries. But the stale query parameter for
these API calls was not being set by the command.
Some `api` package queries do not include `QueryOptions` because
they target a specific agent, but they can potentially be forwarded to
other agents. If there is no leader, these forwarded queries will
fail. Provide methods to call these APIs with `QueryOptions`.
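For example, with the `api` package a caller can opt into stale queries like this:

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/nomad/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	// AllowStale lets any server answer from its local state, so the query
	// degrades gracefully when the cluster has no leader.
	opts := &api.QueryOptions{AllowStale: true}
	nodes, _, err := client.Nodes().List(opts)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("nodes:", len(nodes))
}
```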
Some operators use very long group/task `shutdown_delay` settings to
safely drain network connections to their workloads after service
deregistration. But during incident response, they may want to cause
that drain to be skipped so they can quickly shed load.
Provide a `-no-shutdown-delay` flag on the `nomad alloc stop` and
`nomad job stop` commands that bypasses the delay. This sets a new
desired transition state on the affected allocations that the
allocation/task runner will identify during pre-kill on the client.
Note (as documented here) that using this flag will almost always result
in failed inbound network connections for workloads, because the tasks
will exit before clients receive updated service discovery information,
so connections won't be gracefully drained.
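A sketch of the mechanism (the new field name is assumed from this description; the other transition fields mirror the existing struct):

```go
package example

import "time"

// DesiredTransition sketch: NoShutdownDelay is the new field the servers
// set on affected allocations for this flag.
type DesiredTransition struct {
	Migrate         *bool
	Reschedule      *bool
	ForceReschedule *bool
	NoShutdownDelay *bool
}

// shutdownDelay is the client-side pre-kill check: skip the configured
// group/task shutdown_delay when the transition requests it.
func shutdownDelay(t DesiredTransition, configured time.Duration) time.Duration {
	if t.NoShutdownDelay != nil && *t.NoShutdownDelay {
		return 0 // shed load immediately; connections won't drain gracefully
	}
	return configured
}
```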
API queries can request pagination using the `NextToken` and `PerPage`
fields of `QueryOptions`, when supported by the underlying API.
Add a `NextToken` field to the `structs.QueryMeta` so that we have a
common field across RPCs to tell the caller where to resume paging
from on their next API call. Include this field on the `api.QueryMeta`
as well so that it's available for future versions of List HTTP APIs
that wrap the response with `QueryMeta` rather than returning a simple
list of structs. In the meantime, callers can read the `X-Nomad-NextToken` response header.
Add pagination to the `Eval.List` RPC by checking for pagination token
and page size in `QueryOptions`. This will allow resuming from the
last ID seen so long as the query parameters and the state store
itself are unchanged between requests.
Add filtering by job ID or evaluation status over the results we get
out of the state store.
Parse the query parameters of the `Eval.List` API into the arguments
expected for filtering in the RPC call.
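A client-side paging loop using these fields might look like the following, assuming `QueryOptions`/`QueryMeta` carry `PerPage`/`NextToken` as described above:

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/nomad/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	opts := &api.QueryOptions{PerPage: 100}
	for {
		evals, meta, err := client.Evaluations().List(opts)
		if err != nil {
			log.Fatal(err)
		}
		for _, e := range evals {
			fmt.Println(e.ID, e.Status)
		}
		// An empty NextToken means we've seen the last page.
		if meta.NextToken == "" {
			break
		}
		opts.NextToken = meta.NextToken
	}
}
```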
During incident response, operators may find that automated processes
elsewhere in the organization can be generating new workloads on Nomad
clusters that are unable to handle the workload. This changeset adds a
field to the `SchedulerConfiguration` API that causes all job
registration calls to be rejected unless the request has a management
ACL token.
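For example, an operator could flip the new field with the `api` package (the field name here is taken from this description and may differ):

```go
package main

import (
	"log"

	"github.com/hashicorp/nomad/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	resp, _, err := client.Operator().SchedulerGetConfiguration(nil)
	if err != nil {
		log.Fatal(err)
	}
	sc := resp.SchedulerConfig
	// With this set, only requests carrying a management ACL token may
	// register jobs.
	sc.RejectJobRegistration = true
	if _, _, err := client.Operator().SchedulerSetConfiguration(sc, nil); err != nil {
		log.Fatal(err)
	}
}
```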
The `consul.client_auto_join` configuration block tells the Nomad
client whether to use Consul service discovery to find Nomad
servers. By default it is set to `true`, but contrary to the
documentation it was only respected during the initial client
registration. If a client missed a heartbeat, failed a
`Node.UpdateStatus` RPC, or if there was no Nomad leader, the client
would fallback to Consul even if `client_auto_join` was set to
`false`. This changeset returns early from the client's trigger for
Consul discovery if the `client_auto_join` field is set to `false`.
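A sketch of the guard with stand-in types (names illustrative, not the actual client code):

```go
package client

// Client is a minimal stand-in for the real type.
type Client struct {
	clientAutoJoin     *bool // from the consul.client_auto_join config block
	triggerDiscoveryCh chan struct{}
}

// triggerDiscovery now returns early whenever client_auto_join is false,
// instead of only honoring it at initial registration.
func (c *Client) triggerDiscovery() {
	if c.clientAutoJoin != nil && !*c.clientAutoJoin {
		return
	}
	select {
	case c.triggerDiscoveryCh <- struct{}{}:
	default:
	}
}
```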
In the system scheduler, if a subset of clients are filtered by class,
we hit a code path where the `AllocMetric` has been copied, but the
`Copy` method does not instantiate the various maps. This leads to an
assignment to a nil map. This changeset ensures that the maps are
non-nil before continuing.
The `Copy` method relies on functions in the `helper` package that all
return nil slices or maps when passed zero-length inputs. This
changeset to fix the panic bug intentionally defers updating those
functions because it'll have potential impact on memory usage. See
https://github.com/hashicorp/nomad/issues/11564 for more details.
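A sketch of the guard, with a trimmed-down `AllocMetric` showing only a few of the real maps:

```go
package structs

type AllocMetric struct {
	NodesAvailable     map[string]int
	ClassFiltered      map[string]int
	ConstraintFiltered map[string]int
}

// ensureMaps guards against the nil maps a Copy can produce, since the
// helper copy functions return nil for zero-length inputs.
func ensureMaps(m *AllocMetric) {
	if m.NodesAvailable == nil {
		m.NodesAvailable = make(map[string]int)
	}
	if m.ClassFiltered == nil {
		m.ClassFiltered = make(map[string]int)
	}
	if m.ConstraintFiltered == nil {
		m.ConstraintFiltered = make(map[string]int)
	}
}
```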
This change modifies the Nomad job register and deregister RPCs to
accept an updated option set that includes an eval priority. This
parameter is optional and overrides the use of the job priority to set
the eval priority.
To ensure that all evaluations resulting from the request use the same
eval priority, the priority is passed to the allocReconciler and
deploymentWatcher. This creates a new distinction between eval priority
and job priority.
The Nomad agent HTTP API has been modified to allow setting the
eval priority on job update and delete. To keep consistency with
the current v1 API, job update accepts this as a payload param;
job delete accepts this as a query param.
Any user supplied value is validated within the agent HTTP handler
removing the need to pass invalid requests to the server.
The register and deregister opts functions now allow setting the eval
priority on requests.
The change also makes the DeregisterOpts function handle nil opts,
bringing it in line with RegisterOpts.
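A usage sketch against the `api` package (option struct and field names assumed from this description):

```go
package main

import (
	"log"

	"github.com/hashicorp/nomad/api"
)

func strPtr(s string) *string { return &s }

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	job := &api.Job{ID: strPtr("example"), Name: strPtr("example")}

	// EvalPriority overrides the job priority for the evaluations created
	// by this request only.
	if _, _, err := client.Jobs().RegisterOpts(job, &api.RegisterOptions{EvalPriority: 90}, nil); err != nil {
		log.Fatal(err)
	}
	// DeregisterOpts now tolerates nil opts, matching RegisterOpts.
	if _, _, err := client.Jobs().DeregisterOpts("example", &api.DeregisterOptions{EvalPriority: 90}, nil); err != nil {
		log.Fatal(err)
	}
}
```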
The QEMU driver allows arbitrary command line options, but many of
these options give access to host resources that operators may not
want to expose, such as devices. Add an optional allowlist to the
plugin configuration so that operators can limit the resources for
QEMU.
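A generic sketch of what such an allowlist check looks like (not the driver's actual code; the plugin config field name is omitted):

```go
package qemu

import (
	"fmt"
	"strings"
)

// validateArgs rejects any option flag in the task's args that does not
// appear in the operator-configured allowlist.
func validateArgs(allowlist, args []string) error {
	if len(allowlist) == 0 {
		return nil // no allowlist configured: all args permitted, as before
	}
	allowed := make(map[string]struct{}, len(allowlist))
	for _, a := range allowlist {
		allowed[a] = struct{}{}
	}
	for _, arg := range args {
		if strings.HasPrefix(arg, "-") {
			if _, ok := allowed[arg]; !ok {
				return fmt.Errorf("qemu arg %q is not in the allowlist", arg)
			}
		}
	}
	return nil
}
```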
* api: return 404 for alloc FS list/stat endpoints
If the alloc filesystem doesn't have a file requested by the List
Files or Stat File API, we currently return an HTTP 500 error with the
expected "file not found" error message. Return an HTTP 404 error
instead.
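A self-contained sketch of the status mapping (the real handler's coded-error helper lives in the agent package and its names may differ):

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// codedError mimics an HTTP coded error for illustration.
type codedError struct {
	code int
	msg  string
}

func (e codedError) Error() string { return fmt.Sprintf("%d: %s", e.code, e.msg) }

// statFile maps a missing path to 404 instead of the old blanket 500.
func statFile(path string) (os.FileInfo, error) {
	fi, err := os.Stat(path)
	if err != nil {
		if os.IsNotExist(err) {
			return nil, codedError{http.StatusNotFound, err.Error()}
		}
		return nil, codedError{http.StatusInternalServerError, err.Error()}
	}
	return fi, nil
}

func main() {
	if _, err := statFile("/no/such/file"); err != nil {
		fmt.Println(err)
	}
}
```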
* Update FS handler
Previously the FS handler would interpret a 500 status as a 404 in the
adapter layer by checking whether the response body contained the
expected text or whether the response status was 500, and would then
throw a 404 error code.
Co-authored-by: Jai Bhagat <jaybhagat841@gmail.com>