open-nomad

Author	SHA1	Message	Date
Tim Gross	f2615992a4	cli: unhide advanced operator raft debugging commands (#11682 ) The `nomad operator raft` and `nomad operator snapshot state` subcommands for inspecting on-disk raft state were hidden and undocumented. Expose and document these so that advanced operators have support for these tools.	2021-12-16 10:32:11 -05:00
Tim Gross	536e3c5282	`nomad eval list` command (#11675 ) Use the new filtering and pagination capabilities of the `Eval.List` RPC to provide filtering and pagination at the command line. Also includes note that `nomad eval status -json` is deprecated and will be replaced with a single evaluation view in a future version of Nomad.	2021-12-15 11:58:38 -05:00
Tim Gross	f8a133a810	cli: ensure `-stale` flag is respected by `nomad operator debug` (#11678 ) When a cluster doesn't have a leader, the `nomad operator debug` command can safely use stale queries to gracefully degrade the consistency of almost all its queries. The query parameter for these API calls was not being set by the command. Some `api` package queries do not include `QueryOptions` because they target a specific agent, but they can potentially be forwarded to other agents. If there is no leader, these forwarded queries will fail. Provide methods to call these APIs with `QueryOptions`.	2021-12-15 10:44:03 -05:00
Luiz Aoqui	05bb65779c	api: return error when `LicenseGet` status is not `200` (#11644 )	2021-12-14 19:47:09 -05:00
Tim Gross	a0cf5db797	provide `-no-shutdown-delay` flag for job/alloc stop (#11596 ) Some operators use very long group/task `shutdown_delay` settings to safely drain network connections to their workloads after service deregistration. But during incident response, they may want to cause that drain to be skipped so they can quickly shed load. Provide a `-no-shutdown-delay` flag on the `nomad alloc stop` and `nomad job stop` commands that bypasses the delay. This sets a new desired transition state on the affected allocations that the allocation/task runner will identify during pre-kill on the client. Note (as documented here) that using this flag will almost always result in failed inbound network connections for workloads as the tasks will exit before clients receive updated service discovery information and won't be gracefully drained.	2021-12-13 14:54:53 -05:00
Tim Gross	5a68373e7f	Merge tag 'v1.2.3' into merge-release-1.2.3-branch Version 1.2.3	2021-12-13 10:12:07 -05:00
Tim Gross	46e1d29298	golang security update 1.17.5	2021-12-10 13:50:22 -05:00
Tim Gross	624ecab901	evaluations list pagination and filtering (#11648 ) API queries can request pagination using the `NextToken` and `PerPage` fields of `QueryOptions`, when supported by the underlying API. Add a `NextToken` field to the `structs.QueryMeta` so that we have a common field across RPCs to tell the caller where to resume paging from on their next API call. Include this field on the `api.QueryMeta` as well so that it's available for future versions of List HTTP APIs that wrap the response with `QueryMeta` rather than returning a simple list of structs. In the meantime callers can get the `X-Nomad-NextToken`. Add pagination to the `Eval.List` RPC by checking for pagination token and page size in `QueryOptions`. This will allow resuming from the last ID seen so long as the query parameters and the state store itself are unchanged between requests. Add filtering by job ID or evaluation status over the results we get out of the state store. Parse the query parameters of the `Eval.List` API into the arguments expected for filtering in the RPC call.	2021-12-10 13:43:03 -05:00
Lukas W	0e5958d671	CLI: Return non-zero exit code when deployment fails in `nomad run` (#11550 ) * Exit non-zero from run command if deployment fails * Fix typo in deployment monitor introduced in 0edda11	2021-12-09 09:09:28 -05:00
Vyacheslav Morov	6a244f18ad	cli: Add var args to plan output. (#11631 )	2021-12-07 10:43:52 -05:00
Tim Gross	03e697a69d	scheduler: config option to reject job registration (#11610 ) During incident response, operators may find that automated processes elsewhere in the organization can be generating new workloads on Nomad clusters that are unable to handle the workload. This changeset adds a field to the `SchedulerConfiguration` API that causes all job registration calls to be rejected unless the request has a management ACL token.	2021-12-06 15:20:34 -05:00
Derek Strickland	8595e3ed6a	Add change log entry for PR 11592 (#11609 )	2021-12-02 16:18:56 -05:00
Tim Gross	5097546153	changelog: new metrics in Nomad Enterprise (#11591 ) This changelog is for a PR that landed in Nomad Enterprise only.	2021-12-01 09:15:12 -05:00
Michael Schurter	3d248153f4	Merge pull request #11579 from hashicorp/b-getscalingpolicy-rpc-index-response rpc: fix scaling policy get index response when policy is found.	2021-11-30 10:43:20 -08:00
Tim Gross	6e1311a265	client: respect `client_auto_join` after connection loss (#11585 ) The `consul.client_auto_join` configuration block tells the Nomad client whether to use Consul service discovery to find Nomad servers. By default it is set to `true`, but contrary to the documentation it was only respected during the initial client registration. If a client missed a heartbeat, failed a `Node.UpdateStatus` RPC, or if there was no Nomad leader, the client would fall back to Consul even if `client_auto_join` was set to `false`. This changeset returns early from the client's trigger for Consul discovery if the `client_auto_join` field is set to `false`.	2021-11-30 13:20:42 -05:00
James Rasell	a9a624574f	changelog: add entry for #11579	2021-11-26 11:16:17 +01:00
Tim Gross	74768eb7d3	scheduler: fix panic in system jobs when nodes filtered by class (#11565 ) In the system scheduler, if a subset of clients are filtered by class, we hit a code path where the `AllocMetric` has been copied, but the `Copy` method does not instantiate the various maps. This leads to an assignment to a nil map. This changeset ensures that the maps are non-nil before continuing. The `Copy` method relies on functions in the `helper` package that all return nil slices or maps when passed zero-length inputs. This changeset to fix the panic bug intentionally defers updating those functions because it'll have potential impact on memory usage. See https://github.com/hashicorp/nomad/issues/11564 for more details.	2021-11-24 12:59:15 -05:00
Tim Gross	ba38008596	scheduler: fix panic in system jobs when nodes filtered by class (#11565 ) In the system scheduler, if a subset of clients are filtered by class, we hit a code path where the `AllocMetric` has been copied, but the `Copy` method does not instantiate the various maps. This leads to an assignment to a nil map. This changeset ensures that the maps are non-nil before continuing. The `Copy` method relies on functions in the `helper` package that all return nil slices or maps when passed zero-length inputs. This changeset to fix the panic bug intentionally defers updating those functions because it'll have potential impact on memory usage. See https://github.com/hashicorp/nomad/issues/11564 for more details.	2021-11-24 12:28:47 -05:00
Jai Bhagat	9dc6ad7b7d	chore: changelog entry	2021-11-23 18:28:33 -05:00
Luiz Aoqui	9d6842dd4d	Don't emit scaling event error when a deployment is underway (#11556 )	2021-11-23 10:20:18 -05:00
James Rasell	751c8217d1	core: allow setting and propagation of eval priority on job de/registration (#11532 ) This change modifies the Nomad job register and deregister RPCs to accept an updated option set which includes eval priority. This param is optional and overrides the use of the job priority to set the eval priority. In order to ensure all evaluations as a result of the request use the same eval priority, the priority is shared to the allocReconciler and deploymentWatcher. This creates a new distinction between eval priority and job priority. The Nomad agent HTTP API has been modified to allow setting the eval priority on job update and delete. To keep consistency with the current v1 API, job update accepts this as a payload param; job delete accepts this as a query param. Any user supplied value is validated within the agent HTTP handler, removing the need to pass invalid requests to the server. The register and deregister opts functions now allow for setting the eval priority on requests. The change includes a small change to the DeregisterOpts function which handles nil opts. This brings the function in line with the RegisterOpts.	2021-11-23 09:23:31 +01:00
Luiz Aoqui	d3c1a03edd	Merge tag 'v1.2.1' into merge-release-1.2.1-branch Version 1.2.1	2021-11-22 10:47:04 -05:00
Tim Gross	fc1d4814d9	qemu: add `args_allowlist` to sandbox VM command line inputs The QEMU driver allows arbitrary command line options, but many of these options give access to host resources that operators may not want to expose such as devices. Add an optional allowlist to the plugin configuration so that operators can limit the resources for QEMU.	2021-11-19 11:11:52 -05:00
Tim Gross	7f6fca6db9	changelog batch (#11517 )	2021-11-17 11:24:32 -05:00
Tim Gross	e729133134	api: return 404 for alloc FS list/stat endpoints (#11482 ) * api: return 404 for alloc FS list/stat endpoints If the alloc filesystem doesn't have a file requested by the List Files or Stat File API, we currently return a HTTP 500 error with the expected "file not found" error message. Return a HTTP 404 error instead. * update FS Handler Previously the FS handler would interpret a 500 status as a 404 in the adapter layer by checking if the response body contained the text or if the response status was 500 and then throw an error code for 404. Co-authored-by: Jai Bhagat <jaybhagat841@gmail.com>	2021-11-17 11:15:07 -05:00
Tim Gross	863486ffb0	deps: update go-getter to 1.5.9 (#11481 ) go-getter 1.5.9 includes a patch in 1.5.6 that automatically unpacks uncompressed tar archives. Previously Nomad only unpacked compressed archives, but documented that it unpacked all archives.	2021-11-17 11:14:44 -05:00
James Rasell	519851cf1a	changelog: add entry for #11504	2021-11-15 12:01:52 +01:00
Dave May	3c04d7927b	cli: refactor operator debug capture (#11466 ) * debug: refactor Consul API collection * debug: refactor Vault API collection * debug: cleanup test timing * debug: extend test to multiregion * debug: save cmdline flags in bundle * debug: add cli version to output * Add changelog entry	2021-11-05 19:43:10 -04:00
Tim Gross	73e3b15305	build: bump go version to 1.17.3 (#11461 )	2021-11-05 15:34:24 -04:00
James Rasell	99955eb80f	Merge pull request #11426 from hashicorp/b-set-dereg-eval-priority-correctly rpc: set the deregistration eval priority to the job priority.	2021-11-05 15:53:10 +01:00
James Rasell	2cc661c523	Merge pull request #11429 from hashicorp/b-set-scale-eval-priority-correctly rpc: set the job scale eval priority to the job priority.	2021-11-05 15:52:31 +01:00
Alessandro De Blasis	07c670fdc0	cli: show `host_network` in `nomad status` (#11432 ) Enhance the CLI in order to return the host network in two flavors (default, verbose) of the `node status` command. Fixes: #11223. Signed-off-by: Alessandro De Blasis <alex@deblasis.net>	2021-11-05 09:02:46 -04:00
Florian Apolloner	ef88795af3	Added a `-hcl2-strict` flag to allow for lenient hcl variable parsing. (#11284 ) Co-authored-by: James Rasell <jrasell@hashicorp.com>	2021-11-04 16:33:09 +01:00
James Rasell	674761436e	Merge pull request #11165 from hashicorp/b-gh-11149 jobspec2: ensure consistent error handling between var-file & var.	2021-11-04 16:24:00 +01:00
James Rasell	4125e13698	changelog: add entry for #11165	2021-11-04 15:35:02 +01:00
James Rasell	2b866b1d34	changelog: fixup entry extension for #11167	2021-11-04 15:28:34 +01:00
Michael Schurter	3718557041	Merge pull request #11416 from hashicorp/f-rejected-info core: bump rejected plans from debug -> info	2021-11-03 16:49:28 -07:00
Michael Schurter	ef3fc79225	Merge pull request #11334 from hashicorp/f-chroot-skip-allocdir client: never embed alloc_dir in chroot	2021-11-03 16:48:09 -07:00
Charlie Voiselle	71643263a6	Parse `job > group > consul` block in HCL1 (#11423 )	2021-11-03 13:49:32 -04:00
Luiz Aoqui	5d204c8ced	Revert "Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799 )" (#11433 )	2021-11-02 17:42:52 -04:00
James Rasell	a2176474a5	changelog: add entry for #11429	2021-11-02 12:58:10 +01:00
James Rasell	4803eb9d88	changelog: add entry for #11426	2021-11-02 11:43:13 +01:00
James Rasell	c071efbd6b	Merge pull request #11411 from hashicorp/f-gh-11406 cli: add json and template flag opts to acl bootstrap command.	2021-11-02 09:48:25 +01:00
Charlie Voiselle	29e7d46dd9	Making RPC Upgrade mode reloadable. (#11144 ) - Making RPC Upgrade mode reloadable. - Add suggestions from code review - remove spurious comment - switch to require(t,...) form for test. - Add to changelog	2021-11-01 16:30:53 -04:00
Luiz Aoqui	655ac2719f	Allow using specific object ID on diff (#11400 )	2021-11-01 15:16:31 -04:00
Michael Schurter	efe5714840	core: bump rejected plans from debug -> info As we have continued to see reports of #9506 we need to elevate this log line as it is the only way to detect when plans are being erroneously rejected. Users who see this log line repeatedly should drain and restart the node in the log line. This seems to work around the issue. Please post any details on #9506!	2021-10-31 12:51:42 -07:00
James Rasell	30ad7985b2	changelog: add entry for #11411 .	2021-10-29 09:08:10 +02:00
Dave May	509c74ce19	debug: update default node-id and docs (#11398 ) * debug: default node-id to all * debug: align cli help and website documentation	2021-10-27 13:43:56 -04:00
Mahmood Ali	cdddd64a42	logging: Log the cause behind agent startup failure (#11353 ) Log the failure error when the agent fails to start. Previously, the agent startup failure error would be emitted to the command UI but not logged. So it doesn't get emitted to syslog or `log_file` if they are set, and it makes debugging much harder. Also, logging the error again before exit makes the error more visible: previously, the operator needed to scroll to the top to find the error. On a sample failure, the output will look like: ``` ==> WARNING: Bootstrap mode enabled! Potentially unsafe operation. ==> Loaded configuration from sample-configs/config-bad ==> Starting Nomad agent... ==> Error starting agent: setting up server node ID failed: mkdir /path-without-permission: read-only file system 2021-10-20T14:38:51.179-0400 [WARN] agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/path-without-permission/plugins 2021-10-20T14:38:51.181-0400 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/path-without-permission/plugins 2021-10-20T14:38:51.181-0400 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/path-without-permission/plugins 2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=java type=driver plugin_version=0.1.0 2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=docker type=driver plugin_version=0.1.0 2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=mock_driver type=driver plugin_version=0.1.0 2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0 2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=exec type=driver plugin_version=0.1.0 2021-10-20T14:38:51.181-0400 [INFO] agent: detected plugin: name=qemu type=driver plugin_version=0.1.0 2021-10-20T14:38:51.181-0400 [ERROR] agent: error starting agent: error="setting up server node ID failed: mkdir /path-without-permission: read-only file system" ``` This change adds the final `ERROR` message. It's easy to miss the `==> Error starting agent` above.	2021-10-27 10:41:17 -07:00
Mahmood Ali	daf20f9788	vault: set JobID in Vault metadata (#11397 ) Closes: #11395 .	2021-10-27 07:20:29 -07:00
Mahmood Ali	e06ff1d613	scheduler: stop allocs in unrelated nodes (#11391 ) The system scheduler should leave allocs on draining nodes as-is, but stop allocs on nodes that are no longer part of the job datacenters. Previously, the scheduler did not make the distinction and left system job allocs intact if they were already running. I've added a failing test first, which you can see in https://app.circleci.com/jobs/github/hashicorp/nomad/179661 . Fixes https://github.com/hashicorp/nomad/issues/11373	2021-10-27 07:04:13 -07:00
Mahmood Ali	f03d65062d	Fix arm64 panics by updating google/snappy library to latest, 0.0.4 (#11396 ) Pick up https://github.com/golang/snappy/pull/56 to handle arm64 architectures to fix panics. tl;dr: Golang 1.16 changed the `memmove` implementation for arm64, requiring additional CPU registers that snappy wasn't preserving in its assembly implementation. Other projects have experienced this issue as well; searching for `encode_arm64.s:666` on your favorite search engine will reveal some. Vault updated the dependency earlier this August: https://github.com/hashicorp/vault/pull/12371 . I believe this issue affects Nomad 1.2.x and 1.1.x. Nomad 1.0.x uses Golang 1.15 and isn't affected. However, backporting the change to 1.0.x should be harmless. Fixed https://github.com/hashicorp/nomad/issues/11385 .	2021-10-27 06:39:16 -07:00
Luiz Aoqui	b463715a98	prevent active log from being overwritten when agent starts (#11386 )	2021-10-26 20:57:07 -04:00
Luiz Aoqui	3c22fc79a5	add dispatch idempotency token support in the CLI (#10930 )	2021-10-22 12:39:05 -04:00
Luiz Aoqui	2c7bfb7000	ui: persist node drain settings (#11368 )	2021-10-22 10:51:31 -04:00
Luiz Aoqui	dc5222f6e5	ui: display Nomad version in the Clients and Servers table (#11366 )	2021-10-22 10:33:06 -04:00
Luiz Aoqui	b73ecf684b	ui: update favicon (#11371 )	2021-10-22 09:40:38 -04:00
Luiz Aoqui	6853bf9632	cli: allow setting namespace and region in the `nomad ui` command (#11364 )	2021-10-21 16:24:39 -04:00
Luiz Aoqui	362c8c54f4	ui: set * as the default namespace selector (#11357 )	2021-10-21 10:24:07 -04:00
Luiz Aoqui	dceeccfc5d	ui: add client name tooltip when displaying client ID in tables (#11358 )	2021-10-21 10:23:06 -04:00
Mahmood Ali	e992ebf58d	document GH-11346 fix (#11350 )	2021-10-20 22:03:19 -04:00
Michael Schurter	081cfb85d7	docs: add #11331 to changelog	2021-10-19 16:30:06 -07:00
Michael Schurter	d25b60a82d	docs: add #11334 to changelog	2021-10-18 09:22:01 -07:00
Luiz Aoqui	1bd9db3df0	changelog: add entry for #10796 (#11312 )	2021-10-14 09:01:43 -04:00
James Rasell	444d25db07	Merge pull request #11280 from benbuzbee/log-err Log error if there are no event handlers registered	2021-10-14 14:49:22 +02:00
Mahmood Ali	d5e136b82b	executor: set CpuWeight in cgroup-v2 (#11287 ) Cgroup-v2 uses `cpu.weight` property instead of cpu shares: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#cpu-interface-files . And it uses a different range (i.e. `[1, 10000]`) from cpu.shares (i.e. `[2, 262144]`) to make things more interesting. Luckily, the libcontainer provides a helper function to perform the conversion [`ConvertCPUSharesToCgroupV2Value`](https://pkg.go.dev/github.com/opencontainers/runc@v1.0.2/libcontainer/cgroups#ConvertCPUSharesToCgroupV2Value). I have confirmed that docker/libcontainer performs the conversion as well in https://github.com/opencontainers/runc/blob/v1.0.2/libcontainer/specconv/spec_linux.go#L536-L541 , and that CpuShares is ignored by libcontainer in https://github.com/opencontainers/runc/blob/v1.0.2/libcontainer/cgroups/fs2/cpu.go#L24-L29 .	2021-10-14 08:46:07 -04:00
Luiz Aoqui	536a5751ff	changelog: add entries for #9160 and #11078 (#11290 )	2021-10-14 08:43:36 -04:00
Charlie Voiselle	cb8e52b5df	Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799 )	2021-10-13 21:23:13 -04:00
Michael Schurter	59fda1894e	Merge pull request #11167 from a-zagaevskiy/master Support configurable dynamic port range	2021-10-13 16:47:38 -07:00
Dave May	c37a6ed583	cli: rename paths in debug bundle for clarity (#11307 ) * Rename folders to reflect purpose * Improve captured files test coverage * Rename CSI plugins output file * Add changelog entry * fix test and make changelog message more explicit Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2021-10-13 18:00:55 -04:00
Dave May	305e8e98bf	cli: Improved autocomplete support for job dispatch and operator debug (#11270 ) * Add autocomplete to nomad job dispatch * Add autocomplete to nomad operator debug * Update incorrect comment * Update test to verify autocomplete * Add changelog * Apply lint suggestions * Create dynamic slices instead of specific length * Align style across predictors	2021-10-12 20:01:54 -04:00
Dave May	2d14c54fa0	debug: Improve namespace and region support (#11269 ) * Include region and namespace in CLI output * Add region and prefix matching for server members * Add namespace and region API outputs to cluster metadata folder * Add region awareness to WaitForClient helper function * Add helper functions for SliceStringHasPrefix and StringHasPrefixInSlice * Refactor test client agent generation * Add tests for region * Add changelog	2021-10-12 16:58:41 -04:00
Florian Apolloner	511cae92b4	Fixed plan diffing to handle non-unique service names. (#10965 )	2021-10-12 16:42:39 -04:00
Dave May	76b05f3cd2	cli: Add nomad job allocs command (#11242 )	2021-10-12 16:30:36 -04:00
Luiz Aoqui	3e0bad5a41	wrap `log` messages with `hclog` (#11291 )	2021-10-12 14:38:44 -04:00
Ben Buzbee	573fb840fa	Log error if there are no event handlers registered We see this error all the time ``` no handler registered for event event.Message=, event.Annotations=, event.Timestamp=0001-01-01T00:00:00Z, event.TaskName=, event.AllocID=, event.TaskID=, ``` So we're handling an event with all default fields. I noted that this can happen if only err is set as in ``` func (d driverPluginClient) handleTaskEvents(reqCtx context.Context, ch chan TaskEvent, stream proto.Driver_TaskEventsClient) { defer close(ch) for { ev, err := stream.Recv() if err != nil { if err != io.EOF { ch <- &TaskEvent{ Err: grpcutils.HandleReqCtxGrpcErr(err, reqCtx, d.doneCtx), } } ``` In this case Err fails to be serialized by the logger, see this test ``` ev := &drivers.TaskEvent{ Err: fmt.Errorf("errz"), } i.logger.Warn("ben test", "event", ev) i.logger.Warn("ben test2", "event err str", ev.Err.Error()) i.logger.Warn("ben test3", "event err", ev.Err) ev.Err = nil i.logger.Warn("ben test4", "nil error", ev.Err) 2021-10-06T22:37:56.736Z INFO nomad.stdout {"@level":"warn","@message":"ben test","@module":"client.driver_mgr","@timestamp":"2021-10-06T22:37:56.643900Z","driver":"mock_driver","event":{"TaskID":"","TaskName":"","AllocID":"","Timestamp":"0001-01-01T00:00:00Z","Message":"","Annotations":null,"Err":{}}} 2021-10-06T22:37:56.736Z INFO nomad.stdout {"@level":"warn","@message":"ben test2","@module":"client.driver_mgr","@timestamp":"2021-10-06T22:37:56.644226Z","driver":"mock_driver","event err str":"errz"} 2021-10-06T22:37:56.736Z INFO nomad.stdout {"@level":"warn","@message":"ben test3","@module":"client.driver_mgr","@timestamp":"2021-10-06T22:37:56.644240Z","driver":"mock_driver","event err":"errz"} 2021-10-06T22:37:56.736Z INFO nomad.stdout {"@level":"warn","@message":"ben test4","@module":"client.driver_mgr","@timestamp":"2021-10-06T22:37:56.644252Z","driver":"mock_driver","nil error":null} ``` Note in the first example err is set to an empty object and the error is lost. What we want is the last two examples which call out the err field explicitly so we can see what it is in this case	2021-10-11 19:44:52 +00:00
James Rasell	6f3a6f5ccf	Merge pull request #11283 from hashicorp/f-update-hclog-dep deps: update hashicorp/go-hclog to v1.0.0	2021-10-11 08:39:41 +02:00
James Rasell	7200858cca	changelog: add entry for #11283	2021-10-07 08:16:05 +01:00
Matt Mukerjee	b56432e645	Add FailoverHeartbeatTTL to config (#11127 ) FailoverHeartbeatTTL is the amount of time to wait after a server leader failure before considering reallocating client tasks. This TTL should be fairly long as the new server leader needs to rebuild the entire heartbeat map for the cluster. In deployments with a small number of machines, the default TTL (5m) may be unnecessarily long. Let's allow operators to configure this value in their config files.	2021-10-06 18:48:12 -04:00
Mahmood Ali	48aa6e26e9	executor: suppress spurious log messages (#11273 ) Suppress stats streaming error log messages when task finishes. Streaming errors are expected when a task finishes and they aren't actionable to users. Also, note that the task runner Stats hook retries collecting stats after a delay. If the connection terminates prematurely, it will be retried, and closing the stats stream is not very disruptive. Ideally, executor terminates cleanly when task exits, but that's a more substantial change that may require changing the executor/drivers interface. Fixes #10814	2021-10-06 12:42:35 -04:00
Florian Apolloner	709c1a2947	Fixed creation of ControllerCreateVolumeRequest. (#11238 )	2021-10-06 10:17:39 -04:00
Florian Apolloner	0fa60dae9d	Added support for `-force-color` to the CLI. (#10975 )	2021-10-06 10:02:42 -04:00
Yan	6ff0b6debc	add `-show-url` option for `ui` command (#11213 )	2021-10-05 20:08:42 -04:00
Mahmood Ali	f4b92c609e	add changelog	2021-10-05 13:01:19 -04:00
Mahmood Ali	583b9f2506	Merge pull request #11089 from hashicorp/b-cve-2021-37218 Apply authZ for nomad Raft RPC layer	2021-10-05 08:49:21 -04:00
Luiz Aoqui	0a62bdc3c5	fix panic when Connect mesh gateway doesn't have a proxy block (#11257 ) Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2021-10-04 15:52:07 -04:00
Mahmood Ali	8b2ce4e353	Merge pull request #11251 from hashicorp/f-golang-1.17.1 Upgrade Golang to 1.17.1	2021-10-04 13:52:00 -04:00
Mahmood Ali	d78fb265ae	update docs and changelog	2021-10-04 13:50:42 -04:00
James Rasell	9ad89a9b59	changelog: add entry for #11249	2021-10-01 12:50:51 +01:00
Michael Schurter	50a48aa992	docs: add #11167 to changelog	2021-09-30 17:06:38 -07:00
Luiz Aoqui	a7698dedba	Disable PowerShell profile and simplify fingerprinting link speed on Windows (#11183 )	2021-09-22 11:17:47 -04:00
Michael Schurter	4ad0c258b9	client: add NOMAD_LICENSE to default env deny list By default we should not expose the NOMAD_LICENSE environment variable to tasks. Also refactor where the DefaultEnvDenyList lives so we don't have to maintain 2 copies of it. Since client/config is the most obvious location, keep a reference there to its unfortunate home buried deep in command/agent/host. Since the agent uses this list as well for the /agent/host endpoint the list must be accessible from both command/agent and client.	2021-09-21 13:51:17 -07:00
James Rasell	3cba21718e	changelog: add entry for #11206	2021-09-20 18:05:42 +01:00
Florian Apolloner	7805b8edf4	Fixed usage of NOMAD_CLI_NO_COLOR env variable. (#11168 )	2021-09-17 20:37:05 -04:00
Michael Schurter	ebf0bca5f8	docs: add changelog entry for audit log naming	2021-09-16 16:21:57 -07:00
Luiz Aoqui	edd32ba571	Log network device name during fingerprinting (#11184 )	2021-09-16 10:48:31 -04:00
Luiz Aoqui	1035805a42	connect: update allowed protocols in ingress gateway config (#11187 )	2021-09-16 10:47:53 -04:00
James Rasell	da8bd5612d	changelog: add entry for #11173 .	2021-09-15 11:44:10 +02:00
Luiz Aoqui	bbae221c8c	deps: update go-memdb to 1.3.2 (#11185 )	2021-09-14 20:26:45 -04:00
Michael Schurter	7035c94320	Merge pull request #11111 from hashicorp/b-system-no-match scheduler: warn when system jobs cannot place an alloc	2021-09-13 16:06:04 -07:00
Michael Schurter	d32d0326e8	docs: focus changelog entry for #11111 on the ux While I don't think this fully encompasses the changes, other bits like marking sysbatch as dead immediately are new so haven't changed from a previous release.	2021-09-10 16:45:43 -07:00
James Rasell	686189aade	Merge pull request #11143 from hashicorp/b-gh-11026 deps: update go-plugin to v1.4.3 to fix Windows handle leak.	2021-09-09 09:39:22 +02:00
Luiz Aoqui	4dd8b6b571	cli: include all possible scores in alloc status metric table (#11128 )	2021-09-08 17:30:11 -04:00
Luiz Aoqui	305f0b5702	ui: set the job namespace when redirecting after the job is dispatched (#11141 )	2021-09-07 12:27:33 -04:00
James Rasell	fa149744a9	changelog: add entry for #11143 .	2021-09-07 09:51:17 +02:00
Isabel Suchanek	ab51050ce8	events: fix wildcard namespace handling (#10935 ) This fixes a bug in the event stream API where it currently interprets namespace=* as an actual namespace, not a wildcard. When Nomad parses incoming requests, it sets namespace to default if not specified, which means the request namespace will never be an empty string, which is what the event subscription was checking for. This changes the conditional logic to check for a wildcard namespace instead of an empty one. It also updates some event tests to include the default namespace in the subscription to match current behavior. Fixes #10903	2021-09-02 09:36:55 -07:00
Luiz Aoqui	12f5f3ae90	changelog: add entry for #11111	2021-09-02 12:13:42 -04:00
Luiz Aoqui	eb0ed980a5	ui: set namespace when looking for and displaying children jobs (#11110 )	2021-09-01 14:40:25 -04:00
Mahmood Ali	35ff41c266	link to cve listing in changelog Co-authored-by: Kent 'picat' Gruber <kent@hashicorp.com>	2021-08-27 10:42:39 -04:00
Mahmood Ali	ff7c1ca79b	Apply authZ for nomad Raft RPC layer When mTLS is enabled, only nomad servers of the region should access the Raft RPC layer. Clients and servers in other regions should only use the Nomad RPC endpoints. Co-authored-by: Michael Schurter <mschurter@hashicorp.com> Co-authored-by: Seth Hoenig <shoenig@hashicorp.com>	2021-08-26 15:10:07 -04:00
Mahmood Ali	641afebeed	update golang to 1.16.7 (#11083 )	2021-08-25 11:56:46 -04:00
Roopak Venkatakrishnan	dcf5981bcd	Update x/sys to support go 1.17 (#11065 ) Co-authored-by: James Rasell <jrasell@hashicorp.com>	2021-08-25 17:23:01 +02:00
Luiz Aoqui	104d29e808	Don't timestamp active log file (#11070 ) * don't timestamp active log file * website: update log_file default value * changelog: add entry for #11070 * website: add upgrade instructions for log_file in v1.1.4 and v1.2.0	2021-08-23 11:27:34 -04:00
Mahmood Ali	84a3522133	Consider all system jobs for a new node (#11054 ) When a node becomes ready, create an eval for all system jobs across namespaces. The previous code uses `job.ID` to deduplicate evals, but that ignores the job namespace. Thus if there are multiple jobs in different namespaces sharing the same ID/Name, only one will be considered for running in the new node. Thus, Nomad may skip running some system jobs in that node.	2021-08-18 09:50:37 -04:00
Michael Schurter	a7aae6fa0c	Merge pull request #10848 from ggriffiths/listsnapshot_secrets CSI Listsnapshot secrets support	2021-08-10 15:59:33 -07:00
Mahmood Ali	ea003188fa	system: re-evaluate node on feasibility changes (#11007 ) Fix a bug where system jobs may fail to be placed on a node that initially was not eligible for system job placement. This change causes the reschedule to re-evaluate the node if any attribute used in feasibility checks changes. Fixes https://github.com/hashicorp/nomad/issues/8448	2021-08-10 17:17:44 -04:00
Mahmood Ali	bfc766357e	deployments: canary=0 is implicitly autopromote (#11013 ) In a multi-task-group job, treat 0 canary groups as auto-promote. This change fixes an edge case where Nomad requires a manual promotion if the job has any group with canary=0 and the rest of the groups have auto_promote set. Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2021-08-10 17:06:40 -04:00
Mahmood Ali	efcc8bf082	Speed up client startup and registration (#11005 ) Speed up client startup by retrying more often until the servers are known. Currently, if client fingerprinting is fast and finishes before the client connects to a server, node registration may be delayed by 15 seconds or so! Ideally, we'd wait until the client discovers the servers and then retry immediately, but that requires significant code changes. Here, we simply retry the node registration request every second. That's basically the equivalent of checking every second whether the client has discovered servers. Should be a cheap operation. When testing this change on my local computer, where both servers and clients are co-located, the time from startup till node registration dropped from 34 seconds to 8 seconds!	2021-08-10 17:06:18 -04:00
Luiz Aoqui	c1d1906628	ui: add missing pipe separator in parameterized and periodic jobs (#11020 )	2021-08-10 13:48:20 -04:00
Jai	29a7fe6efa	Merge pull request #10666 from hashicorp/b-ui/search-namespaces ui: Fix fuzzy search namespace-handling	2021-08-10 13:13:20 -04:00
Jai Bhagat	a9b9132f35	edit hierarchy to lead with namespace before job	2021-08-10 10:35:36 -04:00
Luiz Aoqui	d283e90c35	ui: only display "Dispatch Job" button on parameterized jobs (#11019 )	2021-08-09 17:49:08 -04:00
Michael Schurter	c39ca0773d	Merge pull request #10951 from hashicorp/b-cn-proxy consul/connect: avoid warn messages on connect proxy errors	2021-08-06 15:25:40 -07:00
James Rasell	a9a04141a3	consul/connect: avoid warn messages on connect proxy errors When creating a TCP proxy bridge for Connect tasks, we are at the mercy of either end for managing the connection state. For long lived gRPC connections the proxy could reasonably expect to stay open until the context was cancelled. For the HTTP connections used by connect native tasks, we experience connection disconnects. The proxy gets recreated as needed on follow up requests, however we also emit a WARN log when the connection is broken. This PR lowers the WARN to a TRACE, because these disconnects are to be expected. Ideally we would be able to proxy at the HTTP layer, however Consul or the connect native task could be configured to expect mTLS, preventing Nomad from MiTM the requests. We also can't manage the proxy lifecycle more intelligently, because we have no control over the HTTP client or server and how they wish to manage connection state. What we have now works, it's just noisy. Fixes #10933	2021-08-05 11:27:35 +02:00
James Rasell	c7449b4810	changelog: add entry for #10929	2021-08-05 10:48:36 +02:00
Luiz Aoqui	7341615fac	changelog: add entry for #10934 (#11001 )	2021-08-04 11:33:18 -04:00
Mahmood Ali	0bc12fba7c	Only initialize task.VolumeMounts when not-nil (#10990 ) 1.1.3 had a bug where task.VolumeMounts will be an empty slice instead of nil. Eventually, it gets canonicalized and is set to `nil`, but it seems to confuse dry-run planning. The regression was introduced in https://github.com/hashicorp/nomad/pull/10855/files#diff-56b3c82fcbc857f8fb93a903f1610f6e6859b3610a4eddf92bad9ea27fdc85ecL1028-R1037 . Curiously, it's the only place where `len(apiTask.VolumeMounts)` check was dropped. I assume it was dropped accidentally. Fixes #10981	2021-08-02 13:08:10 -04:00
Mahmood Ali	22a91f7003	update changelog (#10963 )	2021-07-28 16:02:04 -04:00
Grant Griffiths	fecbbaee22	CSI ListSnapshots secrets implementation Signed-off-by: Grant Griffiths <ggriffiths@purestorage.com>	2021-07-28 11:30:29 -07:00
Mahmood Ali	62fe6f12f9	api: revert to defaulting to http/1 (#10958 ) * api: revert to defaulting to http/1 PR #10778 incidentally changed the api http client to connect with HTTP/2 first. However, the websocket libraries used in `alloc exec` features don't handle http/2 well, and don't downgrade to http/1 gracefully. The switch was incidental and not requested by users, so revert it. Furthermore, api consumers can opt in to forcing http/2 by setting custom http clients. Fixes #10922	2021-07-28 11:21:53 -04:00
Michael Schurter	ea996c321d	Merge pull request #10916 from hashicorp/f-audit-log-mode Add audit log file mode config parameter	2021-07-27 12:16:37 -07:00
Michael Schurter	d64d70607a	docs: add changelog for #10916	2021-07-27 11:51:38 -07:00
Mahmood Ali	ac3cf10849	nomad: only activate one-time auth tokens with 1.1.0 (#10952 ) Fix a panic in handling one-time auth tokens, used to support `nomad ui -authenticate`. If the nomad leader is a 1.1.x with some servers running as 1.0.x, the pre-1.1.0 servers risk crashing and the cluster may lose quorum. That can happen when the `nomad ui -authenticate` command is issued, or when the leader scans for expired tokens every 10 minutes. Fixed #10943 .	2021-07-27 13:17:55 -04:00
Mahmood Ali	d97927ebcf	cli: Use glint to determine if os.Stdout is tty (#10926 ) Use glint to determine if os.Stdout is a terminal. glint Terminal renderer expects os.Stdout [not only to be a terminal, but also to have non-zero size](`b492b545f6/renderer_term.go (L39-L46)`). It's unclear how this condition arises, but this additional check causes Nomad to render deployments progress through glint when glint cannot support it. By using glint to perform the check, we eliminate the risk of mis-judgement.	2021-07-23 11:27:47 -04:00
Jai	0ccf60444d	Merge pull request #10893 from hashicorp/f-ui/namespace-acl-bug edit ember-can to add additional attribute for namespace	2021-07-22 12:57:34 -04:00
Jai Bhagat	5d33884cdc	ui: fixes #10885	2021-07-22 11:44:25 -04:00
Seth Hoenig	54d9bad657	Merge pull request #10904 from hashicorp/b-no-affinity-intern core: remove internalization of affinity strings	2021-07-22 09:09:07 -05:00
Luiz Aoqui	484037aff1	fix `nomad alloc signal` help message (#10917 )	2021-07-21 11:02:44 -04:00
Luiz Aoqui	a26874215a	changelog: add entry for #10675 (#10919 )	2021-07-21 10:05:48 -04:00
Mahmood Ali	8df9b1fd0f	client: avoid acting on stale data after launch (#10907 ) When the client launches, use a consistent read to fetch its own allocs, but allow stale reads afterwards as long as reads don't revert into older state. This change addresses an edge case affecting a restarting client. When a client restarts, it may fetch stale data concerning its allocs: allocs that have completed prior to the client shutdown may still have "run/running" desired/client status, and the client may attempt to re-run them. An alternative approach is to track the indices such that the client sets MinQueryIndex to the maximum index the client ever saw, or to compare received allocs against locally restored client state. Garbage collection complicates this approach (local knowledge is not complete), and the approach still risks starting "dead" allocations (e.g. the allocation may have been placed when the client just restarted and have already been rescheduled by the time the client started). The approach here is effective against all kinds of staleness problems with small overhead.	2021-07-20 15:13:28 -04:00
Michael Schurter	efe8ea2c2c	Merge pull request #10849 from benbuzbee/benbuz/fix-destroy Don't treat a failed recover + successful destroy as a successful recover	2021-07-19 10:49:31 -07:00
Michael Schurter	6aee3de420	docs: add changelog entry for #10849	2021-07-16 15:58:58 -07:00
Seth Hoenig	ac5c83cafd	core: remove internalization of affinity strings Basically the same as #10896 but with the Affinity struct. Since we use reflect.DeepEqual for job comparison, there is risk of false positives for changes due to a job struct with memoized vs non-memoized strings. Closes #10897	2021-07-15 15:15:39 -05:00
Mahmood Ali	996ea1fa46	Merge pull request #10875 from hashicorp/b-namespace-flag-override cli: `-namespace` should override job namespace	2021-07-14 17:28:36 -04:00
Mahmood Ali	26509f2299	Merge pull request #10864 from hashicorp/b-10746-plan-datacenter scheduler: datacenter updates should be destructive	2021-07-14 17:25:13 -04:00
Seth Hoenig	3fce1d3f11	Merge pull request #10898 from hashicorp/f-rm-vendor build: no longer use vendor directory	2021-07-14 13:00:41 -05:00
Seth Hoenig	1b5f902842	docs: update changelog	2021-07-14 11:21:00 -05:00
Seth Hoenig	a4af3fcad0	docs: add changelog entry	2021-07-14 10:46:40 -05:00
James Rasell	66d3b98db5	Merge pull request #10892 from hashicorp/b-gh-10890 deps: update consul-template to v0.25.2.	2021-07-14 09:26:16 +02:00
Luiz Aoqui	dd8213abc1	changelog: add entry for GH-10563 (#10894 )	2021-07-13 16:12:41 -04:00

270 commits