open-nomad

Commit Graph

Author	SHA1	Message	Date
Seth Hoenig	c68ed3b4c8	client: protect user lookups with global lock (#14742 ) * client: protect user lookups with global lock This PR updates Nomad client to always do user lookups while holding a global process lock. This is to prevent concurrency-unsafe implementations of NSS, while still enabling NSS lookups of users (i.e. cannot use osusergo). * cl: add cl	2022-09-29 09:30:13 -05:00
Derek Strickland	4c73a3b1dc	Remove changelog entry for test update PR	2022-09-27 18:17:49 -04:00
Derek Strickland	52e4997ace	Add enterprise tag	2022-09-27 17:50:25 -04:00
Derek Strickland	ef0f8c5b81	Add enterprise tag	2022-09-27 17:49:27 -04:00
Derek Strickland	6738684167	Delete 14665.txt	2022-09-27 17:47:35 -04:00
Derek Strickland	87bdb74221	Remove bug fix changelog files	2022-09-27 17:46:32 -04:00
Derek Strickland	cacf4bb8e1	Fix changelog entry type	2022-09-27 14:33:39 -04:00
Jim Razmus II	7da3fd050b	jobspec: allow artifact headers in HCLv1 (#14637 ) * jobspec: allow artifact headers in HCLv1 Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-09-27 12:18:49 -04:00
Seth Hoenig	5df5e70542	core: numeric operands comparisons in constraints (#14722 ) * cleanup: fixup linter warnings in scheduler/feasible.go * core: numeric operands comparisons in constraints This PR changes constraint comparisons to be numeric rather than lexical if both operands are integers or floats. Inspiration #4856 Closes #4729 Closes #14719 * fix: always parse as int64	2022-09-27 11:07:07 -05:00
Tim Gross	87681fca68	CSI: ensure initial unpublish state is checkpointed (#14675 ) A test flake revealed a bug in the CSI unpublish workflow, where an unpublish that comes from a client that's successfully done the node-unpublish step will not have the claim checkpointed if the controller-unpublish step fails. This will result in a delay in releasing the volume claim until the next GC. This changeset also ensures we're using a new snapshot after each write to raft, and fixes two timing issues in tests where either the volume watcher can unpublish before the unpublish RPC is sent or we don't wait long enough in resource-restricted environments like GHA.	2022-09-27 08:43:45 -04:00
Michael Schurter	e6af1c0a14	fingerprint: add node attr for reservable cores (#14694 ) * fingerprint: add node attr for reservable cores Add an attribute for the number of reservable CPU cores as they may differ from the existing `cpu.numcores` due to client configuration or OS support. Hopefully clarifies some confusion in #14676 * add changelog * num_reservable_cores -> reservablecores	2022-09-26 13:03:03 -07:00
Luiz Aoqui	5c100c0d3d	client: recover from getter panics (#14696 ) The artifact getter uses the go-getter library to fetch files from different sources. Any bug in this library that results in a panic can cause the entire Nomad client to crash due to a single file download attempt. This change aims to guard against this type of crash by recovering from panics when the getter attempts to download an artifact. The resulting panic is converted to an error that is stored as a task event for operator visibility and the panic stack trace is logged to the client's log.	2022-09-26 15:16:26 -04:00
Luiz Aoqui	f7c6534a79	cli: set content length on `operator api` requests (#14634 ) http.NewRequestWithContext will only set the right value for Content-Length if the input is bytes.Buffer, bytes.Reader, or *strings.Reader [0]. Since os.Stdin is an os.File, POST requests made with the `nomad operator api` command would always have Content-Length set to -1, which is interpreted as an unknown length by web servers. [0]: https://pkg.go.dev/net/http#NewRequestWithContext	2022-09-26 14:21:40 -04:00
Phil Renaud	497bd02169	[ui] Warn users when they leave an edited but unsaved variable page (#14665 ) * Warning on attempt to leave * Lintfix * Only router.off once * Dont warn on transition when only updating queryparams * Remove double-push and queryparam-only issues, thanks @lgfa29 * Acceptance tests * Changelog	2022-09-23 16:53:40 -04:00
Phil Renaud	a28e1bcc1e	[ui] Service Healthchecks: styles for pseudo-timestamp axis (#14677 ) * Styles for pseudo-timestamp axis * Changelog	2022-09-23 16:53:28 -04:00
Tim Gross	17aee4d69c	fingerprint: don't clear Consul/Vault attributes on failure (#14673 ) Clients periodically fingerprint Vault and Consul to ensure the server has updated attributes in the client's fingerprint. If the client can't reach Vault/Consul, the fingerprinter clears the attributes and requires a node update. Although this seems like correct behavior so that we can detect intentional removal of Vault/Consul access, it has two serious failure modes: (1) If a local Consul agent is restarted to pick up configuration changes and the client happens to fingerprint at that moment, the client will update its fingerprint and result in evaluations for all its jobs and all the system jobs in the cluster. (2) If a client loses Vault connectivity, the same thing happens. But the consequences are much worse in the Vault case because Vault is not run as a local agent, so Vault connectivity failures are highly correlated across the entire cluster. A 15 second Vault outage will cause a new `node-update` evaluation for every system job on the cluster times the number of nodes, plus one `node-update` evaluation for every non-system job on each node. On large clusters of 1000s of nodes, we've seen this create a large backlog of evaluations. This changeset updates the fingerprinting behavior to keep the last fingerprint if Consul or Vault queries fail. This prevents a storm of evaluations at the cost of requiring a client restart if Consul or Vault is intentionally removed from the client.	2022-09-23 14:45:12 -04:00
Derek Strickland	6874997f91	scheduler: Fix bug where the scheduler would treat multiregion jobs as paused for job types that don't use deployments (#14659 ) * scheduler: Fix bug where the scheduler would treat multiregion jobs as paused for job types that don't use deployments Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-09-22 14:31:27 -04:00
Jorge Marey	92158a1c62	connect: add nomad env to envoy bootstrap (#12959 ) * Add nomad env to envoy bootstrap * Add changelog file	2022-09-22 13:18:18 -05:00
Phil Renaud	eca0e7bf56	[ui] task logs in sidebar (#14612 ) * button styles * Further styles including global toggle adjustment * sidebar funcs and header * Functioning task logs in high-level sidebars * same-lineify the show tasks toggle * Changelog * Full-height sidebar calc in css, plz drop soon container queries * Active status and query params for allocations page * Reactive shouldShowLogs getter and added to client and task group pages * Higher order func passing, thanks @DingoEatingFuzz * Non-service job types get allocation params passed * Keyframe animation for task log sidebar * Acceptance test * A few more sub-row tests * Lintfix	2022-09-22 10:58:52 -04:00
Tim Gross	c29c4bd66c	cli: remove deprecated `eval status -json` list behavior (#14651 ) In Nomad 1.2.6 we shipped `eval list`, which accepts a `-json` flag, and deprecated the usage of `eval status` without an evaluation ID with an upgrade note that it would be removed in Nomad 1.4.0. This changeset completes that work.	2022-09-22 10:56:32 -04:00
Jorge Marey	584ddfe859	Add Namespace, Job and Group to envoy stats (#14311 )	2022-09-22 10:38:21 -04:00
Tim Gross	d327a68696	operator debug: write NDJSON for large collections (#14610 ) The `operator debug` command writes JSON files from API responses as a single line containing an array of JSON objects. But some of these files can be extremely large (GB's) for large production clusters, which makes it difficult to parse them using typical line-oriented Unix command line tools that can stream their inputs without consuming a lot of memory. For collections that are typically large, instead emit newline-delimited JSON. This changeset includes some first-pass refactoring of this command. It breaks up monolithic methods that validate a path, create a file, serialize objects, and write them to disk into smaller functions, some of which can now be standalone to take advantage of generics.	2022-09-22 10:02:00 -04:00
James Rasell	a25028c412	cli: fix a bug in operator API when setting HTTPS via address. (#14635 ) Operators may have a setup whereby the TLS config comes from a source other than setting Nomad specific env vars. In this case, we should attempt to identify the scheme using the config setting as a fallback.	2022-09-22 15:43:58 +02:00
Luiz Aoqui	ad48401219	chore: move changelog file to the right folder (#14639 )	2022-09-21 13:50:22 -04:00
Tim Gross	38a6e7e343	remove 1.4.0 changelog entry that refers to bugfix on new code (#14611 ) Bug fixes on new features in Nomad 1.4.0 don't need or want changelog entries in the same changelog the feature appeared, so remove this one.	2022-09-16 16:14:02 -04:00
Phil Renaud	d6c9676252	Added task links to various alloc tables (#14592 ) * Added task links to various alloc tables * Lintfix * Border collapse and added to task group page * Logs icon temporarily removed and localStorage added * Mock task added to test * Delog * Two asserts in new test * Remove commented-out code * Changelog * Removing args.allocation deps	2022-09-16 15:58:22 -04:00
Phil Renaud	cebfbb0c28	Stabilizing percy snapshots with faker (#14551 ) * First attempt at stabilizing percy snapshots with faker * Tokens seed moved to before management token generation * Faker seed only in token test * moving seed after storage clear * And again, but back to no faker seeding * Isolated seed and temporary log * Setting seed(1) wherever we're snapshotting, or before establishing cluster scenarios * Deliberate noop to see if percy is stable * Changelog entry	2022-09-14 11:27:48 -04:00
Mahmood Ali	a9d5e4c510	scheduler: stopped-yet-running allocs are still running (#10446 ) * scheduler: stopped-yet-running allocs are still running * scheduler: test new stopped-but-running logic * test: assert nonoverlapping alloc behavior Also add a simpler Wait test helper to improve line numbers and save a few lines of code. * docs: tried my best to describe #10446 it's not concise... feedback welcome * scheduler: fix test that allowed overlapping allocs * devices: only free devices when ClientStatus is terminal * test: output nicer failure message if err==nil Co-authored-by: Mahmood Ali <mahmood@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-09-13 12:52:47 -07:00
Tim Gross	eb757606f3	changelog entry for variables (#14509 )	2022-09-13 10:25:26 -04:00
Derek Strickland	5ca934015b	job_endpoint: check spec for all regions (#14519 ) * job_endpoint: check spec for all regions	2022-09-12 09:24:26 -04:00
James Rasell	009948186b	changelog: add entry for #14320 (#14518 )	2022-09-09 17:25:50 +02:00
James Rasell	f51a8c73e6	deps: update armon/go-metrics to v0.4.1 (#14493 )	2022-09-09 09:20:55 +02:00
Charlie Voiselle	e58998e218	Add client scheduling eligibility to heartbeat (#14483 )	2022-09-08 14:31:36 -04:00
Tim Gross	3fc7482ecd	CSI: failed allocation should not block its own controller unpublish (#14484 ) A Nomad user reported problems with CSI volumes associated with failed allocations, where the Nomad server did not send a controller unpublish RPC. The controller unpublish is skipped if other non-terminal allocations on the same node claim the volume. The check has a bug where the allocation belonging to the claim being freed was included in the check incorrectly. During a normal allocation stop for job stop or a new version of the job, the allocation is terminal. But allocations that fail are not yet marked terminal at the point in time when the client sends the unpublish RPC to the server. For CSI plugins that support controller attach/detach, this means that the controller will not be able to detach the volume from the allocation's host and the replacement claim will fail until a GC is run. This changeset fixes the conditional so that the claim's own allocation is not included, and makes the logic easier to read. Include a test case covering this path. Also includes two minor extra bugfixes: * Entities we get from the state store should always be copied before altering. Ensure that we copy the volume in the top-level unpublish workflow before handing off to the steps. * The list stub object for volumes in `nomad/structs` did not match the stub object in `api`. The `api` package also did not include the current readers/writers fields that are expected by the UI. True up the two objects and add the previously undocumented fields to the docs.	2022-09-08 13:30:05 -04:00
Seth Hoenig	a608e7950e	helper: guard against negative inputs into random stagger This PR modifies RandomStagger to protect against negative input values. If the given interval is negative, the value returned will be somewhere in the stratosphere. Instead, treat negative inputs like zero, returning zero.	2022-09-08 09:17:48 -05:00
Michael Schurter	7ff0290f8b	docs: add quota panic fix changelog entry (#14485 ) See https://github.com/hashicorp/nomad-enterprise/pull/839 for original (Enterprise only)	2022-09-07 17:04:46 -07:00
Phil Renaud	52bb5de25a	Changelog added and unused tests removed	2022-09-07 10:31:39 -04:00
Luiz Aoqui	358ba279d0	ui: remove extra space in menu footer (#14457 )	2022-09-06 16:53:17 -04:00
James Rasell	813c5daa96	hcl2: add strlen function and update docs. (#14463 )	2022-09-06 18:42:40 +02:00
Tim Gross	6ff59e71a5	cli: remove network from `quota status` output (#14468 ) Network quotas were removed in Nomad 1.0.4. Remove the fields no longer in use from the `quota status` output.	2022-09-06 09:37:16 -04:00
Kellen Fox	5086368a1e	Add a log line to help track node eligibility (#14125 ) Co-authored-by: James Rasell <jrasell@hashicorp.com>	2022-09-06 14:03:33 +02:00
Yan	6e927fa125	warn destructive update only when count > 1 (#13103 )	2022-09-02 15:30:06 -04:00
Giovani Avelar	b5cf358212	[ui] Show a different message when there are no tasks in a job (#14071 ) Different message when there are no tasks in a job	2022-09-02 15:20:45 -04:00
Tiernan	98022376be	Fix error handling in Client consulDiscoveryImpl (#14431 ) Added a missing `continue` on non-nil error to avoid accidentally using a bad peer.	2022-09-02 15:13:03 -04:00
Luiz Aoqui	1ae26981a0	connect: interpolate task env in config values (#14445 ) When configuring Consul Service Mesh, it's sometimes necessary to provide dynamic values that are only known to Nomad at runtime. By interpolating configuration values (in addition to configuration keys), users are able to pass these dynamic values to Consul from their Nomad jobs.	2022-09-02 15:00:28 -04:00
Tim Gross	7921f044e5	migrate autopilot implementation to raft-autopilot (#14441 ) Nomad's original autopilot was importing from a private package in Consul. It has been moved out to a shared library. Switch Nomad to use this library so that we can eliminate the import of Consul, which is necessary to build Nomad ENT with the current version of the Consul SDK. This also will let us pick up autopilot improvements shared with Consul more easily.	2022-09-01 14:27:10 -04:00
Luiz Aoqui	94d7dddccd	cli: set -hcl2-strict to false if -hcl1 is defined (#14426 ) These options are mutually exclusive but, since `-hcl2-strict` defaults to `true`, users had to explicitly set it to `false` when using `-hcl1`. Also return `255` when job plan fails validation as this is the expected code in this situation.	2022-09-01 10:42:08 -04:00
Luiz Aoqui	19de803503	cli: ignore VaultToken when generating job diff (#14424 )	2022-09-01 10:01:53 -04:00
dependabot[bot]	9f8a3824c4	build(deps): bump github.com/hashicorp/go-version from 1.4.0 to 1.6.0 (#14364 ) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: James Rasell <jrasell@hashicorp.com>	2022-09-01 11:55:42 +02:00
Luiz Aoqui	6f5d3e724f	changelog: add entry for #14374 (#14419 )	2022-08-31 10:59:19 -04:00
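
Note on 5df5e70542 (numeric operand comparisons in constraints): the message describes comparing operands numerically rather than lexically when both are integers or floats. The following is a minimal sketch of that idea, not the code from the commit. The helper name `lessThan` is hypothetical, and the order of attempts (int64 first, then float64, then a lexical fallback) is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// lessThan is a hypothetical helper, not Nomad's implementation.
// It compares numerically when both operands parse as int64 or as
// float64, and falls back to a lexical string comparison otherwise.
func lessThan(lhs, rhs string) bool {
	// Try integers first so large values keep full int64 precision.
	if l, err := strconv.ParseInt(lhs, 10, 64); err == nil {
		if r, err := strconv.ParseInt(rhs, 10, 64); err == nil {
			return l < r
		}
	}
	// Otherwise try floats; this also covers one int and one float.
	if l, err := strconv.ParseFloat(lhs, 64); err == nil {
		if r, err := strconv.ParseFloat(rhs, 64); err == nil {
			return l < r
		}
	}
	// Neither numeric form applies to both operands: compare as strings.
	return lhs < rhs
}

func main() {
	fmt.Println("9" < "10")             // lexical comparison: false
	fmt.Println(lessThan("9", "10"))    // numeric comparison: true
	fmt.Println(lessThan("1.5", "2"))   // float comparison: true
	fmt.Println(lessThan("abc", "abd")) // lexical fallback: true
}
```

The first two lines of `main` show why the distinction matters: as strings, "9" sorts after "10", which would make a constraint such as "version greater than 9" reject the value 10.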


532 Commits