open-nomad

Commit Graph

Author	SHA1	Message	Date
Tim Gross	322ecc858f	client: defer `nobody` user lookup so Windows doesn't panic (#14790 ) In #14742 we introduced a cached lookup of the `nobody` user, which is only ever called on Unixish machines. But the initial caching was being done in an `init` block, which meant it was being run on Windows as well. This prevents the Nomad agent from starting on Windows. An alternative fix here would be to have a separate `init` block for Windows and Unix, but this potentially masks incorrect behavior if we accidentally added a call to the `Nobody()` method on Windows later. This way we're forced to handle the error in the caller.	2022-10-04 11:52:12 -04:00
Tim Gross	341dc84a77	variables: use correct URL in ref to docs (#14792 )	2022-10-04 11:30:49 -04:00
Tim Gross	2f3d4f51e6	deps: remove gophers.dev dependency (#14789 )	2022-10-04 09:49:50 -04:00
Tim Gross	a3ff23608c	deps: use install from current HEAD for `hc-install` (#14786 ) The `hc-install` tool we're using needed a patch for a specific bug, but that's since been merged. We definitely want to switch to using a standard release from that project once one is shipped with the CLI, but pinning to HEAD should keep us for now.	2022-10-04 08:22:30 -04:00
Michael Schurter	ed3218c3dd	Fixing flaky TestOverlap test (#14780 ) * test: ensure feasible node selected in overlap test * test: warn when getting close to retry limit	2022-10-03 14:35:02 -07:00
Elijah Voigt	0a80a58394	Docs(job-specification/periodic): Add enabled toggle (#14767 ) This is probably undocumented for a reason, but the `enabled` toggle in the `periodic` stanza is very useful so I figured I'd try adding it to the docs. The feature has been secretly available since #9142 and was called out in that PR as being a dubious addition, only added to avoid regressions. The use case for disabling a periodic job in this way is to prevent it from running without modifying the schedule. Ideally Nomad would make it more clear that this was the case, and allow you to force a run of the job, but even with those rough edges I think users would benefit from knowing about this toggle.	2022-10-03 15:08:24 -04:00
Tim Gross	2a6e8be6ba	internals documentation with diagrams (#14750 ) This changeset adds new architecture internals documents to the contributing guide. These are intentionally here and not on the public-facing website because the material is not required for operators and includes a lot of diagrams that we can cheaply maintain with mermaid syntax but would involve art assets to have up on the main site that would become quickly out of date as code changes happen and be extremely expensive to maintain. However, these should be suitable to use as points of conversation with expert end users. Included: * A description of Evaluation triggers and expected counts, with examples. * A description of Evaluation states and implicit states. This is taken from an internal document in our team wiki. * A description of how writing the State Store works. This is taken from a diagram I put together a few months ago for internal education purposes. * A description of Evaluation lifecycle, from registration to running Allocations. This is mostly lifted from @lgfa29's amazing mega-diagram, but broken into digestible chunks and without multi-region deployments, which I'd like to cover in a future doc. Also includes adding Deployments to our public-facing glossary. Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com> Co-authored-by: Seth Hoenig <shoenig@duck.com>	2022-10-03 14:06:41 -04:00
Jai	bd8d023ee5	refact: only reload job if job has no taskGroups (#14760 )	2022-09-30 16:14:40 -04:00
dependabot[bot]	9ce74c83e6	build(deps-dev): bump @hashicorp/platform-cli in /website (#14541 ) Bumps [@hashicorp/platform-cli](https://github.com/hashicorp/web-platform-packages/tree/HEAD/packages/cli) from 2.1.0 to 2.3.0. - [Release notes](https://github.com/hashicorp/web-platform-packages/releases) - [Changelog](https://github.com/hashicorp/web-platform-packages/blob/main/packages/cli/CHANGELOG.md) - [Commits](https://github.com/hashicorp/web-platform-packages/commits/@hashicorp/platform-cli@2.3.0/packages/cli) --- updated-dependencies: - dependency-name: "@hashicorp/platform-cli" dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-09-30 14:59:55 -04:00
Luiz Aoqui	b924802958	template: apply splay value on change_mode script (#14749 ) Previously, the splay timeout was only applied if a template re-render caused a restart or a signal action. The `change_mode = "script"` was running after the `if restart || len(signals) != 0` check, so it was invoked at all times. This change refactors the logic so it's easier to notice that new `change_mode` options should start only after `splay` is applied.	2022-09-30 12:04:22 -04:00
Tim Gross	e13ac471fc	Revert removing deprecated client options docs (#14753 ) This reverts PR #12416 and commit 6668ce022ac561f75ad113cc838b1fb786f11f79. While the driver options are well and truly deprecated, this documentation also covers features like `fingerprint.denylist` that are not available any other way. Let's revert this until #12420 is ready.	2022-09-30 08:38:03 -04:00
Phil Renaud	8ac604841a	[ui] Bugfix: reinstate the "this variable will be accessible by $job/$group/$task" notification (#14741 ) * When we isolated the variable form path to within its component for isolation reasons, we lost the model-level checks for related entities at type-time * Be a little more functionally pure * Use Ember.set to appease mirage	2022-09-29 10:40:00 -04:00
Phil Renaud	a200c2f2f2	Fix a bug where we only checked the first task within a given alloc for services (#14740 )	2022-09-29 10:39:42 -04:00
Seth Hoenig	c68ed3b4c8	client: protect user lookups with global lock (#14742 ) * client: protect user lookups with global lock This PR updates Nomad client to always do user lookups while holding a global process lock. This is to prevent concurrency-unsafe implementations of NSS, while still enabling NSS lookups of users (i.e. cannot use osusergo). * cl: add cl	2022-09-29 09:30:13 -05:00
Michael Schurter	0e95fb03c0	test: skip chown test if nonroot (#14738 ) CI always runs this as root, so it worked there and always scared me when I ran it locally.	2022-09-28 14:45:38 -07:00
Derek Strickland	2c4df95e92	Merge pull request #14664 from hashicorp/docs-multiregion-dispatch multiregion: Added a section for multiregion parameterized job dispatch	2022-09-28 15:40:11 -04:00
Derek Strickland	c3d4496287	link from dispatch command	2022-09-28 08:30:22 -04:00
Derek Strickland	8b37e558fb	Apply suggestions from code review	2022-09-28 08:18:56 -04:00
Derek Strickland	fe7d1e08ac	Update website/content/docs/job-specification/multiregion.mdx Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-09-28 07:20:11 -04:00
Derek Strickland	e1dba23ccf	Update website/content/docs/job-specification/multiregion.mdx Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-09-28 07:19:54 -04:00
Derek Strickland	7a26c4fb20	Merge pull request #14730 from hashicorp/remove-testing-changelog-entry Remove changelog entry for test update PR	2022-09-27 18:26:02 -04:00
Derek Strickland	4c73a3b1dc	Remove changelog entry for test update PR	2022-09-27 18:17:49 -04:00
Derek Strickland	eeeadfd24a	Merge pull request #14729 from hashicorp/remove-bug-fix-changelog-files Fix and remove changelog files	2022-09-27 18:11:52 -04:00
Derek Strickland	06db9d5a4d	Merge pull request #14728 from hashicorp/post-1.4.0-rc.1-release Post 1.4.0 rc.1 release	2022-09-27 17:52:47 -04:00
Derek Strickland	52e4997ace	Add enterprise tag	2022-09-27 17:50:25 -04:00
Derek Strickland	ef0f8c5b81	Add enterprise tag	2022-09-27 17:49:27 -04:00
Derek Strickland	6738684167	Delete 14665.txt	2022-09-27 17:47:35 -04:00
Derek Strickland	87bdb74221	Remove bug fix changelog files	2022-09-27 17:46:32 -04:00
hc-github-team-nomad-core	9232da0914	Prepare for next release	2022-09-27 17:33:32 -04:00
hc-github-team-nomad-core	2fe5a962f3	Generate files for 1.4.0-rc.1 release	2022-09-27 17:33:32 -04:00
Derek Strickland	a3cda3ede0	Apply changes from code review	2022-09-27 17:33:32 -04:00
Derek Strickland	1eb07f6202	Prepare release 1.4.0-rc.1	2022-09-27 17:33:32 -04:00
Phil Renaud	5a6084fff7	Visual diff tests: error states (#14707 ) * 3 error states captured * Assertion expecters * Attempt to stabilize datacenters	2022-09-27 15:46:33 -04:00
Derek Strickland	d26d2874ae	Merge pull request #14727 from hashicorp/changelog-14651-breaking-change Fix changelog entry type	2022-09-27 15:03:02 -04:00
Derek Strickland	cacf4bb8e1	Fix changelog entry type	2022-09-27 14:33:39 -04:00
Michael Schurter	0df5c7d5ae	test: fix flaky test (#14713 ) Need to wait for Stop evals to be processed before you can expect subsequent RPCs to see the alloc's DesiredStatus=stop.	2022-09-27 10:36:16 -07:00
Jim Razmus II	7da3fd050b	jobspec: allow artifact headers in HCLv1 (#14637 ) * jobspec: allow artifact headers in HCLv1 Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-09-27 12:18:49 -04:00
Seth Hoenig	5df5e70542	core: numeric operands comparisons in constraints (#14722 ) * cleanup: fixup linter warnings in scheduler/feasible.go * core: numeric operands comparisons in constraints This PR changes constraint comparisons to be numeric rather than lexical if both operands are integers or floats. Inspiration #4856 Closes #4729 Closes #14719 * fix: always parse as int64	2022-09-27 11:07:07 -05:00
Phil Renaud	24eea2d4d4	task logs page snapshot (#14709 )	2022-09-27 10:25:04 -04:00
Phil Renaud	29feb48835	Empty and filled task exec screenshots for test (#14702 ) * Empty and filled task exec screenshots for test * Attempting to stabilize datacenter prop on servers	2022-09-27 10:24:53 -04:00
Tim Gross	87681fca68	CSI: ensure initial unpublish state is checkpointed (#14675 ) A test flake revealed a bug in the CSI unpublish workflow, where an unpublish that comes from a client that's successfully done the node-unpublish step will not have the claim checkpointed if the controller-unpublish step fails. This will result in a delay in releasing the volume claim until the next GC. This changeset also ensures we're using a new snapshot after each write to raft, and fixes two timing issues in tests where either the volume watcher can unpublish before the unpublish RPC is sent or we don't wait long enough in resource-restricted environments like GHA.	2022-09-27 08:43:45 -04:00
Michael Schurter	fb8739d926	docs: write a lot of words about heartbeats (#14679 ) * docs: write a lot of words about heartbeats Alternative to #14670 * Apply suggestions from code review Co-authored-by: Tim Gross <tgross@hashicorp.com> * use descriptive title for link * rework example of high failover ttl Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-09-26 14:43:34 -07:00
Seth Hoenig	7235d9988b	e2e: convert chroot env unit tests into e2e tests (#14710 ) This PR translates two of our most flaky unit tests into e2e tests, where they fit much more naturally.	2022-09-26 15:40:29 -05:00
Michael Schurter	e6af1c0a14	fingerprint: add node attr for reservable cores (#14694 ) * fingerprint: add node attr for reservable cores Add an attribute for the number of reservable CPU cores as they may differ from the existing `cpu.numcores` due to client configuration or OS support. Hopefully clarifies some confusion in #14676 * add changelog * num_reservable_cores -> reservablecores	2022-09-26 13:03:03 -07:00
Luiz Aoqui	5c100c0d3d	client: recover from getter panics (#14696 ) The artifact getter uses the go-getter library to fetch files from different sources. Any bug in this library that results in a panic can cause the entire Nomad client to crash due to a single file download attempt. This change aims to guard against these types of crashes by recovering from panics when the getter attempts to download an artifact. The resulting panic is converted to an error that is stored as a task event for operator visibility and the panic stack trace is logged to the client's log.	2022-09-26 15:16:26 -04:00
Michael Schurter	b554f9344a	fingerprint: lengthen Vault check after seen (#14693 ) Extension of #14673 Once Vault is initially fingerprinted, extend the period since changes should be infrequent and the fingerprint is relatively expensive since it is contacting a central Vault server. Also move the period timer reset after the fingerprint. This is similar to #9435 where the idea is to ensure the retry period starts after the operation is attempted. 15s will be the minimum time between fingerprints now instead of the maximum time between fingerprints. In the case of Vault fingerprinting, the original behavior might cause the following: 1. Timer is reset to 15s 2. Fingerprint takes 16s 3. Timer has already elapsed so we immediately Fingerprint again Even if fingerprinting Vault only takes a few seconds, that may very well be due to excessive load and backing off our fingerprints is desirable. The new behavior ensures we always wait at least 15s between fingerprint attempts and should allow some natural jittering based on server load and network latency.	2022-09-26 12:14:19 -07:00
Tim Gross	a661399b41	cli: fix doc strings for `var get` command (#14697 )	2022-09-26 15:05:22 -04:00
Luiz Aoqui	f7c6534a79	cli: set content length on `operator api` requests (#14634 ) http.NewRequestWithContext will only set the right value for Content-Length if the input is bytes.Buffer, bytes.Reader, or *strings.Reader [0]. Since os.Stdin is an os.File, POST requests made with the `nomad operator api` command would always have Content-Length set to -1, which is interpreted as an unknown length by web servers. [0]: https://pkg.go.dev/net/http#NewRequestWithContext	2022-09-26 14:21:40 -04:00
Karan Sharma	cdb3ec25d3	docs: add new tools (#14596 )	2022-09-26 11:42:06 -04:00
Tim Gross	62b1e2ef97	variables: document restrictions on path and size (#14687 )	2022-09-26 11:40:53 -04:00
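
Commit 5df5e70542 above describes switching constraint comparisons from lexical to numeric when both operands are integers or floats. The following is a minimal sketch of that idea, not the code from the commit: the helper name `lessThan` and its fallback order (int64 first, then float64, then plain string comparison) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// lessThan is a hypothetical helper (not Nomad's actual function).
// It compares numerically when both operands parse as int64, then as
// float64, and falls back to a lexical string comparison otherwise.
func lessThan(l, r string) bool {
	if li, err := strconv.ParseInt(l, 10, 64); err == nil {
		if ri, err := strconv.ParseInt(r, 10, 64); err == nil {
			return li < ri
		}
	}
	if lf, err := strconv.ParseFloat(l, 64); err == nil {
		if rf, err := strconv.ParseFloat(r, 64); err == nil {
			return lf < rf
		}
	}
	// At least one operand is not numeric: compare as strings.
	return l < r
}

func main() {
	a, b := "9", "10"
	fmt.Println(lessThan(a, b))         // numeric comparison: 9 < 10
	fmt.Println(a < b)                  // lexical comparison: "9" sorts after "10"
	fmt.Println(lessThan("1.5", "2"))   // mixed int/float operands compare as floats
	fmt.Println(lessThan("abc", "abd")) // non-numeric operands stay lexical
}
```

The lexical result for `"9"` versus `"10"` is the kind of surprise a numeric comparison avoids; non-numeric operands are left to compare as strings.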


24117 Commits
