open-nomad

Commit Graph

Author	SHA1	Message	Date
James Rasell	9923f9e6f3	nnsd: gate registration write & delete RPC use on v1.3.0 or greater. (#14924 )	2022-10-18 15:30:28 +02:00
Seth Hoenig	f1b902beac	consul: do not re-register already registered services (#14917 ) This PR updates Nomad's Consul service client to do map comparisons using maps.Equal instead of reflect.DeepEqual. The bug fix is in how DeepEqual treats nil slices differently from empty slices, when actually they should be treated the same.	2022-10-18 08:10:59 -05:00
Tim Gross	3c78980b78	make version checks specific to region (1.4.x) (#14912 ) * One-time tokens are not replicated between regions, so we don't want to enforce that the version check across all of serf, just members in the same region. * Scheduler: Disconnected clients handling is specific to a single region, so we don't want to enforce that the version check across all of serf, just members in the same region. * Variables: enforce version check in Apply RPC * Cleans up a bunch of legacy checks. This changeset is specific to 1.4.x and the changes for previous versions of Nomad will be manually backported in a separate PR.	2022-10-17 16:23:51 -04:00
Seth Hoenig	306b4dd38e	cleanup: remove another string-set helper function (#14902 )	2022-10-17 14:14:52 -05:00
Tim Gross	c721ce618e	keyring: filter by region before checking version (#14901 ) In #14821 we fixed a panic that can happen if a leadership election happens in the middle of an upgrade. That fix checks that all servers are at the minimum version before initializing the keyring (which blocks evaluation processing during the upgrade). But the check we implemented is over the serf membership, which includes servers in any federated regions, which don't necessarily have the same upgrade cycle. Filter the version check by the leader's region. Also bump up log levels of major keyring operations	2022-10-17 13:21:16 -04:00
Kevin Wang	d66b2eba43	fix: website broken links (#14904 ) * fix: website broken links * fix up keyring-rotate link Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-10-17 11:32:10 -04:00
Michael Schurter	21eced0a4e	test: extend timing and output of overlap e2e test (#14894 ) Keeps failing in the nightly e2e test with unhelpful output like: ``` Failed === RUN TestOverlap overlap_test.go:92: Followup job overlap93ee1d2b blocked. Sleeping for the rest of overlap48c26c39's shutdown_delay (9.2/10s) overlap_test.go:105: 1500/2000 retries reached for github.com/hashicorp/nomad/e2e/overlap.TestOverlap (err=timed out before an allocation was found for overlap93ee1d2b) overlap_test.go:105: timeout: timed out before an allocation was found for overlap93ee1d2b --- FAIL: TestOverlap (38.96s) ``` I have not been able to replicate it in my own e2e cluster, so I added the EvalDump helper to add detailed eval information like: ``` === RUN TestOverlap 1/1 Job overlap7b0e90ec Eval c38c9919-a4f0-5baf-45f7-0702383c682a Type: service TriggeredBy: job-register Deployment: Status: pending () NextEval: PrevEval: BlockedEval: -- No placement failures -- QueuedAllocs: SnapshotIdx: 0 CreateIndex: 96 ModifyIndex: 96 ... ``` Hopefully helpful when debugging other tests as well!	2022-10-14 14:15:07 -07:00
Mike Nomitch	91d32bb8df	Merge pull request #14879 from hashicorp/mnomitch/job-purge-ui Adds purge job button to UI	2022-10-14 12:46:20 -07:00
hashicorp-copywrite[bot]	2df28b0d7e	[COMPLIANCE] Update MPL 2.0 LICENSE (#14884 ) Co-authored-by: hashicorp-copywrite[bot] <noreply@hashicorp.com>	2022-10-13 08:43:12 -04:00
Michael Schurter	bdb639b3e2	test: simplify overlap job placement logic (#14811 ) * test: simplify overlap job placement logic Trying to fix #14806 Both the previous approach as well as this one worked on e2e clusters I spun up. * simplify code flow	2022-10-12 11:21:28 -07:00
Mike Nomitch	c4ec506009	Adds purge job button to UI when job stopped	2022-10-12 08:14:48 -07:00
Tim Gross	bcd26f8815	docker_logger: reorder imports to save memory (#14875 ) Nomad runs one logmon process and also one docker_logger process for each running allocation. A naive look at memory usage shows 10-30 MB of RSS, but a closer look shows that most of this memory (ex. all but ~2MB for logmon) is shared (`Shared_Clean` in Linux pmap). But a heap dump of docker_logger shows that it currently has an extra ~2500 KiB of heap (anonymously-mapped unshared memory) used for init blocks coming from the agent code (ex. mostly regexes from go-version, structs, and the Consul SDK). The packages for running logmon, docker_logger, and executor have an init block that parses `os.Args` to drop into their own logic, which prevents them from loading all the rest of the agent code and saves on memory, so this was unexpected. It looks like we accidentally reordered the imports in main to undo some of the work originally done in 404d2d4c98f1df930be1ae9852fe6e6ae8c1517e. This changeset restores the ordering. A follow-up heap dump shows this saves ~2MB of unshared RSS per docker_logger process.	2022-10-11 13:23:03 -04:00
Michael Schurter	45ce8c13cf	client: remove unused LogOutput and LogLevel (#14867 ) * client: remove unused LogOutput * client: remove unused config.LogLevel	2022-10-11 09:24:40 -07:00
Seth Hoenig	ba1e337f8b	helpers: lockfree lookup of nobody user on unix systems (#14866 ) * helpers: lockfree lookup of nobody user on linux and darwin This PR continues the nobody user lookup saga, by making the nobody user lookup lock-free on linux and darwin. By doing the lookup in an init block this originally broke on Windows, where we must avoid doing the lookup at all. We can get around that breakage by only doing the lookup on linux/darwin where the nobody user is going to exist. Also return the nobody user by value so that a copy is created that cannot be modified by callers of Nobody(). * helper: move nobody code into unix file	2022-10-11 08:38:05 -05:00
Seth Hoenig	1593963cd1	servicedisco: implicit constraint for nomad v1.4 when using nsd checks (#14868 ) This PR adds a jobspec mutator to constrain jobs making use of checks in the nomad service provider to nomad clients of at least v1.4.0. Before, in a mixed client version cluster it was possible to submit an NSD job making use of checks and for that job to land on an older, incompatible client node. Closes #14862	2022-10-11 08:21:42 -05:00
Seth Hoenig	69ced2a2bd	services: remove assertion on 'task' field being set (#14864 ) This PR removes the assertion around when the 'task' field of a check may be set. Starting in Nomad 1.4 we automatically set the task field on all checks in support of the NSD checks feature. This is causing validation problems elsewhere, e.g. when a group service using the Consul provider sets 'task' it will fail validation that worked previously. The assertion of leaving 'task' unset was only about making sure job submitters weren't expecting some behavior, but in practice is causing bugs now that we need the task field for more than it was originally added for. We can simply update the docs, noting when the task field set by job submitters actually has value.	2022-10-10 13:02:33 -05:00
Seth Hoenig	5e38a0e82c	cleanup: rename Equals to Equal for consistency (#14759 )	2022-10-10 09:28:46 -05:00
Seth Hoenig	0e702aec00	build: move imports into the transitive require block (#14863 )	2022-10-10 09:27:55 -05:00
Phil Renaud	e771b94164	[ui] Makes service tags wrap and look like tag items (#14834 ) * Makes service tags wrap and look like tag items * Add a little vertical spacing and changelog * Put client before tags * Force tags list to new line	2022-10-07 09:23:52 -04:00
Tim Gross	91d4ccd905	Dependency updates from dependabot (#14844 ) * build(deps): bump github.com/opencontainers/runc from 1.1.3 to 1.1.4 Bumps [github.com/opencontainers/runc](https://github.com/opencontainers/runc) from 1.1.3 to 1.1.4. - [Release notes](https://github.com/opencontainers/runc/releases) - [Changelog](https://github.com/opencontainers/runc/blob/v1.1.4/CHANGELOG.md) - [Commits](https://github.com/opencontainers/runc/compare/v1.1.3...v1.1.4) --- updated-dependencies: - dependency-name: github.com/opencontainers/runc dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> * build(deps): bump github.com/prometheus/client_golang Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.12.0 to 1.13.0. - [Release notes](https://github.com/prometheus/client_golang/releases) - [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md) - [Commits](https://github.com/prometheus/client_golang/compare/v1.12.0...v1.13.0) --- updated-dependencies: - dependency-name: github.com/prometheus/client_golang dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * build(deps): bump github.com/mattn/go-colorable from 0.1.12 to 0.1.13 Bumps [github.com/mattn/go-colorable](https://github.com/mattn/go-colorable) from 0.1.12 to 0.1.13. - [Release notes](https://github.com/mattn/go-colorable/releases) - [Commits](https://github.com/mattn/go-colorable/compare/v0.1.12...v0.1.13) --- updated-dependencies: - dependency-name: github.com/mattn/go-colorable dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> * build(deps): bump github.com/google/go-cmp from 0.5.8 to 0.5.9 Bumps [github.com/google/go-cmp](https://github.com/google/go-cmp) from 0.5.8 to 0.5.9. 
- [Release notes](https://github.com/google/go-cmp/releases) - [Commits](https://github.com/google/go-cmp/compare/v0.5.8...v0.5.9) --- updated-dependencies: - dependency-name: github.com/google/go-cmp dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> * build(deps): bump github.com/miekg/dns from 1.1.41 to 1.1.50 Bumps [github.com/miekg/dns](https://github.com/miekg/dns) from 1.1.41 to 1.1.50. - [Release notes](https://github.com/miekg/dns/releases) - [Changelog](https://github.com/miekg/dns/blob/master/Makefile.release) - [Commits](https://github.com/miekg/dns/compare/v1.1.41...v1.1.50) --- updated-dependencies: - dependency-name: github.com/miekg/dns dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-10-07 09:09:28 -04:00
Damian Czaja	95f969c4bf	cli: add `nomad fmt` (#14779 )	2022-10-06 17:00:29 -04:00
Phil Renaud	4b93a30225	[ui] Line charts: explicitly update X-axis whenever xScale changes (#14814 ) * Explicitly update X-axis whenever xScale changes * Changelog	2022-10-06 16:59:16 -04:00
Hemanth Krishna	e516fc266f	enhancement: UpdateTask when Task is waiting for ShutdownDelay (#14775 ) Signed-off-by: Hemanth Krishna <hkpdev008@gmail.com>	2022-10-06 16:33:28 -04:00
Will Jordan	8ae13208c9	Allow jobs not requiring any network resources (#14300 ) Jobs not requiring any network resources should be allowed even when the network fingerprinter is disabled.	2022-10-06 16:25:41 -04:00
Gabriel Villalonga Simon	b974c32ba6	Check that JobPlanResponse Diff Type is None before checking for changes on getExitCode (#14492 )	2022-10-06 16:23:22 -04:00
Pablo Ruiz García	40416be7b1	Invoke FingerprintManager's Reload() func during agent's SIGHUP (#14615 ) Fixes #14614	2022-10-06 16:22:59 -04:00
Giovani Avelar	a625de2062	Allow specification of a custom job name/prefix for parameterized jobs (#14631 )	2022-10-06 16:21:40 -04:00
Tim Gross	6263c8b323	lock closed issues and PRs after 120 days (#14824 ) When community members comment on long-closed issues, there's a number of failure modes that make for a bad experience for them: * Their comments are often missed entirely because notification settings make it impractical for most developers to read comments on inactive issues. * In our experience, the problem is only rarely a regression; because failures are complex, totally different code paths can result in symptoms that initially appear to be the same but turn out to be completely different under close examination. This is particularly the case for issues fixed in very old versions (sometimes 2 or more years old). The Terraform core team uses a bot that locks issues after only 30 days. But because we typically close issues automatically on PR merge but don't have rolling releases, it'd frequently happen that unreleased fixes will have locked comments, which isn't a good experience either. I've looked through the pace of releases since Nomad 0.9.0 and the longest window between releases was 3 months. Set the window for the lock bot to 120 days to give us plenty of breathing room so it doesn't feel like we're shutting down discussion prematurely.	2022-10-06 16:18:00 -04:00
Michael Schurter	7bbbef9951	docs: clarify nomad vars vs vault (#14831 ) * docs: clarify nomad vars vs vault I think we should make the difference in root key management between Nomad and Vault clear in the concept docs. I didn't see anywhere else in the docs we compared it. I also s/secrets/variables everywhere except the first sentence since the feature is intended to be more generic than secrets. Right now it's more of a complement to Consul's kv than Vault due to root key handling and featureset. * Update website/content/docs/concepts/variables.mdx Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-10-06 13:17:26 -07:00
HashiBot	eab6bb5e35	website: upgrade next version (#14830 ) Co-authored-by: Bryce Kalow <bkalow@hashicorp.com>	2022-10-06 13:48:11 -05:00
Tim Gross	80ec5e1346	fix panic from keyring raft entries being written during upgrade (#14821 ) During an upgrade to Nomad 1.4.0, if a server running 1.4.0 becomes the leader before one of the 1.3.x servers, the old server will crash because the keyring is initialized and writes a raft entry. Wait until all members are on a version that supports the keyring before initializing it.	2022-10-06 12:47:02 -04:00
Derek Strickland	36c644aaf2	Merge pull request #14828 from hashicorp/post-1.4.0-release Post 1.4.0 release	2022-10-06 09:30:56 -07:00
Derek Strickland	88ddf260da	Merge release 1.4.0 files	2022-10-06 09:24:54 -07:00
hc-github-team-nomad-core	bfd7159f42	Prepare for next release	2022-10-06 09:16:01 -07:00
hc-github-team-nomad-core	4fdcd197c0	Generate files for 1.4.0 release	2022-10-06 09:16:00 -07:00
Luiz Aoqui	f9aeb11183	prepare release 1.4.0	2022-10-06 09:16:00 -07:00
Tim Gross	0cc64da404	docs: 1.4.0 upgrade warning for keyring initialization (#14825 )	2022-10-06 11:32:35 -04:00
Tim Gross	0e1f8cd803	semgrep: add MeasureSinceWithLabels to FSM time rule (#14812 ) Metrics state is local to the server and needs to use time, which is normally forbidden in the FSM code. We have a bypass for this rule for `metrics.MeasureSince` but needed one for `metrics.MeasureSinceWithLabels` as well.	2022-10-06 10:59:53 -04:00
James Rasell	0187240e7c	e2e: fixes the ordering on greater than checks within spread test. (#14818 )	2022-10-06 15:27:36 +02:00
James Rasell	67e8f85360	e2e: fix incorrect must function usage in namespace suite. (#14805 )	2022-10-05 15:50:56 +02:00
Phil Renaud	7313ac2905	Switch to the 'running' green for health checks (#14799 )	2022-10-04 16:59:50 -04:00
Tim Gross	322ecc858f	client: defer `nobody` user lookup so Windows doesn't panic (#14790 ) In #14742 we introduced a cached lookup of the `nobody` user, which is only ever called on Unixish machines. But the initial caching was being done in an `init` block, which meant it was being run on Windows as well. This prevents the Nomad agent from starting on Windows. An alternative fix here would be to have a separate `init` block for Windows and Unix, but this potentially masks incorrect behavior if we accidentally added a call to the `Nobody()` method on Windows later. This way we're forced to handle the error in the caller.	2022-10-04 11:52:12 -04:00
Tim Gross	341dc84a77	variables: use correct URL in ref to docs (#14792 )	2022-10-04 11:30:49 -04:00
Tim Gross	2f3d4f51e6	deps: remove gophers.dev dependency (#14789 )	2022-10-04 09:49:50 -04:00
Tim Gross	a3ff23608c	deps: use install from current HEAD for `hc-install` (#14786 ) The `hc-install` tool we're using needed a patch for a specific bug, but that's since been merged. We definitely want to switch to using a standard release from that project once one is shipped with the CLI, but pinning to HEAD should keep us for now.	2022-10-04 08:22:30 -04:00
Michael Schurter	ed3218c3dd	Fixing flaky TestOverlap test (#14780 ) * test: ensure feasible node selected in overlap test * test: warn when getting close to retry limit	2022-10-03 14:35:02 -07:00
Elijah Voigt	0a80a58394	Docs(job-specification/periodic): Add enabled toggle (#14767 ) This is probably undocumented for a reason, but the `enabled` toggle in the `periodic` stanza is very useful so I figured I'd try adding it to the docs. The feature has been secretly available since #9142 and was called out in that PR as being a dubious addition, only added to avoid regressions. The use case for disabling a periodic job in this way is to prevent it from running without modifying the schedule. Ideally Nomad would make it more clear that this was the case, and allow you to force a run of the job, but even with those rough edges I think users would benefit from knowing about this toggle.	2022-10-03 15:08:24 -04:00
Tim Gross	2a6e8be6ba	internals documentation with diagrams (#14750 ) This changeset adds new architecture internals documents to the contributing guide. These are intentionally here and not on the public-facing website because the material is not required for operators and includes a lot of diagrams that we can cheaply maintain with mermaid syntax but would involve art assets to have up on the main site that would become quickly out of date as code changes happen and be extremely expensive to maintain. However, these should be suitable to use as points of conversation with expert end users. Included: * A description of Evaluation triggers and expected counts, with examples. * A description of Evaluation states and implicit states. This is taken from an internal document in our team wiki. * A description of how writing the State Store works. This is taken from a diagram I put together a few months ago for internal education purposes. * A description of Evaluation lifecycle, from registration to running Allocations. This is mostly lifted from @lgfa29's amazing mega-diagram, but broken into digestible chunks and without multi-region deployments, which I'd like to cover in a future doc. Also includes adding Deployments to our public-facing glossary. Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com> Co-authored-by: Seth Hoenig <shoenig@duck.com>	2022-10-03 14:06:41 -04:00
Jai	bd8d023ee5	refact: only reload job if job has no taskGroups (#14760 )	2022-09-30 16:14:40 -04:00
dependabot[bot]	9ce74c83e6	build(deps-dev): bump @hashicorp/platform-cli in /website (#14541 ) Bumps [@hashicorp/platform-cli](https://github.com/hashicorp/web-platform-packages/tree/HEAD/packages/cli) from 2.1.0 to 2.3.0. - [Release notes](https://github.com/hashicorp/web-platform-packages/releases) - [Changelog](https://github.com/hashicorp/web-platform-packages/blob/main/packages/cli/CHANGELOG.md) - [Commits](https://github.com/hashicorp/web-platform-packages/commits/@hashicorp/platform-cli@2.3.0/packages/cli) --- updated-dependencies: - dependency-name: "@hashicorp/platform-cli" dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-09-30 14:59:55 -04:00


23858 Commits
