open-nomad

Commit Graph

Author	SHA1	Message	Date
Seth Hoenig	cfb7efc478	fix changelog entry typo (#17743 )	2023-06-27 08:02:06 -05:00
Seth Hoenig	4771690582	deps: update cronexpr to capture license file in SBOM tools (#17733 )	2023-06-27 07:58:20 -05:00
Juana De La Cuesta	28b66d2400	Update checklist-rpc-endpoint.md (#17698 ) * Update checklist-rpc-endpoint.md * Update checklist-rpc-endpoint.md * Update contributing/checklist-rpc-endpoint.md Co-authored-by: Tim Gross <tgross@hashicorp.com> --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-06-27 10:52:38 +02:00
Phil Renaud	32af971bcb	Node Pools moved to after Type in jobs index columns (#17738 )	2023-06-26 17:00:01 -04:00
Seth Hoenig	d590123637	drivers/docker: refactor use of clients in docker driver (#17731 ) * drivers/docker: refactor use of clients in docker driver This PR refactors how we manage the two underlying clients used by the docker driver for communicating with the docker daemon. We keep two clients - one with a hard-coded timeout that applies to all operations no matter what, intended for use with short lived / async calls to docker. The other has no timeout and is the responsibility of the caller to set a context that will ensure the call eventually terminates. The use of these two clients has been confusing and mistakes were made in a number of places where calls were making use of the wrong client. This PR makes it so that a user must explicitly call a function to get the client that makes sense for that use case. Fixes #17023 * cr: followup items	2023-06-26 15:21:42 -05:00
sejalapeno	4c6906d873	Update allocations.go (#17726 ) * Update allocations.go updated missing client status "unknown" #17688 * changelog * Update .changelog/17726.txt adding relevant desc. Co-authored-by: Seth Hoenig <shoenig@duck.com> --------- Co-authored-by: Seth Hoenig <shoenig@duck.com>	2023-06-26 13:33:29 -05:00
nicoche	649831c1d3	deploymentwatcher: fail early whenever possible (#17341 ) Given a deployment that has a `progress_deadline`, if a task group runs out of reschedule attempts, allow it to fail at this time instead of waiting until the `progress_deadline` is reached. Fixes: #17260	2023-06-26 14:01:03 -04:00
Phil Renaud	81edceb2de	[ui] alignment and spacing for job status panel (#17708 ) * CSS alignment and spacing for job status panel * Only fade the count, not the legend icon, when count is 0 * Unrounded version corners * changelog * css has to only remove border radius when count is present * Seed stabilization for services test * Try consolidating the testfixes from before * Total test isolation and bonus logs * Drop the isolation but keep the logs * Remove bonus logging	2023-06-26 12:18:12 -04:00
hashicorp-copywrite[bot]	e901340c3f	[COMPLIANCE] Add Copyright and License Headers (#17732 ) Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>	2023-06-26 11:11:17 -05:00
dependabot[bot]	05a8ccff26	build(deps): bump github.com/containerd/go-cni from 1.1.7 to 1.1.9 (#17582 )	2023-06-26 16:47:20 +01:00
James Rasell	74ab0badb4	test: add drain config tests. (#17724 )	2023-06-26 16:23:13 +01:00
Seth Hoenig	2e2c578298	e2e: refactor pids isolation tests (#17717 ) This PR refactors some old PID isolation tests to make use of the e2e/v3 packages. Should be quite a bit easier to read. Adds 'alloc exec' capability to the jobs3 package.	2023-06-26 09:51:18 -05:00
Tim Gross	f65a925096	adjust prioritized client updates (#17541 ) In #17354 we made client updates prioritized to reduce client-to-server traffic. When the client has no previously-acknowledged update we assume that the update is of typical priority; although we don't know that for sure in practice an allocation will never become healthy quickly enough that the first update we send is the update saying the alloc is healthy. But that doesn't account for allocations that quickly fail in an unrecoverable way because of allocrunner hook failures, and it'd be nice to be able to send those failure states to the server more quickly. This changeset does so and adds some extra comments on reasoning behind priority.	2023-06-26 09:14:24 -04:00
dependabot[bot]	e93af16008	build(deps): bump github.com/opencontainers/runtime-spec (#17719 ) Bumps [github.com/opencontainers/runtime-spec](https://github.com/opencontainers/runtime-spec) from 1.0.3-0.20210326190908-1c3f411f0417 to 1.1.0-rc.3. - [Release notes](https://github.com/opencontainers/runtime-spec/releases) - [Changelog](https://github.com/opencontainers/runtime-spec/blob/main/ChangeLog) - [Commits](https://github.com/opencontainers/runtime-spec/commits/v1.1.0-rc.3) --- updated-dependencies: - dependency-name: github.com/opencontainers/runtime-spec dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-06-26 08:03:50 -05:00
Piotr Kazmierczak	abd2252115	chore: gofmt docker driver handle.go (#17721 )	2023-06-26 10:38:23 +02:00
Johan Forssell	9174f38f8c	drivers: OOM kill logging for Docker driver (#17518 ) Explicit error log of the docker ID and container image name	2023-06-26 10:13:23 +02:00
Tim Gross	926b3030d7	cli: fix broken `node pool jobs` test (#17715 ) In #17705 we fixed a bug in the treatment of the "all" node pool for the `node pool jobs` command but missed a test in the CLI.	2023-06-23 14:10:45 -07:00
Tim Gross	1432af9a88	docs: clarify drain's `-force` flag behavior with system/CSI jobs (#17703 ) If you use `nomad node drain -force`, the drain deadline is set to -1ns. If you have not prevented system and CSI node plugin allocations from being drained with `-ignore-system`, they will be immediately drained as well. This is typically not safe for CSI node plugins. Also fix some broken links. Fixes: #17696	2023-06-23 16:38:11 -04:00
Luiz Aoqui	9aa9779d80	api: prevent panic on job plan (#17689 ) Check for a nil job ID to prevent a panic when calling Jobs().Plan().	2023-06-23 16:20:52 -04:00
Luiz Aoqui	d62c34b9f9	build: add Docker image (#17017 ) Co-authored-by: Daniel Kimsey <90741+dekimsey@users.noreply.github.com>	2023-06-23 15:57:09 -04:00
Luiz Aoqui	66962b2b28	np: fix list of jobs for node pool `all` (#17705 ) Unlike nodes, jobs are allowed to be registered in the node pool `all`, in which case all nodes are used for evaluating placements. When listing jobs for the `all` node pool only those that are explicitly in this node pool should be returned.	2023-06-23 15:47:53 -04:00
Luiz Aoqui	3398d32000	changelog: add entry for node pools (#17707 )	2023-06-23 15:47:35 -04:00
Tim Gross	12d5eab2d1	docs: split out unsupported versions in changelog (#17704 ) Our changelog has become large enough that GitHub's rendering is very slow, resulting in error pages ("angry unicorns"). Split out the older unsupported versions of Nomad into their own file so that we only need to render the most recent versions, while keeping the older versions relatively searchable by having them in a single file.	2023-06-23 15:17:57 -04:00
grembo	7936c1e33f	Add `disable_file` parameter to job's `vault` stanza (#13343 ) This complements the `env` parameter, so that the operator can author tasks that don't share their Vault token with the workload when using `image` filesystem isolation. As a result, more powerful tokens can be used in a job definition, allowing it to use template stanzas to issue all kinds of secrets (database secrets, Vault tokens with very specific policies, etc.), without sharing that issuing power with the task itself. This is accomplished by creating a directory called `private` within the task's working directory, which shares many properties of the `secrets` directory (tmpfs where possible, not accessible by `nomad alloc fs` or Nomad's web UI), but isn't mounted into/bound to the container. If the `disable_file` parameter is set to `false` (its default), the Vault token is also written to the NOMAD_SECRETS_DIR, so the default behavior is backwards compatible. Even if the operator never changes the default, they will still benefit from the improved behavior of Nomad never reading the token back in from that - potentially altered - location.	2023-06-23 15:15:04 -04:00
Michael Lange	faa3377a56	Merge pull request #17691 from hashicorp/f/missing-chart-stories [UI] Missing chart stories	2023-06-23 08:17:34 -07:00
James Rasell	b9440965db	client: remove unused nsd check allocation result diff func (#17695 )	2023-06-23 15:26:06 +01:00
Seth Hoenig	2c7877658c	e2e: create a v3/ set of packages for creating Nomad e2e tests (#17620 ) * e2e: create a v3/ set of packages for creating Nomad e2e tests This PR creates an experimental set of packages under `e2e/v3/` for crafting Nomad e2e tests. Unlike previous generations, this is an attempt at providing a way to create tests in a declarative (ish) pattern, with a focus on being easy to use, easy to cleanup, and easy to debug. @shoenig is just trying this out to see how it goes. Lots of features need to be implemented. Many more docs need to be written. Breaking changes are to be expected. There are known and unknown bugs. No warranty. Quick run of `example` with verbose logging. ```shell ➜ NOMAD_E2E_VERBOSE=1 go test -v === RUN TestExample === RUN TestExample/testSleep util3.go:25: register (service) job: "sleep-809" util3.go:25: checking eval: 9f0ae04d-7259-9333-3763-44d0592d03a1, status: pending util3.go:25: checking eval: 9f0ae04d-7259-9333-3763-44d0592d03a1, status: complete util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: running util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: running util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: running util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: running util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: successful util3.go:25: deployment a85ad2f8-269c-6620-d390-8eac7a9c397d was a success util3.go:25: deregister job "sleep-809" util3.go:25: system gc === RUN TestExample/testNamespace util3.go:25: apply namespace "example-291" util3.go:25: register (service) job: "sleep-967" util3.go:25: checking eval: a2a2303a-adf1-2621-042e-a9654292e569, status: pending util3.go:25: checking eval: a2a2303a-adf1-2621-042e-a9654292e569, status: complete util3.go:25: checking deployment: 3395e9a8-3ffc-8990-d5b8-cc0ce311f302, status: running util3.go:25: checking deployment: 3395e9a8-3ffc-8990-d5b8-cc0ce311f302, status: running util3.go:25: checking deployment: 3395e9a8-3ffc-8990-d5b8-cc0ce311f302, status: running util3.go:25: checking deployment: 3395e9a8-3ffc-8990-d5b8-cc0ce311f302, status: successful util3.go:25: deployment 3395e9a8-3ffc-8990-d5b8-cc0ce311f302 was a success util3.go:25: deregister job "sleep-967" util3.go:25: system gc util3.go:25: cleanup namespace "example-291" === RUN TestExample/testEnv util3.go:25: register (batch) job: "env-582" util3.go:25: checking eval: 600f3bce-ea17-6d13-9d20-9d9eb2a784f7, status: pending util3.go:25: checking eval: 600f3bce-ea17-6d13-9d20-9d9eb2a784f7, status: complete util3.go:25: deregister job "env-582" util3.go:25: system gc --- PASS: TestExample (10.08s) --- PASS: TestExample/testSleep (5.02s) --- PASS: TestExample/testNamespace (4.02s) --- PASS: TestExample/testEnv (1.03s) PASS ok github.com/hashicorp/nomad/e2e/example 10.079s ``` * cluster3: use filter for kernel.name instead of filtering manually	2023-06-23 09:10:49 -05:00
James Rasell	78cdf0d0d8	server: remove unused endpoints struct. (#17665 )	2023-06-23 08:20:33 +01:00
Luiz Aoqui	f785da4748	ci: fix flaky UI test (#17676 )	2023-06-22 23:07:36 -04:00
Michael Lange	41f6f7e04f	TopoViz story that is sourced from Mirage Unfortunately due to the split build nature of the ember app and storybook it isn't possible to import mirage in the storybook context to control scenarios via a knob :(	2023-06-22 16:55:36 -07:00
Michael Lange	85371941c4	Full TopoViz story	2023-06-22 16:55:25 -07:00
Michael Lange	cb30ef1a0f	TopoViz child component stories	2023-06-22 15:03:32 -07:00
Michael Lange	859374ecad	Standard usage story	2023-06-22 13:53:21 -07:00
Michael Lange	de09e3f51a	Basic recommendation-chart story with knobs	2023-06-22 13:53:21 -07:00
Phil Renaud	7373261b58	[ui] Versions added to deploying status panel (#17629 ) * Versions added to deploying status panel * Wrap the running and healthy title in a span * Versions in the deployment UI next to titles * Version count and label styles updated	2023-06-22 16:19:41 -04:00
Tim Gross	12ca68ec26	release pipeline: fix ref arguments in invoking workflow (#17684 ) Although #17669 fixed the permissions of the release pipeline to push new commits, there was still an error when invoking the `build` workflow. The format of the reference was changed in #17103 such that we're sending the git ref (a SHA) and not the "--ref" argument required by the GH actions workflow API, which in this case is apparently specially defined as "The branch or tag name which contains the version of the workflow file you'd like to run" and not what git calls a "ref". This changeset: * Removes the third-party action entirely so that we're using GitHub's own tooling. This removes one more thing from the supply chain to pin and ensures a 1:1 mapping of args to what's documented by GitHub. * Removes the `--ref` argument entirely, which causes it to default to the current branch that the release workflow is running on (which is always what we want).	2023-06-22 15:33:19 -04:00
Phil Renaud	cf21a246c0	[ui, deployments] job status panel legend: counts of 0 don't get links (#17644 )	2023-06-22 14:40:11 -04:00
Luiz Aoqui	e66a7bbefe	core: remove unnecessary call to SetNodes and adds DC downgrade test (#17655 )	2023-06-22 13:26:14 -04:00
Jai	7103ce1957	ui: create node pool model (#17301 ) Co-authored-by: Phil Renaud <phil@riotindustries.com> Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-06-22 13:11:44 -04:00
Luiz Aoqui	8f05eaaa68	np: check for license on RPC endpoints (#17656 )	2023-06-22 12:52:20 -04:00
Luiz Aoqui	53dd8835b8	ci: set `continue-on-error: true` on `test-ui` (#17646 ) Since the matrix exercises different test cases, it's better to allow all partitions to completely run, even if one of them fails, so it's easier to catch multiple test failures.	2023-06-22 11:31:49 -04:00
Tim Gross	11216d09af	client: send node secret with every client-to-server RPC (#16799 ) In Nomad 1.5.3 we fixed a security bug that allowed bypass of ACL checks if the request came thru a client node first. But this fix broke (knowingly) the identification of many client-to-server RPCs. These will be now measured as if they were anonymous. The reason for this is that many client-to-server RPCs do not send the node secret and instead rely on the protection of mTLS. This changeset ensures that the node secret is being sent with every client-to-server RPC request. In a future version of Nomad we can add enforcement on the server side, but this was left out of this changeset to reduce risks to the safe upgrade path. Sending the node secret as an auth token introduces a new problem during initial introduction of a client. Clients send many RPCs concurrently with `Node.Register`, but until the node is registered the node secret is unknown to the server and will be rejected as invalid. This causes permission denied errors. To fix that, this changeset introduces a gate on having successfully made a `Node.Register` RPC before any other RPCs can be sent (except for `Status.Ping`, which we need earlier but which also ignores the error because that handler doesn't do an authorization check). This ensures that we only send requests with a node secret already known to the server. This also makes client startup a little easier to reason about because we know `Node.Register` must succeed first, and it should make for a good place to hook in future plans for secure introduction of nodes. The tradeoff is that an existing client that has running allocs will take slightly longer (a second or two) to transition to ready after a restart, because the transition in `Node.UpdateStatus` is gated at the server by first submitting `Node.UpdateAlloc` with client alloc updates.	2023-06-22 11:06:49 -04:00
Luiz Aoqui	ca3c004130	ci: fix some flaky UI tests (#17648 ) These tests would fail depending on the value of the seed used.	2023-06-22 10:51:07 -04:00
Tim Gross	70a359048e	release pipeline: release workflow needs write permissions (#17669 ) In #17103 we set read-only permissions on all the workflows. Unfortunately we missed that the `release` workflow makes git commits and pushes them to the repository, so it needs to have write permissions.	2023-06-22 10:40:45 -04:00
Luiz Aoqui	0549b880ef	ui: display mirage scenario in header label (#17649 ) This information is useful when switching between different scenarios for testing.	2023-06-22 10:38:17 -04:00
Seth Hoenig	5138c5b99e	client: do not disable memory swappiness if kernel does not support it (#17625 ) * client: do not disable memory swappiness if kernel does not support it This PR adds a workaround for very old Linux kernels which do not support the memory swappiness interface file. Normally we write a "0" to the file to explicitly disable swap. In the case the kernel does not support it, give libcontainer a nil value so it does not write anything. Fixes #17448 * client: detect swappiness by writing to the file * fixup changelog Co-authored-by: James Rasell <jrasell@users.noreply.github.com> --------- Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-06-22 09:36:31 -05:00
Luiz Aoqui	9f5c02d947	ui: add tooltips to the Topology labels (#17647 ) Add tooltips to labels in nodes and datacenters for the Topology view page to clarify what each value represents.	2023-06-22 10:33:42 -04:00
Luiz Aoqui	3d761e712b	ui: remove redundant columns from child job table (#17645 ) Namespace, job type, and priority are already available from the parent job header, so displaying them in the table caused it to be too crowded.	2023-06-22 10:22:41 -04:00
James Rasell	4e2d019639	variables: remove unused state store functions. (#17660 )	2023-06-22 13:54:58 +01:00
James Rasell	71fdd7e891	core: use faster concatenation for alloc name generation. (#17591 )	2023-06-22 07:46:28 +01:00

25070 Commits

All Branches