open-nomad

Commit Graph

Author	SHA1	Message	Date
hc-github-team-nomad-core	d21d4e85cf	backport of commit ff928a804590611111763632388161dc711adf88 (#19124 ) Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-11-17 15:10:23 -05:00
hc-github-team-nomad-core	c742a55583	backport of commit a3f8a52fd4b192db339540152033b94f1e010b31 (#19123 ) Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-11-17 15:10:13 -05:00
hc-github-team-nomad-core	7057c0c886	e2e: fix and modernize rescheduling test (#19105 ) (#19107 ) The E2E test suite for rescheduling had a few bugs: * Using the command line to stop a job with a failing deployment returns a non-zero exit code, which would cause an otherwise passing test to fail. * Two of the input jobs were actually invalid but were only correctly detected as such because of #17342 This changeset also updates the whole test suite to move it off the v1 "framework". A few test assertions are also de-flaked. Fixes: #19076 Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-11-16 15:52:05 -05:00
hc-github-team-nomad-core	6a4a3f6d78	backport of commit 6fca4fa715fcfe5c4a214e90f72c54cda7da6efd (#18490 ) Co-authored-by: Seth Hoenig <shoenig@duck.com>	2023-09-13 16:38:21 -05:00
hc-github-team-nomad-core	c25c04816d	Backport of e2e: modernize vaultcompat testing into release/1.6.x (#18182 ) This pull request was automerged via backport-assistant	2023-08-09 09:25:32 -05:00
James Rasell	5571890974	e2e: respect timeout value when waiting for allocs in v3. (#17800 )	2023-07-10 09:47:10 +01:00
Daniel Bennett	77a8d79bb5	e2e: use DNS instead of HTTP to get my_public_ipv4 (#17759 )	2023-06-28 13:11:57 -05:00
hashicorp-copywrite[bot]	e901340c3f	[COMPLIANCE] Add Copyright and License Headers (#17732 ) Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>	2023-06-26 11:11:17 -05:00
Seth Hoenig	2e2c578298	e2e: refactor pids isolation tests (#17717 ) This PR refactors some old PID isolation tests to make use of the e2e/v3 packages. Should be quite a bit easier to read. Adds 'alloc exec' capability to the jobs3 package.	2023-06-26 09:51:18 -05:00
Seth Hoenig	2c7877658c	e2e: create a v3/ set of packages for creating Nomad e2e tests (#17620 ) * e2e: create a v3/ set of packages for creating Nomad e2e tests This PR creates an experimental set of packages under `e2e/v3/` for crafting Nomad e2e tests. Unlike previous generations, this is an attempt at providing a way to create tests in a declarative (ish) pattern, with a focus on being easy to use, easy to cleanup, and easy to debug. @shoenig is just trying this out to see how it goes. Lots of features need to be implemented. Many more docs need to be written. Breaking changes are to be expected. There are known and unknown bugs. No warranty. Quick run of `example` with verbose logging. ```shell ➜ NOMAD_E2E_VERBOSE=1 go test -v === RUN TestExample === RUN TestExample/testSleep util3.go:25: register (service) job: "sleep-809" util3.go:25: checking eval: 9f0ae04d-7259-9333-3763-44d0592d03a1, status: pending util3.go:25: checking eval: 9f0ae04d-7259-9333-3763-44d0592d03a1, status: complete util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: running util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: running util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: running util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: running util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: successful util3.go:25: deployment a85ad2f8-269c-6620-d390-8eac7a9c397d was a success util3.go:25: deregister job "sleep-809" util3.go:25: system gc === RUN TestExample/testNamespace util3.go:25: apply namespace "example-291" util3.go:25: register (service) job: "sleep-967" util3.go:25: checking eval: a2a2303a-adf1-2621-042e-a9654292e569, status: pending util3.go:25: checking eval: a2a2303a-adf1-2621-042e-a9654292e569, status: complete util3.go:25: checking deployment: 3395e9a8-3ffc-8990-d5b8-cc0ce311f302, status: running util3.go:25: checking deployment: 3395e9a8-3ffc-8990-d5b8-cc0ce311f302, status: running util3.go:25: checking deployment: 3395e9a8-3ffc-8990-d5b8-cc0ce311f302, status: running util3.go:25: checking deployment: 3395e9a8-3ffc-8990-d5b8-cc0ce311f302, status: successful util3.go:25: deployment 3395e9a8-3ffc-8990-d5b8-cc0ce311f302 was a success util3.go:25: deregister job "sleep-967" util3.go:25: system gc util3.go:25: cleanup namespace "example-291" === RUN TestExample/testEnv util3.go:25: register (batch) job: "env-582" util3.go:25: checking eval: 600f3bce-ea17-6d13-9d20-9d9eb2a784f7, status: pending util3.go:25: checking eval: 600f3bce-ea17-6d13-9d20-9d9eb2a784f7, status: complete util3.go:25: deregister job "env-582" util3.go:25: system gc --- PASS: TestExample (10.08s) --- PASS: TestExample/testSleep (5.02s) --- PASS: TestExample/testNamespace (4.02s) --- PASS: TestExample/testEnv (1.03s) PASS ok github.com/hashicorp/nomad/e2e/example 10.079s ``` * cluster3: use filter for kernel.name instead of filtering manually	2023-06-23 09:10:49 -05:00
Daniel Bennett	e58ba84a9e	e2e: fix windows client docker (#17572 ) the windows docker install script stopped working. after trying various things to fix the script, I opted instead for a base image that comes with docker already installed. error output during build was: Installing Docker. WARNING: Cannot find path 'C:\Users\Administrator\AppData\Local\Temp\DockerMsftProvider\DockerDefault_DockerSearchIndex.json' because it does not exist. WARNING: Cannot bind argument to parameter 'downloadURL' because it is an empty string. WARNING: The property 'AbsoluteUri' cannot be found on this object. Verify that the property exists. WARNING: The property 'RequestMessage' cannot be found on this object. Verify that the property exists. Failed to install Docker. Install-Package : No match was found for the specified search criteria and package name 'docker'.	2023-06-20 10:17:16 -05:00
hashicorp-copywrite[bot]	d797da4a3c	[COMPLIANCE] Add Copyright and License Headers (#17596 ) Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>	2023-06-19 12:23:28 -04:00
Seth Hoenig	320bac0ac4	e2e: modernize podman test suite (#17564 ) Use the new style of e2e test for the podman suite ... which is all of one test case that was skipped out. Turn the case back on, and we will add more tests in the near future.	2023-06-16 10:36:17 -05:00
Seth Hoenig	cafaf2e2ee	e2e: cleanup podman installation in jammy image (#17558 ) * e2e: cleanup podman installation in jammy image The original steps were copied over from the bionic image and does a lot of hoop jumping we do not need anymore. For the moment just hard-code installing the v0.4.2 version of the driver, but I may follow up and modify hc-install to support installing @latest like go itself. * use releases for hc-install	2023-06-15 18:17:31 -05:00
Seth Hoenig	c7b44a57a2	e2e: purge bionic packer image scripts (#17559 ) Bionic is dead, long live the Jammy!	2023-06-15 15:15:01 -05:00
Patric Stout	4767d44b94	Fix DevicesSets being removed when cpusets are reloaded with cgroup v2 (#17535 ) * Fix DevicesSets being removed when cpusets are reloaded with cgroup v2 This meant that if any allocation was created or removed, all active DevicesSets were removed from all cgroups of all tasks. This was most noticeable with "exec" and "raw_exec", as it meant they no longer had access to /dev files. * e2e: add test for verifying cgroups do not interfere with access to devices --------- Co-authored-by: Seth Hoenig <shoenig@duck.com>	2023-06-15 09:39:36 -05:00
Seth Hoenig	acfdf0f479	compliance: add headers with fixed copywrite tool (#17353 ) Closes #17117	2023-05-30 09:20:32 -05:00
Seth Hoenig	e04ff0d935	client: ignore restart issued to terminal allocations (#17175 ) * client: ignore restart issued to terminal allocations This PR fixes a bug where issuing a restart to a terminal allocation would cause the allocation to run its hooks anyway. This was particularly apparent with group_service_hook who would then register services but then never deregister them - as the allocation would be effectively in a "zombie" state where it is prepped to run tasks but never will. * e2e: add e2e test for alloc restart zombies * cl: tweak text Co-authored-by: Tim Gross <tgross@hashicorp.com> --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>	2023-05-16 10:19:41 -05:00
Seth Hoenig	4abb3e03ca	cli: upload var file(s) content on job submission (#17128 ) This PR makes it so that the content of any -var-file files is uploaded to Nomad on job run.	2023-05-11 08:04:33 -05:00
Seth Hoenig	74714272cc	api: set the job submission during job reversion (#17097 ) * api: set the job submission during job reversion This PR fixes a bug where the job submission would always be nil when a job goes through a reversion to a previous version. Basically we need to detect when this happens, lookup the submission of the job version being reverted to, and set that as the submission of the new job being created. * e2e: add e2e test for job submissions during reversion This e2e test ensures a reverted job inherits the job submission associated with the version of the job being reverted to.	2023-05-08 14:18:34 -05:00
Seth Hoenig	753c17c9de	services: un-mark group services as deregistered if restart hook runs (#16905 ) * services: un-mark group services as deregistered if restart hook runs This PR may fix a bug where group services will never be deregistered if the group undergoes a task restart. * e2e: add test case for restart and deregister group service * cl: add cl * e2e: add wait for service list call	2023-04-24 14:24:51 -05:00
Tim Gross	5a9abdc469	drain: use client status to determine drain is complete (#14348 ) If an allocation is slow to stop because of `kill_timeout` or `shutdown_delay`, the node drain is marked as complete prematurely, even though drain monitoring will continue to report allocation migrations. This impacts the UI or API clients that monitor node draining to shut down nodes. This changeset updates the behavior to wait until the client status of all drained allocs are terminal before marking the node as done draining.	2023-04-13 08:55:28 -04:00
Shawn	007b534020	fix: typo (#16873 )	2023-04-12 16:18:13 -04:00
Tim Gross	4df2d9bda8	E2E: clarify drain `-deadline` and `-force` flag behaviors (#16868 ) The `-deadline` and `-force` flag for the `nomad node drain` command only cause the draining to ignore the `migrate` block's healthy deadline, max parallel, etc. These flags don't have anything to do with the `kill_timeout` or `shutdown_delay` options of the jobspec. This changeset fixes the skipped E2E tests so that they validate the intended behavior, and updates the docs for more clarity.	2023-04-12 15:27:24 -04:00
Seth Hoenig	dbb6edd96d	e2e: add e2e tests for job submission api (#16841 ) * e2e: add e2e tests for job submission api * e2e: fixup callers of AllocLogs * fix typo	2023-04-12 08:36:17 -05:00
hashicorp-copywrite[bot]	005636afa0	[COMPLIANCE] Add Copyright and License Headers	2023-04-10 15:36:59 +00:00
Tim Gross	1335543731	ephemeral disk: `migrate` should imply `sticky` (#16826 ) The `ephemeral_disk` block's `migrate` field allows for best-effort migration of the ephemeral disk data to new nodes. The documentation says the `migrate` field is only respected if `sticky=true`, but in fact if client ACLs are not set the data is migrated even if `sticky=false`. The existing behavior when client ACLs are disabled has existed since the early implementation, so "fixing" that case now would silently break backwards compatibility. Additionally, having `migrate` not imply `sticky` seems nonsensical: it suggests that if we place on a new node we migrate the data but if we place on the same node, we throw the data away! Update so that `migrate=true` implies `sticky=true` as follows: * The failure mode when client ACLs are enabled comes from the server not passing along a migration token. Update the server so that the server provides a migration token whenever `migrate=true` and not just when `sticky=true` too. * Update the scheduler so that `migrate` implies `sticky`. * Update the client so that we check for `migrate \|\| sticky` where appropriate. * Refactor the E2E tests to move them off the old framework and make the intention of the test more clear.	2023-04-07 16:33:45 -04:00
Tim Gross	e7eae66cf1	E2E: update subset of node drain tests off the old framework (#16823 ) While working on several open drain issues, I'm fixing up the E2E tests. This subset of tests being refactored are existing ones that already work. I'm shipping these as their own PR to keep review sizes manageable when I push up PRs in the next few days for #9902, #12314, and #12915.	2023-04-07 09:17:19 -04:00
Seth Hoenig	4b7cd0a651	e2e/acl: export ACL resource Cleanup helpers (#16822 ) The e2e/acl package has some nice helpers for tracking and cleaning up ACL objects, but they are currently private. Export them so I can abuse them in other e2e tests.	2023-04-06 14:35:22 -05:00
Seth Hoenig	d11fe234e4	e2e: swap assert for test package in e2eutil/jobs.go (#16820 )	2023-04-06 10:02:27 -05:00
Tim Gross	09c19fa44a	E2E: test enforcement of ACL system (#16796 ) This changeset provides a matrix test of ACL enforcement across several dimensions: * anonymous vs bogus vs valid tokens * permitted vs not permitted by policy * request sent to server vs sent to client (and forwarded)	2023-04-06 09:11:20 -04:00
James Rasell	cb6ba80f0f	cli: stream both stdout and stderr when following an alloc. (#16556 ) This update changes the behaviour when following logs from an allocation, so that both stdout and stderr files streamed when the operator supplies the follow flag. The previous behaviour is held when all other flags and situations are provided. Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-04-04 10:42:27 +01:00
Michael Schurter	4678dc7b4d	e2e: sleep to ensure logs are picked up (#16596 ) :(	2023-03-21 14:10:50 -07:00
Tim Gross	ad774ccfa1	E2E: fix events tests (#16595 ) In #12916 we updated the events test as part of a larger set of changes around mapstructure serialization fixes. But the changes to the jobs we're deploying in the tests had invalid task configs so they never result in good deployments and the test will always fail. Make the before/after jobs identical (except for the version bump) and make them valid. Also wait for allocations for the 2nd job run to appear before checking the deployment list, so that we don't race with the scheduler.	2023-03-21 14:01:40 -07:00
Michael Schurter	15fe2ade18	Windows fixes for e2e tests (#16592 ) * e2e: skip task api test when windows too old * e2e: don't run proxy on windows	2023-03-21 13:55:32 -07:00
Michael Schurter	a875bad6e5	Enable ACLs on E2E test clients (#16530 ) * e2e: uniformly enable acls across all agents * docs: clarify that acls should be set everywhere	2023-03-16 14:22:41 -07:00
Seth Hoenig	25944cbb7d	artifact: use specific version link for zipbomb artifact (#16513 ) Fix the e2e case where we download the go-getter bomb.zip test file, which is being removed on main. We can still get it from the version tag - yay git!	2023-03-16 10:18:46 -05:00
Michael Schurter	832bca91a1	e2e fixes: cli output, timing issue, and some cleanups (#16418 ) * e2e: job expects alloc to run until stopped * e2e: fix case changed by #16306 * e2e: couldn't find a bug but improved test+jobspecs	2023-03-10 13:14:51 -08:00
Seth Hoenig	2b5efeac04	e2e: setup nomad permissions correctly (client vs. server) (#16399 ) This PR configures - server nodes with a systemd unit running the agent as the nomad service user - client nodes with a root owned nomad data directory	2023-03-08 14:41:08 -06:00
Lance Haig	e89c3d3b36	Update ioutil library references to os and io respectively for e2e helper nomad (#16332 ) No user facing changes so I assume no change log is required	2023-03-08 09:39:03 -06:00
Seth Hoenig	32f8ca6ce3	e2e: fix permissions on nomad data directory (#16376 ) This PR updates the provisioning step where we create /opt/nomad/data, such that it is with 0700 permissions in line with our security guidance.	2023-03-07 14:41:54 -06:00
Michael Schurter	bd7b60712e	Accept Workload Identities for Client RPCs (#16254 ) This change resolves policies for workload identities when calling Client RPCs. Previously only ACL tokens could be used for Client RPCs. Since the same cache is used for both bearer tokens (ACL and Workload ID), the token cache size was doubled. --------- Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-02-27 10:17:47 -08:00
Michael Schurter	d9587b323a	Task API / Dynamic Node Metadata E2E test fixes (#16219 ) * taskapi: return Forbidden on bad credentials Prior to this change a "Server error" would be returned when ACLs are enabled which did not match when ACLs are disabled. * e2e: love love love datacenter wildcard default * e2e: skip windows nodes on linux only test The Logfs are a bit weird because they're most useful when converted to Printfs to make debugging the test much faster, but that makes CI noisy. In a perfect world Go would expose how many tests are being run and we could stream output live if there's only 1. For now I left these helpful lines in as basically glorified comments.	2023-02-21 10:53:10 -08:00
Tim Gross	e23ed85d57	E2E: add multi-home networking to test infrastructure (#16218 ) Add an Elastic Network Interface (ENI) to each Linux host, on a secondary subnet we have provisioned in each AZ. Revise security groups as follows: * Split out client security groups from servers so that we can't have clients accidentally accessing serf addresses or other unexpected cross-talk. * Add new security groups for the secondary subnet that only allows communication within the security group so we can exercise behaviors with multiple IPs. This changeset doesn't include any Nomad configuration changes needed to take advantage of the extra network interface. I'll include those with testing for PR #16217.	2023-02-20 10:08:28 +01:00
Seth Hoenig	165791dd89	artifact: protect against unbounded artifact decompression (1.5.0) (#16151 ) * artifact: protect against unbounded artifact decompression Starting with 1.5.0, set default values for artifact decompression limits. artifact.decompression_size_limit (default "100GB") - the maximum amount of data that will be decompressed before triggering an error and cancelling the operation artifact.decompression_file_count_limit (default 4096) - the maximum number of files that will be decompressed before triggering an error and cancelling the operation. * artifact: assert limits cannot be nil in validation	2023-02-14 09:28:39 -06:00
Michael Schurter	35d65c7c7e	Dynamic Node Metadata (#15844 ) Fixes #14617 Dynamic Node Metadata allows Nomad users, and their jobs, to update Node metadata through an API. Currently Node metadata is only reloaded when a Client agent is restarted. Includes new UI for editing metadata as well. --------- Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>	2023-02-07 14:42:25 -08:00
Seth Hoenig	c923bc59b1	e2e: mark framework package as deprecated (#16075 ) Nothing more motivating than lots of deprecation warnings to get some code refactored.	2023-02-07 08:10:40 -06:00
Michael Schurter	0a496c845e	Task API via Unix Domain Socket (#15864 ) This change introduces the Task API: a portable way for tasks to access Nomad's HTTP API. This particular implementation uses a Unix Domain Socket and, unlike the agent's HTTP API, always requires authentication even if ACLs are disabled. This PR contains the core feature and tests but followup work is required for the following TODO items: - Docs - might do in a followup since dynamic node metadata / task api / workload id all need to interlink - Unit tests for auth middleware - Caching for auth middleware - Rate limiting on negative lookups for auth middleware --------- Co-authored-by: Seth Hoenig <shoenig@duck.com>	2023-02-06 11:31:22 -08:00
Charlie Voiselle	cc6f4719f1	Add option to expose workload token to task (#15755 ) Add `identity` jobspec block to expose workload identity tokens to tasks. --------- Co-authored-by: Anders <mail@anars.dk> Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2023-02-02 10:59:14 -08:00
Seth Hoenig	5f3bb0b197	bootstrap: upgrade golangci-lint in prep for go1.20 (#16024 ) This PR updates golangci-lint to work better with go1.20 - the previous version would cause an OOM on 'make check'.	2023-02-02 09:44:12 -06:00

633 Commits