open-nomad

Commit Graph

Author	SHA1	Message	Date
Michael Schurter	9cac60dbed	test: use port collision instead of cpu exhaustion (#14994 ) Originally this test relied on Job 1 blocking Job 2 until Job 1 had a terminal ClientStatus. Job 2 ensured it would get blocked using 2 mechanisms: 1. A constraint requiring it is placed on the same node as Job 1. 2. Job 2 would require all unreserved CPU on the node to ensure it would be blocked until Job 1's resources were free. That 2nd assertion breaks if any previous job is still running on the target node! That seems very likely to happen in the flaky world of our e2e tests. In fact there may be some jobs we intentionally want running throughout; in hindsight it was never safe to assume my test would be the only thing scheduled when it ran. Ports to the rescue! Reserving a static port means that Job 2 will now block on Job 1 being terminal. It will only conflict with other tests if those tests use that port on every node. I ensured no existing tests were using the port I chose. Other changes: - Gave job a bit more breathing room resource-wise. - Tightened timings a bit since previous failure ran into the `go test` time limit. - Cleaned up the DumpEvals output. It's quite nice and handy now!	2022-10-21 07:53:26 -07:00
Seth Hoenig	e66c9ede24	e2e: convert flaky exec download in chroot unit test into e2e test (#14949 ) Similar to https://github.com/hashicorp/nomad/pull/14710, convert flaky test into e2e test.	2022-10-19 08:22:32 -05:00
Michael Schurter	01d90d18f6	test: expand timing and debugging for overlap test (#14920 ) attempt #9000	2022-10-18 13:02:18 -07:00
Michael Schurter	21eced0a4e	test: extend timing and output of overlap e2e test (#14894 ) Keeps failing in the nightly e2e test with unhelpful output like: ``` Failed === RUN TestOverlap overlap_test.go:92: Followup job overlap93ee1d2b blocked. Sleeping for the rest of overlap48c26c39's shutdown_delay (9.2/10s) overlap_test.go:105: 1500/2000 retries reached for github.com/hashicorp/nomad/e2e/overlap.TestOverlap (err=timed out before an allocation was found for overlap93ee1d2b) overlap_test.go:105: timeout: timed out before an allocation was found for overlap93ee1d2b --- FAIL: TestOverlap (38.96s) ``` I have not been able to replicate it in my own e2e cluster, so I added the EvalDump helper to add detailed eval information like: ``` === RUN TestOverlap 1/1 Job overlap7b0e90ec Eval c38c9919-a4f0-5baf-45f7-0702383c682a Type: service TriggeredBy: job-register Deployment: Status: pending () NextEval: PrevEval: BlockedEval: -- No placement failures -- QueuedAllocs: SnapshotIdx: 0 CreateIndex: 96 ModifyIndex: 96 ... ``` Hopefully helpful when debugging other tests as well!	2022-10-14 14:15:07 -07:00
Michael Schurter	bdb639b3e2	test: simplify overlap job placement logic (#14811 ) * test: simplify overlap job placement logic Trying to fix #14806 Both the previous approach as well as this one worked on e2e clusters I spun up. * simplify code flow	2022-10-12 11:21:28 -07:00
Giovani Avelar	a625de2062	Allow specification of a custom job name/prefix for parameterized jobs (#14631 )	2022-10-06 16:21:40 -04:00
James Rasell	0187240e7c	e2e: fixes the ordering on greater than checks within spread test. (#14818 )	2022-10-06 15:27:36 +02:00
James Rasell	67e8f85360	e2e: fix incorrect must function usage in namespace suite. (#14805 )	2022-10-05 15:50:56 +02:00
Michael Schurter	ed3218c3dd	Fixing flaky TestOverlap test (#14780 ) * test: ensure feasible node selected in overlap test * test: warn when getting close to retry limit	2022-10-03 14:35:02 -07:00
Seth Hoenig	7235d9988b	e2e: convert chroot env unit tests into e2e tests (#14710 ) This PR translates two of our most flaky unit tests into e2e tests, where they fit much more naturally.	2022-09-26 15:40:29 -05:00
Michael Schurter	6161b417f3	test: add e2e for non-overlapping placements (#14646 ) * test: add e2e for non-overlapping placements Followup to #10446 Fails (as expected) against 1.3.x at the wait for blocked eval (because the allocs are allowed to overlap). Passes against 1.4.0-beta.1 (as expected). * Update e2e/overlap/overlap_test.go Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2022-09-22 13:06:17 -07:00
Seth Hoenig	2088ca3345	cleanup more helper updates (#14638 ) * cleanup: refactor MapStringStringSliceValueSet to be cleaner * cleanup: replace SliceStringToSet with actual set * cleanup: replace SliceStringSubset with real set * cleanup: replace SliceStringContains with slices.Contains * cleanup: remove unused function SliceStringHasPrefix * cleanup: fixup StringHasPrefixInSlice doc string * cleanup: refactor SliceSetDisjoint to use real set * cleanup: replace CompareSliceSetString with SliceSetEq * cleanup: replace CompareMapStringString with maps.Equal * cleanup: replace CopyMapStringString with CopyMap * cleanup: replace CopyMapStringInterface with CopyMap * cleanup: fixup more CopyMapStringString and CopyMapStringInt * cleanup: replace CopySliceString with slices.Clone * cleanup: remove unused CopySliceInt * cleanup: refactor CopyMapStringSliceString to be generic as CopyMapOfSlice * cleanup: replace CopyMap with maps.Clone * cleanup: run go mod tidy	2022-09-21 14:53:25 -05:00
James Rasell	3f78a51fa5	e2e: use unique names for Connect ACL Consul policy names. (#14604 ) In the event a single test fails to clear up properly after itself, all other tests will fail as they attempt to create ACL policies with the same names. This change ensures they use unique ACL names, so when a single test fails, it is easy to identify it is a problem with the test rather than the suite.	2022-09-16 13:35:40 +02:00
James Rasell	90d0b9157f	e2e: rewrite spread suite to use new e2e style. (#14598 ) The rewrite refactors the suite to use the new style along with other recent testing improvements. In order to ensure the spread tests do not impact each other, there is new cleanup functionality to ensure both the job and allocations are removed from state before the test exits completely.	2022-09-15 17:12:20 +02:00
James Rasell	d65267c60c	e2e: do not assume clean cluster when checking return objects. (#14557 )	2022-09-13 14:25:19 +02:00
James Rasell	1f877bac1c	acl: fix encoding expiration time in ACL token list API. (#14542 )	2022-09-12 15:50:35 +02:00
James Rasell	6f790769bb	e2e: fixup service discovery and ACL expiration tests. (#14517 ) The NSD checks tests were racy, whereby the check may not have been triggered by the time it was queried. This change wraps the check so it can account for this. This removes the current ACL expiration GC section in order to get the tests passing and allow more time to investigate the test. I have full confidence the feature is working as expected and have tested extensively locally.	2022-09-09 14:27:40 +02:00
James Rasell	d14d6e051a	e2e: fixup token expiration test to account for longer forced GC. (#14491 )	2022-09-08 14:43:04 +02:00
James Rasell	e24de517fa	e2e: add test to exercise ACL tokens with role and policy links. (#14432 )	2022-09-02 08:56:00 +02:00
James Rasell	5d0cc93939	e2e: add acl test for token expiration. (#14418 ) In order to add an E2E test to cover token expiration, the server config has been updated to include a low minimum allowed TTL value. For ease of reading, the max value is also set.	2022-09-01 09:36:09 +02:00
James Rasell	5f3665230b	e2e: add ACL test suite with ACL Role test. (#14398 ) This adds a new ACL test suite to the e2e framework which includes an initial test for ACL roles. The ACL test includes a helper to track and clean created Nomad resources which keeps the test cluster clean no matter if the test fails early or not.	2022-08-31 10:11:28 +02:00
Seth Hoenig	38727b6ab9	e2e: add e2e tests for nomad service disco checks This PR adds 2 e2e tests for ensuring nomad service discovery checks get created and produce status results as expected.	2022-08-22 15:31:13 -05:00
Piotr Kazmierczak	b63944b5c1	cleanup: replace TypeToPtr helper methods with pointer.Of (#14151 ) Bumping compile time requirement to go 1.18 allows us to simplify our pointer helper methods.	2022-08-17 18:26:34 +02:00
Seth Hoenig	b3ea68948b	build: run gofmt on all go source files Go 1.19 will forcefully format all your doc strings. To get this out of the way, here is one big commit with all the changes gofmt wants to make.	2022-08-16 11:14:11 -05:00
Tim Gross	6c080e0b10	e2e: move namespaces test out of legacy framework (#13934 ) This PR continues work we've started on other test suites to use the native golang test runner instead of the custom framework.	2022-08-01 13:24:34 -04:00
Seth Hoenig	634d84edec	e2e: add nsd simple load balancing test	2022-07-14 15:07:19 -05:00
James Rasell	17a467020c	e2e: add terraform init commands to readme doc. (#13655 )	2022-07-08 16:52:35 +02:00
James Rasell	181b247384	core: allow pausing and un-pausing of leader broker routine (#13045 ) * core: allow pause/un-pause of eval broker on region leader. * agent: add ability to pause eval broker via scheduler config. * cli: add operator scheduler commands to interact with config. * api: add ability to pause eval broker via scheduler config * e2e: add operator scheduler test for eval broker pause. * docs: include new operator scheduler CLI and pause eval API info.	2022-07-06 16:13:48 +02:00
Derek Strickland	34dea90d7a	docker: update images to reference hashicorpdev Docker organization (#12903 ) docker: update images to reference hashicorpdev dockerhub organization generate job_init.bindata_assetfs.go Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-06-08 15:06:00 -04:00
James Rasell	c3c10d8c10	e2e: use longer wait in template update triggers to avoid flake. (#13271 )	2022-06-07 14:49:03 +02:00
Tim Gross	cc4a1f2ec4	e2e: upgrade playwright package and container image (#13080 ) The nightly playwright tests are currently failing because of a mismatch between the expected version of Chromium and what's in the container image. Unfortunately the previous specific tag we were using for the container image is no longer tagged on the registry. With some testing, I was able to find an image tag that results in a good run.	2022-05-20 08:41:07 -04:00
Derek Strickland	90daed7c1d	e2e: Wait for deployment to finish before disconnect (#12795 ) * Wait for deployment to finish * Don't reschedule disconnect or restart-node jobs	2022-04-27 12:27:03 -04:00
Tim Gross	c763c4cb96	remove pre-0.9 driver code and related E2E test (#12791 ) This test exercises upgrades between 0.8 and Nomad versions greater than 0.9. We have not supported 0.8.x in a very long time and in any case the test has been marked to skip because the downloader doesn't work.	2022-04-27 09:53:37 -04:00
Tim Gross	cfd353207f	E2E: move volume mounts test to use golang's stdlib test runner (#12788 ) Part of ongoing work to remove the old E2E framework code.	2022-04-26 14:28:20 -04:00
Tim Gross	83eb879d61	E2E: remove old CLI for driving provisioning (#12787 ) We moved off the old provisioning process for nightly E2E to one driven entirely by Terraform quite a while back now. We're in the slow process of removing the framework code for this test-by-test, but this chunk of code no longer has any callers.	2022-04-26 13:43:25 -04:00
Tim Gross	f7d6841dd2	E2E: remove platform specific realpath code from UI run script (#12750 ) We don't need the absolute path for any of the commands in this script so long as we `cd` into the source directory path. Doing this removes the need for weird platform-specific tricks we have to do with realpath vs GNU realpath.	2022-04-22 10:10:18 -04:00
Tim Gross	7dd3910e51	E2E: fix debug logging on disconnected clients test (#12621 )	2022-04-22 09:07:05 -04:00
Tim Gross	d200a66509	E2E: make UIs runnable from any working directory (#12739 ) The E2E test runner is running from the root of the Nomad repository. Make this run independent of the working directory for convenience of developers and the test runner.	2022-04-21 17:00:01 -04:00
Tim Gross	dc013b5267	E2E: set longer timeout for CSI plugin alloc start (#12732 ) The CSI plugin allocations take a while to be marked healthy, sometimes causing E2E test flakes during the setup phase of the tests. There's nothing CSI specific about marking plugin allocs healthy, as the plugin supervisor hook does all the fingerprinting in the postrun hook (the prestart hook just makes a couple of empty directories). The timeouts we're seeing may be because of where we're pulling the images from; most of our jobs pull from a CDN-backed public registry whereas these are pulling from ECR. Set a 1min timeout for these to make sure we have enough time to pull the image and start the task.	2022-04-21 11:11:43 -04:00
Tim Gross	2ad9f6bc5f	E2E: playwright configuration and smoke test (#12721 ) Scripts for running playwright tests in a Docker container that has chromium and webkit preinstalled. Includes a basic smoke test for authentication so that we can be sure the test rig is working end-to-end. Wiring this up in CI will be in an upcoming PR.	2022-04-21 09:13:10 -04:00
Tim Gross	c4d92205b4	E2E: provide options for reverse proxy for web UI (#12671 ) Our E2E test environment is deployed with mTLS, but it's impractical for us to use mTLS in headless browsers for automated testing (or even in manual testing). Provide certificates for proxying the web UI via Nginx. This proxy uses client certs for proxying to the HTTP endpoint and a self-signed cert for the browser-facing endpoint. We can accept certificate errors in the automated tests we'll be adding in the next step of this work.	2022-04-19 16:55:05 -04:00
Tim Gross	70c262eb95	E2E: terraform provisioner upgrades (#12652 ) While working on infrastructure for testing the UI in E2E, we needed to upgrade the certificate provider. Performing a provider upgrade via the TF `init -upgrade` brought in updates for the file and AWS providers as well. These updates include deprecating the use of `sensitive_content` fields, removing CA algorithm parameters that can be inferred from keys, and removing the requirement to manually specify AWS assume role parameters in the provider config if they're available in the calling environment's AWS config file (as they are via doormat or our E2E environment).	2022-04-19 14:27:14 -04:00
Tim Gross	d62dd5b3fe	E2E: add debugging outputs for disconnected clients test (#12572 ) This test has a failure that's happening only occasionally and not very reproducibly. Print out the allocation status on test failure so that we can do some post-mortem debugging of the test on nightly.	2022-04-14 17:03:57 -04:00
Derek Strickland	3f871973f9	Update E2E terraform output command (#12561 )	2022-04-13 16:46:09 -04:00
Tim Gross	4078e6ea0e	scripts: fix interpreter for bash (#12549 ) Many of our scripts have a non-portable interpreter line for bash and use bash-specific variables like `BASH_SOURCE`. Update the interpreter line to be portable between various Linuxes and macOS without complaint from posix shell users.	2022-04-12 10:08:21 -04:00
Tim Gross	31e72e93ff	E2E: fix flaky event stream test (#12548 ) This changeset fixes two sources of flakiness in the event stream test. First, the stream request gets the event closest to the index, not the exact match. Although events are written before raft entries they're written asynchronously, so it's possible to race and get a raft index from this query higher than the current head of the event buffer. Ensure the job is running before we try to get the index, so that we've given the event enough time to land in the buffer. Second, the assertion that the found index is greater than the start index is only true if the `PlanResult` event manages to land before we do the second registration. Although it should now with the first fix above, it's not a correct assertion for what we're testing.	2022-04-12 08:35:39 -04:00
Tim Gross	77ab8d92f1	E2E: oversubscription assertion needs to wait for stats (#12540 ) The oversubscription test expects an output that requires the client has polled the task for stats at least once. Wait long enough to ensure that we've polled the stats before failing the test.	2022-04-11 11:40:51 -04:00
Tim Gross	c9c3cbd878	E2E: test for nodes disconnected by netsplit (#12407 )	2022-04-11 11:34:27 -04:00
James Rasell	bc800a18d1	e2e: add initial service discovery tests. (#12512 ) Some tests may choose to deregister jobs to check Nomad cleanup logic, however, it is still possible for the test to fail and exit before this is hit. This therefore adds a cancellable cleanup func which can be deferred, using context to control whether it gets run or not.	2022-04-11 11:12:24 +02:00
James Rasell	dbf28a06c1	e2e: fix eventual consistency failure within consultemplate suite. (#12494 )	2022-04-07 17:03:10 +02:00

566 Commits