open-nomad

Commit Graph

Author	SHA1	Message	Date
James Rasell	f95e45b80c	e2e: account for race condition in periodic dispatch test.	2021-02-11 11:08:48 +01:00
Seth Hoenig	7d6e81e9e4	Merge pull request #9990 from hashicorp/f-nsiso-task drivers/exec+java: Add task configuration to restore previous PID/IPC isolation behavior	2021-02-09 13:29:14 -06:00
Seth Hoenig	45e0e70a50	consul/connect: enable custom sidecars to use expose checks This PR enables jobs configured with a custom sidecar_task to make use of the `service.expose` feature for creating checks on services in the service mesh. Before we would check that sidecar_task had not been set (indicating that something other than envoy may be in use, which would not support envoy's expose feature). However Consul has not added support for anything other than envoy and probably never will, so having the restriction in place seems like an unnecessary hindrance. If Consul ever does support something other than Envoy, they will likely find a way to provide the expose feature anyway. Fixes #9854	2021-02-09 10:49:37 -06:00
Seth Hoenig	8ee9835923	drivers/exec+java: Add task configuration to restore previous PID/IPC isolation behavior This PR adds pid_mode and ipc_mode options to the exec and java task driver config options. By default these will defer to the default_pid_mode and default_ipc_mode agent plugin options created in #9969. Setting these values to "host" mode disables isolation for the task. Doing so is not recommended, but may be necessary to support legacy job configurations. Closes #9970	2021-02-08 14:26:35 -06:00
Chris Baker	b1bb8a760e	e2e packer build: upgrade jdk to java 14	2021-02-02 17:33:48 +00:00
Chris Baker	ce68ee164b	Merge tag 'v1.0.3' into post-release-1.0.3 Version 1.0.3	2021-01-29 19:30:08 +00:00
Chris Baker	2632b81124	lint some nomad HCL job specs	2021-01-28 12:03:19 +00:00
Chris Baker	2adf0f12d6	e2e: java driver isolation tests	2021-01-28 12:03:19 +00:00
Chris Baker	aa55df0413	additional e2e utils for multi-task allocs	2021-01-28 12:03:19 +00:00
Kris Hicks	d67b77f38e	Add a little comment	2021-01-28 12:03:19 +00:00
Kris Hicks	5cf972d2e7	Add test for alloc exec	2021-01-28 12:03:19 +00:00
Kris Hicks	2db8aa2a52	Add e2e test for raw exec	2021-01-28 12:03:19 +00:00
Kris Hicks	87188f04de	Add PID namespacing and e2e test	2021-01-28 12:03:19 +00:00
Mahmood Ali	c92bb342e1	e2e: skip node drain deadline/force tests	2021-01-27 08:42:16 -05:00
Mahmood Ali	b12e8912a9	e2e: use f.NoError instead of requires	2021-01-27 08:36:23 -05:00
Mahmood Ali	1ac8b32e08	e2e: Disable Connect tests The connect tests are very disruptive: they restart consul/nomad agents with new tokens. The test seems particularly flaky, failing 32 times out of 73 in my sample. The tests are particularly problematic because they are disruptive and affect other tests. On failure, the nomad or consul agent on the client can get into a wedged state, so health/deployment info in subsequent tests may be wrong. In some cases, the node will be deemed failed, and then the subsequent tests may fail when the node is deemed lost and the test allocations get migrated unexpectedly.	2021-01-26 10:01:14 -05:00
Mahmood Ali	36ce1e73eb	e2e: deflake nodedrain test The nodedrain deadline test asserts that all allocations are migrated by the deadline. However, when the deadline is short (e.g. 10s), the test may fail because of scheduler/client-propagation delays. In one failing test, it took ~15s from the RPC call to the moment the scheduler issued the migration update, and then 3 seconds for the alloc to be stopped. Here, I increase the timeouts to avoid such false positives.	2021-01-26 10:01:14 -05:00
Mahmood Ali	cf8f6f07d7	e2e: vault increase timeout Increase the timeout for vaultsecrets. As the default interval is 0.1s, 10 retries mean it only retries for one second, a very short time for some waiting scenarios in the test (e.g. starting allocs, etc).	2021-01-26 10:01:14 -05:00
Mahmood Ali	94ad40907c	e2e: prefer testutil.WaitForResultRetries Prefer testutil.WaitForResultRetries, which emits more descriptive errors on failures. `require.Eventually` fails with an opaque "Condition never satisfied" error message.	2021-01-26 10:01:14 -05:00
Mahmood Ali	f3f8f15b7b	e2e: special case "Unexpected EOF" errors This is an attempt at deflaking the e2e exec tests, and a way to improve messages. The e2e tests occasionally fail with "unexpected EOF" even though the exec output matches expectations. I suspect there is a race in handling EOF in server/http handling. Here, we special case this error and ensure we get all failures, to help debug the case better.	2021-01-26 10:01:14 -05:00
Mahmood Ali	925d9ce952	e2e: tweak failure messages Tweak the error messages for the flakiest tests, so that on test failure, we get more output	2021-01-26 09:16:48 -05:00
Mahmood Ali	6aa3dec6cc	e2e: use testify requires instead of t.Fatal testify requires offer better error messages that are easier to notice when seeing a wall of text in the builds.	2021-01-26 09:14:47 -05:00
Mahmood Ali	236b4055a7	e2e: deflake consul/CheckRestart test Ensure we pass the alloc ID to status. Otherwise, the test may fail if there is another spurious allocation running from another test.	2021-01-26 09:12:20 -05:00
Mahmood Ali	0aafd9af64	e2e: Fix build script and pass shellcheck	2021-01-26 09:11:37 -05:00
Mahmood Ali	4397eda209	Merge pull request #9798 from hashicorp/e2e-terraform-tweaks-20200113 This PR makes two ergonomics changes, meant to make e2e builds more reproducible and ease changes. ### AMI Management First, we pin the server AMIs to the commits associated with the build. No more using the latest AMI a developer built in a test branch, or accidentally using a stale AMI because we forgot to build one! Packer tags the AMI images with the commit sha used to generate the image, and Terraform then looks up only the AMIs associated with that sha. To minimize churn, we use the SHA associated with the latest Packer configurations, rather than SHA of all. This has a few benefits: reproducibility and avoiding accidental AMI changes and contamination of changes across branches. Also, the change is a stepping stone to an e2e pipeline that builds new AMIs automatically if Packer files changed. The downside is that new AMIs will be generated even for irrelevant changes (e.g. spelling, commits), but I suspect that's OK. Also, an engineer will be forced to build the AMI whenever they change Packer files while iterating on e2e scripts; this hasn't been an issue for me yet, and I'll be open to iterating on that later if it proves to be an issue. ### Config Files and Packer Second, this PR moves e2e config hcl management to Terraform instead of Packer. Currently, the config files live in `./terraform/config`, but they are baked into the servers by Packer and changes are ignored. This current behavior surprised me, as I spent a bit of time debugging why my config changes weren't applied. Having Terraform manage them would ease engineers' iteration. It also makes Packer management more consistent (Packer only works in `e2e/terraform/packer`) and eases the logic for AMI change detection. The config directory is very small (100KB), and having it as an upload step adds negligible time to `terraform apply`.	2021-01-25 13:20:28 -05:00
Mahmood Ali	39da228964	update readme about profiles and packer build	2021-01-25 11:40:26 -05:00
Seth Hoenig	8b05efcf88	consul/connect: Add support for Connect terminating gateways This PR implements Nomad built-in support for running Consul Connect terminating gateways. Such a gateway can be used by services running inside the service mesh to access "legacy" services running outside the service mesh while still making use of Consul's service identity based networking and ACL policies. https://www.consul.io/docs/connect/gateways/terminating-gateway These gateways are declared as part of a task group level service definition within the connect stanza. service { connect { gateway { proxy { // envoy proxy configuration } terminating { // terminating-gateway configuration entry } } } } Currently Envoy is the only supported gateway implementation in Consul. The gateway task can be customized by configuring the connect.sidecar_task block. When the gateway.terminating field is set, Nomad will write/update the Configuration Entry into Consul on job submission. Because CEs are global in scope and there may be more than one Nomad cluster communicating with Consul, there is an assumption that any terminating gateway defined in Nomad for a particular service will be the same among Nomad clusters. Gateways require Consul 1.8.0+, checked by a node constraint. Closes #9445	2021-01-25 10:36:04 -06:00
Tim Gross	0b49e3da12	e2e: added tests for check restart behavior	2021-01-22 10:55:40 -05:00
Drew Bailey	630babb886	prevent double job status update (#9768) * Prevent Job Statuses from being calculated twice https://github.com/hashicorp/nomad/pull/8435 introduced atomic eval insertion with job (de-)registration. This change removes a now obsolete guard which checked if the index was equal to the job.CreateIndex, which would empty the status. Now that the job registration eval insertion is atomic with the registration this check is no longer necessary to set the job statuses correctly. * test to ensure only single job event for job register * periodic e2e * separate job update summary step * fix updatejobstability to use copy instead of modified reference of job * update envoygatewaybindaddresses copy to prevent job diff on null vs empty * set ConsulGatewayBindAddress to empty map instead of nil fix nil assertions for empty map rm unnecessary guard	2021-01-22 09:18:17 -05:00
Mahmood Ali	9dcdafe4cf	e2e: show command output on failure When a command fails, it's nice to have the full output, as it contains diagnostic information. The status code isn't sufficient for debugging.	2021-01-21 10:32:16 -05:00
Mahmood Ali	923725bf3d	e2e: deflake TestVolumeMounts After submitting an update, the test ought to wait until the new allocations are placed. Previously, we'd use the original to-be-stopped allocations and the test fails when attempting to exec.	2021-01-21 10:28:41 -05:00
Mahmood Ali	95b7fc80b8	e2e deflake namespaces: only check namespace jobs Deflake namespace e2e test by only asserting on jobs related to the namespace tests. During our e2e tests, some leftover jobs (e.g. prometheus) are still running while being shut down and cause the test to fail.	2021-01-21 10:26:24 -05:00
Mahmood Ali	2e8bcac261	e2e: deflake events Handle streamCh channel being closed.	2021-01-21 10:25:42 -05:00
Seth Hoenig	991884e715	consul/connect: Enable running multiple ingress gateways per Nomad agent Connect ingress gateway services were being registered into Consul without an explicit deterministic service ID. Consul would generate one automatically, but then Nomad would have no way to register a second gateway on the same agent as it would not supply 'proxy-id' during envoy bootstrap. Set the ServiceID for gateways, and supply 'proxy-id' when doing envoy bootstrap. Fixes #9834	2021-01-19 12:58:36 -06:00
Mahmood Ali	76ce6306a4	add helper for building ami	2021-01-15 10:49:13 -05:00
Mahmood Ali	e51651c34a	set sha	2021-01-15 10:49:13 -05:00
Mahmood Ali	82637715cf	change ami naming	2021-01-15 10:49:12 -05:00
Mahmood Ali	0af1509a77	move config files to terraform	2021-01-15 10:49:12 -05:00
Seth Hoenig	536747f216	e2e: use jobspec2 Parse for parsing jobfile in e2e utils We directly parse job files in e2eutil, but currently using jobspec package. Instead, use the Parse method from the jobspec2 package so we can parse job files with new features.	2021-01-13 14:00:40 -06:00
James Rasell	d6cab8aa14	Merge pull request #9767 from hashicorp/f-e2e-job-scaling-suite e2e: add job scaling test suite.	2021-01-11 18:35:07 +01:00
Seth Hoenig	64a8b795f2	Merge pull request #9766 from hashicorp/f-bump-cni-plugins-version cni: bump CNI plugins version to v0.9.0	2021-01-11 09:59:43 -06:00
Tim Gross	f97505e384	e2e: remove deprecated terraform syntax Also bumps patch versions of some TF modules	2021-01-11 08:25:22 -05:00
James Rasell	4374d99071	e2e: add job scaling test suite.	2021-01-11 11:34:19 +01:00
Seth Hoenig	fc5f48d936	cni: bump CNI version to v0.9.0 https://github.com/containernetworking/plugins/releases/tag/v0.9.0 Also make the copy-paste install instructions work with arm64 for a better OOTB experience (AWS Graviton, Pi 4's).	2021-01-10 18:03:27 -06:00
James Rasell	108fa33393	Merge pull request #9747 from hashicorp/f-e2e-scaling-policy-suite e2e: add ScalingPolicies test suite with initial test case.	2021-01-08 10:51:48 +01:00
James Rasell	b087d68736	e2e: add ScalingPolicies test suite with initial test case.	2021-01-07 14:39:55 +01:00
James Rasell	02b9d9da87	e2e: move namespace tests into OSS.	2021-01-07 09:15:43 +01:00
Seth Hoenig	7da808b43a	e2e: add terraform lockfile Terraform v0.14 is producing a lockfile after running `terraform init`. The docs suggest we should include this file in the git repository: > You should include this file in your version control repository so > that you can discuss potential changes to your external dependencies > via code review, just as you would discuss potential changes to your > configuration itself. Sounds similar to go.sum https://www.terraform.io/docs/configuration/dependency-lock.html#lock-file-location	2021-01-05 08:55:37 -06:00
Seth Hoenig	59f230714f	e2e: add e2e test for service registration	2021-01-05 08:48:12 -06:00
Chris Baker	57b70a27ec	modified e2e test so that it explicitly tested the use case in #6929	2021-01-04 22:25:39 +00:00

418 Commits