open-nomad

Author	SHA1	Message	Date
Buck Doyle	528b13f69d	Fix audit workflow action versions (#9877) This fixes the version reference error seen in this workflow failure: https://github.com/hashicorp/nomad/actions/runs/504695096 I’ve also included an update to the sticky comment action version to address this warning in the above link: marocchino/sticky-pull-request-comment@33a6cfb looks like the shortened version of a commit SHA. Referencing actions by the short SHA will be disabled soon. Please see https://docs.github.com/en/actions/learn-github-actions/security-hardening-for-github-actions#using-third-party-actions. We were previously using 33a6cfb after the maintainer merged my PR to allow the comment to be read from a file; there was no released version with that, but it’s now included in v2.0.0.	2021-01-26 09:06:22 -06:00
Mahmood Ali	1ac8b32e08	e2e: Disable Connect tests The connect tests are very disruptive: they restart consul/nomad agents with new tokens. The test seems particularly flaky, failing 32 times out of 73 in my sample. The tests are particularly problematic because they are disruptive and affect other tests. On failure, the nomad or consul agent on the client can get into a wedged state, so health/deployment info in subsequent tests may be wrong. In some cases, the node will be deemed failed, and then the subsequent tests may fail when the node is deemed lost and the test allocations get migrated unexpectedly.	2021-01-26 10:01:14 -05:00
Mahmood Ali	36ce1e73eb	e2e: deflake nodedrain test The nodedrain deadline test asserts that all allocations are migrated by the deadline. However, when the deadline is short (e.g. 10s), the test may fail because of scheduler/client-propagation delays. In one failing test, it took ~15s from the RPC call to the moment the scheduler issued the migration update, and then 3 seconds for the alloc to be stopped. Here, I increase the timeouts to avoid such false positives.	2021-01-26 10:01:14 -05:00
Mahmood Ali	cf8f6f07d7	e2e: vault increase timeout Increase the timeout for vaultsecrets. As the default interval is 0.1s, 10 retries mean it only retries for one second, a very short time for some waiting scenarios in the test (e.g. starting allocs, etc).	2021-01-26 10:01:14 -05:00
Mahmood Ali	94ad40907c	e2e: prefer testutil.WaitForResultRetries Prefer testutil.WaitForResultRetries, which emits more descriptive errors on failures. `require.Eventually` fails with an opaque "Condition never satisfied" error message.	2021-01-26 10:01:14 -05:00
Mahmood Ali	f3f8f15b7b	e2e: special case "Unexpected EOF" errors This is an attempt at deflaking the e2e exec tests, and a way to improve messages. e2e tests occasionally fail with "unexpected EOF" even though the exec output matches expectations. I suspect there is a race in handling EOF in server/http handling. Here, we special case this error and ensure we get all failures, to help debug the case better.	2021-01-26 10:01:14 -05:00
Mahmood Ali	925d9ce952	e2e: tweak failure messages Tweak the error messages for the flakiest tests, so that on test failure, we get more output	2021-01-26 09:16:48 -05:00
Mahmood Ali	6aa3dec6cc	e2e: use testify requires instead of t.Fatal testify requires offer better error messages that are easier to notice when seeing a wall of text in the builds.	2021-01-26 09:14:47 -05:00
Mahmood Ali	236b4055a7	e2e: deflake consul/CheckRestart test Ensure we pass the alloc ID to status. Otherwise, the test may fail if there is another spurious allocation running from another test.	2021-01-26 09:12:20 -05:00
Mahmood Ali	0aafd9af64	e2e: Fix build script and pass shellcheck	2021-01-26 09:11:37 -05:00
James Rasell	7f8ebb5d10	Merge pull request #9888 from hashicorp/f-docs-gh-9842 docs: clarify where variables can be placed with HCLv2.	2021-01-26 14:33:18 +01:00
James Rasell	9c0c75b226	docs: clarify where variables can be placed with HCLv2.	2021-01-26 12:29:58 +01:00
Michael Lange	9395f16da6	Test coverage for the topology info panel. This fixes a couple of bugs: 1. Overreporting resources reserved due to counting terminal allocs 2. Overreporting unique client placements due to uniquing on object refs instead of on client ID.	2021-01-25 19:01:11 -08:00
Michael Lange	7d998745ed	Clamp widths at zero to prevent negative width warnings This would only ever realistically happen with fixture data, but still good to not have these warnings.	2021-01-25 18:59:55 -08:00
Michael Lange	93195f8e12	Only count the scheduled allocs on the topo viz node stats bar	2021-01-25 11:29:01 -08:00
Mahmood Ali	4397eda209	Merge pull request #9798 from hashicorp/e2e-terraform-tweaks-20200113 This PR makes two ergonomic changes, meant to make e2e builds more reproducible and ease changes. ### AMI Management First, we pin the server AMIs to the commits associated with the build. No more using the latest AMI a developer built in a test branch, or accidentally using a stale AMI because we forgot to build one! Packer tags the AMI images with the commit SHA used to generate the image, and Terraform then looks up only the AMIs associated with that SHA. To minimize churn, we use the SHA associated with the latest Packer configurations, rather than the SHA of all files. This has a few benefits: reproducibility, and avoiding accidental AMI changes and contamination of changes across branches. Also, the change is a stepping stone to an e2e pipeline that builds new AMIs automatically if Packer files changed. The downside is that new AMIs will be generated even for irrelevant changes (e.g. spelling, commits), but I suspect that's OK. Also, an engineer will be forced to build the AMI whenever they change Packer files while iterating on e2e scripts; this hasn't been an issue for me yet, and I'll be open to iterating on that later if it proves to be an issue. ### Config Files and Packer Second, this PR moves e2e config hcl management to Terraform instead of Packer. Currently, the config files live in `./terraform/config`, but they are baked into the servers by Packer and changes are ignored. This current behavior surprised me, as I spent a bit of time debugging why my config changes weren't applied. Having Terraform manage them eases engineers' iteration. It also makes Packer management more consistent (Packer only works in `e2e/terraform/packer`) and eases the logic for AMI change detection. The config directory is very small (100KB), and having it as an upload step adds negligible time to `terraform apply`.	2021-01-25 13:20:28 -05:00
Seth Hoenig	7597f9afea	Merge pull request #9829 from hashicorp/f-terminating-gateway consul/connect: Add support for Connect terminating gateways	2021-01-25 10:56:19 -06:00
Mahmood Ali	39da228964	update readme about profiles and packer build	2021-01-25 11:40:26 -05:00
Seth Hoenig	720780992c	consul/connect: copy bind address map if empty This parameter is now supposed to be non-nil even if empty, and the Copy method should also maintain that invariant.	2021-01-25 10:36:04 -06:00
Seth Hoenig	1ad219c441	consul/connect: remove debug line	2021-01-25 10:36:04 -06:00
Seth Hoenig	8b05efcf88	consul/connect: Add support for Connect terminating gateways This PR implements Nomad built-in support for running Consul Connect terminating gateways. Such a gateway can be used by services running inside the service mesh to access "legacy" services running outside the service mesh while still making use of Consul's service identity based networking and ACL policies. https://www.consul.io/docs/connect/gateways/terminating-gateway These gateways are declared as part of a task group level service definition within the connect stanza. service { connect { gateway { proxy { // envoy proxy configuration } terminating { // terminating-gateway configuration entry } } } } Currently Envoy is the only supported gateway implementation in Consul. The gateway task can be customized by configuring the connect.sidecar_task block. When the gateway.terminating field is set, Nomad will write/update the Configuration Entry into Consul on job submission. Because CEs are global in scope and there may be more than one Nomad cluster communicating with Consul, there is an assumption that any terminating gateway defined in Nomad for a particular service will be the same among Nomad clusters. Gateways require Consul 1.8.0+, checked by a node constraint. Closes #9445	2021-01-25 10:36:04 -06:00
Drew Bailey	007158ee75	ignore setting job summary when oldstatus == newstatus (#9884)	2021-01-25 10:34:27 -05:00
Steven Collins	e9f91c1d56	Adds community USB plugin to documentation site	2021-01-25 10:15:36 -05:00
zzhai	899334f2f0	Update syntax.mdx "one label" should be the singular form.	2021-01-25 08:42:59 -05:00
Jeff Escalante	5859ecd806	update dependencies, removed unused dependency (#9878)	2021-01-22 21:24:26 -05:00
Michael Lange	5243428343	Merge pull request #9876 from hashicorp/b-ui/default-namespace-casing UI: Use the same prefix pattern for both the region switcher and the namespace switcher	2021-01-22 13:43:54 -08:00
Michael Lange	886b1b4384	Clip long namespace names but make sure to keep the full name in the title attribute	2021-01-22 13:18:15 -08:00
Michael Lange	875de74503	Use the same prefix pattern from the region switcher for the namespace switcher	2021-01-22 13:18:15 -08:00
Tim Gross	45a45ebb3f	changelog entry	2021-01-22 13:41:28 -05:00
Tim Gross	987cdb3a69	prefer TrimPrefix to checking HasPrefix first	2021-01-22 13:41:28 -05:00
Huan Wang	ba8b2297b1	fix the inconsistent handling of the infra image and normal task images	2021-01-22 13:41:28 -05:00
Tim Gross	555d031283	docs: check_restart is now supported for group services	2021-01-22 10:55:40 -05:00
Tim Gross	0b49e3da12	e2e: added tests for check restart behavior	2021-01-22 10:55:40 -05:00
Tim Gross	64449cddc1	implement alloc runner task restart hook Most allocation hooks don't need to know when a single task within the allocation is restarted. The check watcher for group services triggers the alloc runner to restart all tasks, but the alloc runner's `Restart` method doesn't trigger any of the alloc hooks, including the group service hook. The result is that after the first time a check triggers a restart, we'll never restart the tasks of an allocation again. This commit adds a `RunnerTaskRestartHook` interface so that alloc runner hooks can act if a task within the alloc is restarted. The only implementation is in the group service hook, which will force a re-registration of the alloc's services and fix check restarts.	2021-01-22 10:55:40 -05:00
Drew Bailey	bae0c6cd20	changelog entry for 9768 (#9873)	2021-01-22 09:22:02 -05:00
Drew Bailey	630babb886	prevent double job status update (#9768) * Prevent Job Statuses from being calculated twice https://github.com/hashicorp/nomad/pull/8435 introduced atomic eval insertion with job (de-)registration. This change removes a now obsolete guard which checked if the index was equal to the job.CreateIndex, which would empty the status. Now that the job registration eval insertion is atomic with the registration this check is no longer necessary to set the job statuses correctly. * test to ensure only single job event for job register * periodic e2e * separate job update summary step * fix updatejobstability to use copy instead of modified reference of job * update envoygatewaybindaddresses copy to prevent job diff on null vs empty * set ConsulGatewayBindAddress to empty map instead of nil fix nil assertions for empty map rm unnecessary guard	2021-01-22 09:18:17 -05:00
Kris Hicks	8f9e47a8e7	Clean up Task Validation tests (#9833) Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>	2021-01-21 11:53:02 -08:00
Kris Hicks	7694a66414	Don't prepend https to docker cred helper call (#9852) Some credential helpers, like the ECR helper, will strip the protocol if given. Others, like the linux "pass" helper, do not.	2021-01-21 11:46:59 -08:00
Charlie Voiselle	4f4d6e6c37	Enable network namespaces for QEMU driver (#9861) * Enable network namespaces for QEMU driver * Add CHANGELOG entry	2021-01-21 14:05:46 -05:00
Mahmood Ali	1c6f4549ff	Merge pull request #9868 from hashicorp/e2e-tests-20200121 Deflake some e2e tests	2021-01-21 12:02:52 -05:00
Mahmood Ali	9dcdafe4cf	e2e: show command output on failure When a command fails, it's nice to have the full output, as it contains diagnostic information. The status code isn't sufficient for debugging.	2021-01-21 10:32:16 -05:00
Mahmood Ali	f03e67712a	docs: remove timestamp hcl2 function (#9867) timestamp isn't actually implemented	2021-01-21 10:29:50 -05:00
Mahmood Ali	923725bf3d	e2e: deflake TestVolumeMounts After submitting an update, the test ought to wait until the new allocations are placed. Previously, we'd use the original to-be-stopped allocations and the test fails when attempting to exec.	2021-01-21 10:28:41 -05:00
Mahmood Ali	95b7fc80b8	e2e deflake namespaces: only check namespace jobs Deflake namespace e2e test by only asserting on jobs related to the namespace tests. During our e2e tests, some leftover jobs (e.g. prometheus) are still running while being shut down and cause the test to fail.	2021-01-21 10:26:24 -05:00
Mahmood Ali	2e8bcac261	e2e: deflake events Handle streamCh channel being closed.	2021-01-21 10:25:42 -05:00
Buck Doyle	27f73f2b7b	Change to fork of audit to log flaky tests (#9518) This will report the names of flaky tests instead of just counting them.	2021-01-21 08:25:16 -06:00
Dennis Schön	3eaf1432aa	validate connect block allowed only within group.service	2021-01-20 14:34:23 -05:00
Drew Bailey	3099eb0c73	fix changelog date (#9862)	2021-01-20 13:14:21 -05:00
Seth Hoenig	08e323b753	Merge pull request #9849 from hashicorp/b-cc-ig-id consul/connect: Enable running multiple ingress gateways per Nomad agent	2021-01-20 10:08:14 -06:00
Seth Hoenig	53218716b3	docs: fix typo in changelog Co-authored-by: Tim Gross <tgross@hashicorp.com>	2021-01-20 09:50:59 -06:00

20622 commits