open-nomad

Author	SHA1	Message	Date
Seth Hoenig	4644bb9941	Merge pull request #12736 from hashicorp/build-update-go-1.17.9 build: update golang to 1.17.9	2022-04-21 12:13:07 -05:00
Seth Hoenig	c5ce4927c5	build: update golang version script to use .go-version file	2022-04-21 12:10:14 -05:00
Seth Hoenig	382087a6df	Merge pull request #12737 from hashicorp/buid-update-ec2-instances build: update ec2 instance profiles	2022-04-21 11:57:40 -05:00
Seth Hoenig	c87bfe398f	build: update ec2 instance profiles using tools/ec2info	2022-04-21 11:47:40 -05:00
Seth Hoenig	bf54ef26be	build: update golang to 1.17.9	2022-04-21 11:43:01 -05:00
Tim Gross	140dbab832	docker: back out cgroup v2 OOM detection (#12735 ) When shutting down an allocation that ends up needing to be force-killed, we're getting a spurious "OOM Killed (137)" message on the task termination event. We introduced this as part of cgroups v2 support because the Docker daemon isn't detecting the container status correctly. Although exit code 137 is the exit code we get for OOM-killed processes, that's because OOM kill is a `SIGKILL`. So any sigkilled process will get that exit code.	2022-04-21 12:31:34 -04:00
Tim Gross	dc013b5267	E2E: set longer timeout for CSI plugin alloc start (#12732 ) The CSI plugin allocations take a while to be marked healthy, sometimes causing E2E test flakes during the setup phase of the tests. There's nothing CSI specific about marking plugin allocs healthy, as the plugin supervisor hook does all the fingerprinting in the postrun hook (the prestart hook just makes a couple of empty directories). The timeouts we're seeing may be because of where we're pulling the images from; most of our jobs pull from a CDN-backed public registry whereas these are pulling from ECR. Set a 1min timeout for these to make sure we have enough time to pull the image and start the task.	2022-04-21 11:11:43 -04:00
James Rasell	716b8e658b	api: Add support for filtering and pagination to the node list endpoint (#12727 )	2022-04-21 17:04:33 +02:00
Tim Gross	79a9d788d2	docs: fix broken link from `template` to client config (#12733 )	2022-04-21 11:04:04 -04:00
Derek Strickland	5e309f3f33	reconciler: Handle canaries when client disconnects (#12539 ) * plan_apply: Allow node updates in disconnected node plans * plan: Keep the job when persisting unknown allocs * reconciler: stop unknown allocs when stopping all * reconcile_util: reorder filtering to handle canaries; skip rescheduling unknown * heartbeat: Fix bug in node heartbeating	2022-04-21 10:05:58 -04:00
Tim Gross	2ad9f6bc5f	E2E: playwright configuration and smoke test (#12721 ) Scripts for running playwright tests in a Docker container that has chromium and webkit preinstalled. Includes a basic smoke test for authentication so that we can be sure the test rig is working end-to-end. Wiring this up in CI will be in an upcoming PR.	2022-04-21 09:13:10 -04:00
James Rasell	c4195c452a	docs: update HCL2 dynamic example to use block with label. (#12715 )	2022-04-21 10:18:04 +02:00
James Rasell	257e1c4f96	autopilot: correctly return errors within state functions. (#12714 )	2022-04-21 08:54:50 +02:00
Luiz Aoqui	bf7110b4c8	ui: fix bug that prevented files streaming (#12719 ) During the Ember dependency upgrade work, https://github.com/hashicorp/nomad/commit/ce8c039f4ce7359d60ede5dee36b9cef82 moved the `isSupported` method from using Ember's `reopenClass` to a getter, but `reopenClass` creates a static method, so the getter must be static as well.	2022-04-20 14:39:18 -04:00
Gowtham	1ff8b5f759	Add Concurrent Download Support for artifacts (#11531 ) * add concurrent download support - resolves #11244 * format imports * mark `wg.Done()` via `defer` * added tests for successful and failure cases and resolved some goleak * docs: add changelog for #11531 * test typo fixes and improvements Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-04-20 10:15:56 -07:00
James Rasell	010acce59f	job_hooks: add implicit constraint when using Consul for services. (#12602 )	2022-04-20 14:09:13 +02:00
James Rasell	42068f8823	client: add NOMAD_SHORT_ALLOC_ID allocation env var. (#12603 )	2022-04-20 10:30:48 +02:00
Tim Gross	c4d92205b4	E2E: provide options for reverse proxy for web UI (#12671 ) Our E2E test environment is deployed with mTLS, but it's impractical for us to use mTLS in headless browsers for automated testing (or even in manual testing). Provide certificates for proxying the web UI via Nginx. This proxy uses client certs for proxying to the HTTP endpoint and a self-signed cert for the browser-facing endpoint. We can accept certificate errors in the automated tests we'll be adding in the next step of this work.	2022-04-19 16:55:05 -04:00
Tim Gross	70c262eb95	E2E: terraform provisioner upgrades (#12652 ) While working on infrastructure for testing the UI in E2E, we needed to upgrade the certificate provider. Performing a provider upgrade via the TF `init -upgrade` brought in updates for the file and AWS providers as well. These updates include deprecating the use of `sensitive_content` fields, removing CA algorithm parameters that can be inferred from keys, and removing the requirement to manually specify AWS assume role parameters in the provider config if they're available in the calling environment's AWS config file (as they are via doormat or our E2E environment).	2022-04-19 14:27:14 -04:00
Seth Hoenig	8084dd29a1	Merge pull request #12604 from hashicorp/b-fixup-chroot-test ci: fixup task runner chroot test	2022-04-19 12:58:03 -05:00
Seth Hoenig	46066fb7fb	Merge pull request #12622 from hashicorp/b-fix-docker-logger-test ci: fix docker logger not supported test	2022-04-19 12:57:47 -05:00
Seth Hoenig	d1bda4a954	ci: fixup task runner chroot test This PR contains two fixes for the flaky TestTaskRunner_TaskEnv_Chroot and TestTaskRunner_Download_ChrootExec tests. - Use TinyChroot to stop copying gigabytes of junk, which causes GHA to fail to create the environment in time. - Pre-create cgroups on V2 systems. Normally the cgroup directory is managed by the cpuset manager, but that is not active in taskrunner tests, so create it by hand in the test framework.	2022-04-19 10:37:46 -05:00
Seth Hoenig	16cab10346	ci: fix docker logger not supported test This test checks for behavior when asking for logs of a docker task configured with a log driver that does not support streaming logs. Previously this was using the 'gelf' log driver, but it seems that no longer returns an error as expected. Instead we can just use the 'none' log driver, which has the desired effect: 2022-04-19T10:23:19.129-0500 [ERROR] docklog/docker_logger.go:133: log streaming ended with terminal error: error="API error (501): configured logging driver does not support reading"	2022-04-19 10:27:01 -05:00
Luiz Aoqui	8dccc48f17	changelog: fix entry for #11927 (#12577 )	2022-04-19 10:46:25 -04:00
Luiz Aoqui	950a2109aa	changelog: add entry for #11944 (#12578 )	2022-04-19 10:46:11 -04:00
Seth Hoenig	411158acff	Merge pull request #12586 from hashicorp/f-local-si-token connect: create SI tokens in local scope	2022-04-19 07:53:01 -05:00
Seth Hoenig	a7950e5624	cl: add missing prefix	2022-04-19 07:48:56 -05:00
Derek Strickland	7c6eb47b78	`consul-template`: revert `function_denylist` logic (#12071 ) * consul-template: replace config rather than append Co-authored-by: Seth Hoenig <seth.a.hoenig@gmail.com>	2022-04-18 13:57:56 -04:00
chavacava	eb1c42e643	QueryOptions.SetTimeToBlock should take pointer receiver Fixes a bug where blocking queries that are retried don't have their blocking timeout reset, resulting in them running longer than expected.	2022-04-18 10:41:27 -04:00
Tim Gross	0cf14a49cc	CI: build binaries for UI branches (#12594 ) Build binaries for every code change, not just backend code changes. This means that we'll have up-to-date compiled assets for every commit available in CircleCI artifacts.	2022-04-18 10:29:20 -04:00
Seth Hoenig	df587d8263	docs: update documentation with connect acls changes This PR updates the changelog, adds notes to the 1.3 upgrade guide, and updates the connect integration docs with documentation about the new requirement on Consul ACL policies of Consul agent default anonymous ACL tokens.	2022-04-18 08:22:33 -05:00
Jorge Marey	707c7f3a11	Change consul SI tokens to be local	2022-04-18 08:22:33 -05:00
Shishir	f5121d261e	Add os to NodeListStub struct. (#12497 ) * Add os to NodeListStub struct. Signed-off-by: Shishir Mahajan <smahajan@roblox.com> * Add os as a query param to /v1/nodes. Signed-off-by: Shishir Mahajan <smahajan@roblox.com> * Add test: os as a query param to /v1/nodes. Signed-off-by: Shishir Mahajan <smahajan@roblox.com>	2022-04-15 17:22:45 -07:00
Tim Gross	826d9d47f9	CSI: replace structs->api with serialization extension (#12583 ) The CSI HTTP API has to transform the CSI volume to redact secrets, remove the claims fields, and to consolidate the allocation stubs into a single slice of alloc stubs. This was done manually in #8590, but that approach takes a large amount of code, has proven very bug prone (see #8659, #8666, #8699, #8735, and #12150), and requires updating lots of code every time we add a field to volumes or plugins. In #10202 we introduced encoding improvements for the `Node` struct that allow a more minimal transformation. Apply this same approach to serializing `structs.CSIVolume` to API responses. Also, the original reasoning behind #8590 for plugins no longer holds because the counts are now denormalized within the state store, so we can simply remove this transformation entirely.	2022-04-15 14:29:34 -04:00
Tim Gross	b14e53e446	CSI: fix volume status prefix matching in CLI (#12584 ) The API for `CSIVolume.List` sorts by created index and not by ID, which breaks the logic for prefix matching in the `volume status` output when the prefix is also an exact match. Ensure that we're handling this case correctly.	2022-04-15 14:16:30 -04:00
Kevin Wang	c74c06746b	chore: redirects (#12560 ) Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-04-15 13:13:40 -04:00
Derek Strickland	4d3a0aae6d	heartbeat: Handle transitioning from disconnected to down (#12559 )	2022-04-15 09:47:45 -04:00
Derek Strickland	0891218ee9	system_scheduler: support disconnected clients (#12555 ) * structs: Add helper method for checking if alloc is configured to disconnect * system_scheduler: Add support for disconnected clients	2022-04-15 09:31:32 -04:00
Tim Gross	f5d8c636c7	CSI: handle per-alloc volumes in `alloc status -verbose` CLI (#12573 ) The Nomad client's `csi_hook` interpolates the alloc suffix with the volume request's name for CSI volumes with `per_alloc = true`, turning `example` into `example[1]`. We need to apply this same behavior in the `alloc status` output so that we show the correct volume.	2022-04-15 09:26:19 -04:00
Seth Hoenig	e8b0b91418	Merge pull request #12579 from hashicorp/ci-missing-packages-oss ci: ensure package coverage of test-core	2022-04-15 08:11:41 -05:00
Lars Lehtonen	81bb1ef030	command/agent: check err before close (#12574 )	2022-04-15 08:54:03 -04:00
Seth Hoenig	47040391bb	ci: ensure package coverage of test-core	2022-04-14 19:04:06 -05:00
Michael Schurter	70a04dd106	docs: add plan for node rejected details and more (#12564 ) - Moved federation docs to the bottom since everyone is potentially affected by the other sections on the page, but only users of federation are affected by it. - Added section on the plan for node rejected bug since it is fairly easy to diagnose and removing affected nodes is a fairly reliable workaround. - Mention 5s cliff for wait_for_index. - Remove the lie that we do not have job status metrics! How old was that?! - Reinforce the importance of monitoring basic system resources	2022-04-14 16:09:33 -07:00
Tim Gross	d62dd5b3fe	E2E: add debugging outputs for disconnected clients test (#12572 ) This test has a failure that's happening only occasionally and not very reproducibly. Print out the allocation status on test failure so that we can do some post-mortem debugging of the test on nightly.	2022-04-14 17:03:57 -04:00
Tim Gross	267c056e0e	ui: remove beta tag from gutter menu for CSI (#12570 )	2022-04-14 14:56:04 -04:00
Tim Gross	82b65899a1	fix data race in dynamic plugin registry tests (#12554 ) These tests have a data race where the test assertion is reading a value that's being set in the `listenFunc` goroutines that are subscribing to registry update events. Move the assertion into the subscribing goroutine to remove the race. This bug was discovered in #12098 but does not impact production Nomad code.	2022-04-14 14:55:56 -04:00
Seth Hoenig	6d042340b4	Merge pull request #12543 from idrennanvmware/add-allocid-to-sidecar Add alloc_id to sidecar bootstrap	2022-04-14 13:27:09 -05:00
Luiz Aoqui	8b2ea6b61b	ci: fix backport target branch pattern (#12571 )	2022-04-14 14:12:41 -04:00
Seth Hoenig	a1c4f16cf1	connect: prefix tag with nomad.; merge into envoy_stats_tags; update docs This PR expands on the work done in #12543 to - prefix the tag, so it is now "nomad.alloc_id" to be more consistent with Consul tags - merge into pre-existing envoy_stats_tags fields - update the upgrade guide docs - update changelog	2022-04-14 12:52:52 -05:00
Ian Drennan	70bd32df83	Add alloc_id to sidecar bootstrap	2022-04-14 11:46:06 -05:00
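
Note on b14e53e446 (CSI: fix volume status prefix matching in CLI): when results are not sorted by ID, an exact match is not guaranteed to come first, so it has to be looked for explicitly before falling back to prefix matching. One way this could be sketched in Go; resolveID and its error messages are hypothetical and do not reproduce the actual CLI code.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveID is a hypothetical helper (not the Nomad CLI code). Given a
// user-supplied prefix and a list of IDs in arbitrary order, it prefers an
// exact match and only then falls back to a unique prefix match.
func resolveID(prefix string, ids []string) (string, error) {
	var matches []string
	for _, id := range ids {
		if id == prefix {
			// An exact match wins regardless of its position in the list.
			return id, nil
		}
		if strings.HasPrefix(id, prefix) {
			matches = append(matches, id)
		}
	}
	switch len(matches) {
	case 0:
		return "", fmt.Errorf("no ID matches prefix %q", prefix)
	case 1:
		return matches[0], nil
	default:
		return "", fmt.Errorf("prefix %q is ambiguous: %d matches", prefix, len(matches))
	}
}

func main() {
	// "vol" is both an exact ID and a prefix of the other IDs.
	ids := []string{"vol-b", "vol", "vol-a"}
	id, err := resolveID("vol", ids)
	fmt.Println(id, err)
}
```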
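
Note on f5d8c636c7 (CSI: handle per-alloc volumes in `alloc status -verbose` CLI): the message describes turning `example` into `example[1]` for volumes with `per_alloc = true`. A minimal Go sketch of that naming rule; the function perAllocVolumeName and its integer index parameter are assumptions made for illustration, not the client's actual code.

```go
package main

import "fmt"

// perAllocVolumeName is a hypothetical helper (not Nomad's csi_hook code).
// For per-alloc volumes it appends the allocation index in brackets to the
// volume request's name; otherwise it returns the name unchanged.
func perAllocVolumeName(name string, perAlloc bool, allocIndex int) string {
	if !perAlloc {
		return name
	}
	return fmt.Sprintf("%s[%d]", name, allocIndex)
}

func main() {
	fmt.Println(perAllocVolumeName("example", true, 1))  // example[1]
	fmt.Println(perAllocVolumeName("example", false, 1)) // example
}
```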


22987 commits