open-nomad

Commit Graph

Author	SHA1	Message	Date
Danish Prakash	e7e8ce212e	command/operator_debug: add pprof interval (#11938)	2022-04-04 15:24:12 -04:00
Michael Schurter	a1f1294dc4	Merge pull request #12442 from hashicorp/f-sd-add-mixed-auth-read-endpoints service-disco: add mixed auth to list and read RPC endpoints.	2022-04-04 12:19:29 -07:00
Tim Gross	759310d13a	CSI: volume watcher shutdown fixes (#12439) The volume watcher design was based on deploymentwatcher and drainer, but has an important difference: we don't want to maintain a goroutine for the lifetime of the volume. So we stop the volumewatcher goroutine for a volume when that volume has no more claims to free. But the shutdown races with updates on the parent goroutine, and it's possible to drop updates. Fortunately these updates are picked up on the next core GC job, but we're most likely to hit this race when we're replacing an allocation and that's the time we least want to wait. Wait until the volume has "settled" before stopping this goroutine so that the race between shutdown and the parent goroutine sending on `<-updateCh` is pushed to after the window we most care about quick freeing of claims. * Fixes a resource leak when volumewatchers are no longer needed. The volume is nil and can't ever be started again, so the volume's `watcher` should be removed from the top-level `Watcher`. * De-flakes the GC job test: the test throws an error because the claimed node doesn't exist and is unreachable. This flaked instead of failed because we didn't correctly wait for the first pass through the volumewatcher. Make the GC job wait for the volumewatcher to reach the quiescent timeout window state before running the GC eval under test, so that we're sure the GC job's work isn't being picked up by processing one of the earlier claims. Update the claims used so that we're sure the GC pass won't hit a node unpublish error. * Adds trace logging to unpublish operations	2022-04-04 10:46:45 -04:00
Seth Hoenig	bdc9799858	Merge pull request #12403 from hashicorp/dependabot/go_modules/github.com/creack/pty-1.1.18 build(deps): bump github.com/creack/pty from 1.1.17 to 1.1.18	2022-04-04 09:43:09 -05:00
dependabot[bot]	3c5bc49329	build(deps): bump github.com/creack/pty from 1.1.17 to 1.1.18 Bumps [github.com/creack/pty](https://github.com/creack/pty) from 1.1.17 to 1.1.18. - [Release notes](https://github.com/creack/pty/releases) - [Commits](https://github.com/creack/pty/compare/v1.1.17...v1.1.18) --- updated-dependencies: - dependency-name: github.com/creack/pty dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2022-04-04 14:25:02 +00:00
Seth Hoenig	3ce4f52740	Merge pull request #12446 from shoenig/no-pkg-err cleanup: purge github.com/pkg/errors	2022-04-04 09:22:44 -05:00
Tim Gross	806a82dd0c	E2E: ensure that CSI EBS tests are isolated from each other (#12443) Tear down the volume-consuming job between subtests, rather than after all the tests are complete. For good measure, use a different ID for the volume-consuming job as well.	2022-04-04 09:44:55 -04:00
James Rasell	ea4d7366fc	service-disco: add mixed auth to list and read RPC endpoints. In the same manner as the delete RPC, the list and read service registration endpoints can be called either by external operators or Nomad nodes. The latter occurs when a template is being rendered which includes Nomad API template funcs. In this case, the auth token is looked up as the node secret ID for auth.	2022-04-04 13:45:43 +01:00
James Rasell	19281bb2fe	Merge pull request #12304 from th0m/tlefebvre/fix-wrong-drivernetworkmanager-interface fix: update incorrect DriverNetworkManager interface implementation	2022-04-04 11:29:22 +02:00
Seth Hoenig	9670adb6c6	cleanup: purge github.com/pkg/errors	2022-04-01 19:24:02 -05:00
Tim Gross	de191e8068	Test lint touchup (#12434) * lint: require should not be aliased in core_sched_test * lint: require should not be aliased in volumes_watcher_test * testing: don't alias state package in core_sched_test	2022-04-01 15:17:58 -04:00
Seth Hoenig	1ba8213e9a	Merge pull request #12432 from hashicorp/ci-gha-ignore-subpaths ci: correctly ignore subpaths in gha	2022-04-01 09:58:23 -05:00
Seth Hoenig	f9b0ffafde	Merge pull request #12431 from hashicorp/docs-sysbatch-exists-typo docs: fix typo in system batch description	2022-04-01 09:58:06 -05:00
Seth Hoenig	4381aa122f	ci: correctly ignore subpaths in gha	2022-04-01 09:49:40 -05:00
Seth Hoenig	e9eacb1153	docs: fix typo in system batch description	2022-04-01 09:46:03 -05:00
Bryce Kalow	9b0d77ae78	website: redirect /api to api-docs and update internal links (#12410)	2022-03-31 11:33:27 -05:00
Tim Gross	8dccc43c2f	docs: remove deprecated client options parameters docs (#12416) The client configuration options for drivers have been deprecated since 0.9. We haven't torn them out completely but because they're deprecated it's been hard to guarantee correct behavior. Remove the documentation so that users aren't misled about their viability.	2022-03-31 11:45:51 -04:00
Seth Hoenig	ebf69f93be	Merge pull request #12417 from hashicorp/tests-remove-08-groups-services tests: remove update 08 groups services test	2022-03-31 10:40:50 -05:00
Seth Hoenig	8dcf01e94b	tests: remove update 08 groups services test This is a test around upgrading from Nomad 0.8, which is long since no longer supported. The test is slow, flaky, and imports consul/sdk. Remove this test as it is no longer relevant.	2022-03-31 10:14:22 -05:00
Seth Hoenig	8fe9e3c084	Merge pull request #12414 from hashicorp/tests-docker-dns-sadness tests: create fresh harness for each docker dns test	2022-03-31 10:05:30 -05:00
Seth Hoenig	174a7532a1	tests: create fresh harness for each docker dns test Not actually sure this fixes the flaky tests, but seems like it could be related.	2022-03-31 08:17:34 -05:00
Seth Hoenig	a897ad63aa	Merge pull request #12404 from hashicorp/tests-client-waits tests: wait on client in a couple of tests	2022-03-30 15:23:47 -05:00
Seth Hoenig	fa0dc05b7a	tests: wait on client in a couple of tests These tend to fail on GHA, where I believe the client is not starting up fast enough before making requests. So wait on the client agent first. ``` === RUN TestDebug_CapturedFiles operator_debug_test.go:422: serverName: TestDebug_CapturedFiles.global, clientID, 1afb00e6-13f2-d8d6-d0f9-745a3fd6e8e4 operator_debug_test.go:492: Error Trace: operator_debug_test.go:492 Error: Should be empty, but was No node(s) with prefix "1afb00e6-13f2-d8d6-d0f9-745a3fd6e8e4" found Failed to retrieve clients, 0 nodes found in list: 1afb00e6-13f2-d8d6-d0f9-745a3fd6e8e4 Test: TestDebug_CapturedFiles --- FAIL: TestDebug_CapturedFiles (0.08s) ```	2022-03-30 08:48:23 -05:00
Seth Hoenig	61bf8022df	Merge pull request #12405 from hashicorp/ci-format-release-metadata-file ci: hcl format release metadata file	2022-03-30 08:13:15 -05:00
Seth Hoenig	d672bc46fd	ci: add trailing newline to release metadata	2022-03-30 08:12:55 -05:00
Tim Gross	3030f954a2	E2E disconnected clients test refactor (#12402) * Wait longer for node to go down in disconnected clients test. The existing helper only waits 10s, but there's a jitter on heartbeats that we need to account for. Wait for 30s for node to go down to give us plenty of room * Port disconnected clients to stdlib-style test	2022-03-30 09:12:44 -04:00
Seth Hoenig	0f20bb0e8c	ci: hcl format release metadata file	2022-03-30 08:02:55 -05:00
Michele Degges	f474ed6f51	[RelAPI Onboarding] Add release API metadata file (#12353)	2022-03-29 15:38:50 -07:00
Michael Schurter	cae69ba8ce	Merge pull request #12312 from hashicorp/f-writeToFile template: disallow `writeToFile` by default	2022-03-29 13:41:59 -07:00
Tim Gross	03c1904112	csi: allow `namespace` field to be passed in volume spec (#12400) Use the volume spec's `namespace` field to override the value of the `-namespace` and `NOMAD_NAMESPACE` field, just as we do with job spec.	2022-03-29 14:46:39 -04:00
Michael Schurter	33fe04ff6a	template: fix comments and docs Review notes from @lgfa29 Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-03-29 09:25:23 -07:00
Tim Gross	19703e3316	E2E: test exercising node drain behavior for CSI volumes (#12384)	2022-03-29 11:19:23 -04:00
dependabot[bot]	2df08852bf	build(deps): bump github.com/mitchellh/hashstructure from 1.0.0 to 1.1.0 (#12399) Bumps [github.com/mitchellh/hashstructure](https://github.com/mitchellh/hashstructure) from 1.0.0 to 1.1.0. - [Release notes](https://github.com/mitchellh/hashstructure/releases) - [Commits](https://github.com/mitchellh/hashstructure/compare/v1.0.0...v1.1.0) --- updated-dependencies: - dependency-name: github.com/mitchellh/hashstructure dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-03-29 11:17:09 -04:00
Tim Gross	a6652bffad	CSI: reorder controller volume detachment (#12387) In #12112 and #12113 we solved for the problem of races in releasing volume claims, but there was a case that we missed. During a node drain with a controller attach/detach, we can hit a race where we call controller publish before the unpublish has completed. This is discouraged in the spec but plugins are supposed to handle it safely. But if the storage provider's API is slow enough and the plugin doesn't handle the case safely, the volume can get "locked" into a state where the provider's API won't detach it cleanly. Check the claim before making any external controller publish RPC calls so that Nomad is responsible for the canonical information about whether a volume is currently claimed. This has a couple side-effects that also had to get fixed here: * Changing the order means that the volume will have a past claim without a valid external node ID because it came from the client, and this uncovered a separate bug where we didn't assert the external node ID was valid before returning it. Fallthrough to getting the ID from the plugins in the state store in this case. We avoided this originally because of concerns around plugins getting lost during node drain but now that we've fixed that we may want to revisit it in future work. * We should make sure we're handling `FailedPrecondition` cases from the controller plugin the same way we handle other retryable cases. * Several tests had to be updated because they were assuming we fail in a particular order that we're no longer doing.	2022-03-29 09:44:00 -04:00
Michael Schurter	7a28fcb8af	template: disallow `writeToFile` by default Resolves #12095 by WONTFIXing it. This approach disables `writeToFile` as it allows arbitrary host filesystem writes and is only a small quality of life improvement over multiple `template` stanzas. This approach has the significant downside of leaving people who have altered their `template.function_denylist` still vulnerable! I added an upgrade note, but we should have implemented the denylist as a `map[string]bool` so that new funcs could be denied without overriding custom configurations. This PR also includes a bug fix that broke enabling all consul-template funcs. We repeatedly failed to differentiate between a nil (unset) denylist and an empty (allow all) one.	2022-03-28 17:05:42 -07:00
Ryo Nakao	e11894a0cb	Ensure to close StreamFrame channel (#12248)	2022-03-28 10:28:23 -04:00
Tim Gross	bc455fc69c	docs: changelog entry (#12393)	2022-03-28 09:44:58 -04:00
Shishir	afcce3eea5	Display OS name in nomad node status command. (#12388) Signed-off-by: Shishir Mahajan <smahajan@roblox.com>	2022-03-28 09:28:14 -04:00
Seth Hoenig	e3c8a86e2e	Merge pull request #12381 from hashicorp/ci-gha-off ci: set test log level off in gha	2022-03-25 15:13:42 -05:00
Tim Gross	5c7f2bad0b	E2E: namespace HCP vault and consul policies to avoid collisions (#12386) Concurrent E2E runs can collide when provisioning policies on HCP Consul and HCP Vault. Namespace these by the test run name, as we do for most everything else.	2022-03-25 16:05:59 -04:00
Tim Gross	3c15236fd5	E2E: move example test to use golangs stdlib test runner (#12383) Our E2E "framework" has a bunch of features around test discovery and standing up infra that were never completed or fully used, and we ended up building out a large test suite that ignored all that in lieu of Terraform-provided infrastructure for the last couple years. This changeset is a proposal (and demonstration) for gradually migrating our E2E tests off the framework code so that developers can write fairly ordinary golang stdlib testing tests.	2022-03-25 14:44:16 -04:00
Seth Hoenig	e256afdfee	ci: set test log level off in gha	2022-03-25 13:43:33 -05:00
Seth Hoenig	4b895a436a	ci: set count to bypass caching	2022-03-25 13:43:33 -05:00
James Rasell	67b467983e	Merge pull request #12368 from hashicorp/f-1.3-boogie-nights service discovery: add initial MVP implementation	2022-03-25 18:04:47 +01:00
Tim Gross	67b87e46f1	e2e: test for allocations replacement on disconnected clients (#12375) This test exercises the behavior of clients that become disconnected and have their allocations replaced. Future test cases will exercise the `max_client_disconnect` field on the job spec.	2022-03-25 12:26:43 -04:00
Luiz Aoqui	c387e2d97e	ci: fix semgrep rule for RPC authentication	2022-03-25 12:00:48 -04:00
Hunter Morris	dcaf99dcc1	client: Add AWS EC2 instance-life-cycle from metadata to client fingerprint (#12371)	2022-03-25 11:50:52 -04:00
James Rasell	9449e1c3e2	Merge branch 'main' into f-1.3-boogie-nights	2022-03-25 16:40:32 +01:00
Luiz Aoqui	848a3b271f	docs: fix link and add note about Nomad v1.3.0 on raft v3 upgrade (#12378)	2022-03-25 10:11:46 -04:00
Seth Hoenig	42ccdf6db3	Merge pull request #12380 from hashicorp/ci-gha-verbose ci: cleanup verbose mode and enable for gha	2022-03-25 08:01:57 -05:00


22785 Commits (All Branches)