open-nomad

Author	SHA1	Message	Date
Tim Gross	31e72e93ff	E2E: fix flaky event stream test (#12548) This changeset fixes two sources of flakiness in the event stream test. First, the stream request gets the event closest to the index, not the exact match. Although events are written before raft entries they're written asynchronously, so it's possible to race and get a raft index from this query higher than the current head of the event buffer. Ensure the job is running before we try to get the index, so that we've given the event enough time to land in the buffer. Second, the assertion that the found index is greater than the start index is only true if the `PlanResult` event manages to land before we do the second registration. Although it should now with the first fix above, it's not a correct assertion for what we're testing.	2022-04-12 08:35:39 -04:00
Luiz Aoqui	bc78c8617f	ci: change notification channel to feed-nomad-releases (#12550)	2022-04-11 19:12:58 -04:00
claire labry	76fc79ce46	move nomad.service out of etc (#12541)	2022-04-11 18:26:10 -04:00
Seth Hoenig	f59488bda6	Merge pull request #12532 from greut/feat/remove-consul-lib feat: remove dependency to consul/lib	2022-04-11 13:52:05 -05:00
Karan Sharma	37c907a8d2	feat: add nomctx and nomad-events-sink (#12542)	2022-04-11 14:47:03 -04:00
Yoan Blanc	3e79d58e4a	fix: use NewSafeTimer Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2022-04-11 19:37:14 +02:00
Tim Gross	77ab8d92f1	E2E: oversubscription assertion needs to wait for stats (#12540) The oversubscription test expects an output that requires the client has polled the task for stats at least once. Wait long enough to ensure that we've polled the stats before failing the test.	2022-04-11 11:40:51 -04:00
Tim Gross	c9c3cbd878	E2E: test for nodes disconnected by netsplit (#12407)	2022-04-11 11:34:27 -04:00
Tim Gross	57b3a0028f	allocs without max_client_disconnect should be lost on disconnect (#12529) In the reconciler's filtering for tainted nodes, we use whether the server supports disconnected clients as a gate to a bunch of our logic, but this doesn't account for cases where the job doesn't have `max_client_disconnect`. The only real consequence of this appears to be that allocs on disconnected nodes are marked "complete" instead of "lost".	2022-04-11 11:24:49 -04:00
Seth Hoenig	fecf4b46eb	Merge pull request #12527 from fynxiu/plugins/drivers/ctxdone fix(plugins): should return when ctx.Done	2022-04-11 07:46:39 -05:00
James Rasell	bc800a18d1	e2e: add initial service discovery tests. (#12512) Some tests may choose to deregister jobs to check Nomad cleanup logic, however, it is still possible for the test to fail and exit before this is hit. This therefore adds a cancellable cleanup func which can be deferred, using context to control whether it gets run or not.	2022-04-11 11:12:24 +02:00
Yoan Blanc	5e8254beda	feat: remove dependency to consul/lib Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2022-04-09 13:22:44 +02:00
Tim Gross	9e53906782	set minimum version for disconnected client mode to 1.3.0 (#12530)	2022-04-08 16:48:37 -04:00
Luiz Aoqui	16e3a1028e	changelog: update #12476 entry to highlight the feature (#12528)	2022-04-08 13:28:23 -04:00
Luiz Aoqui	b829957f52	Merge pull request #12506 from hashicorp/merge-release-1.3.0-beta.1-branch	2022-04-08 13:21:33 -04:00
fyn	1174bc2052	fix(plugins): should return when ctx.Done	2022-04-09 01:04:29 +08:00
Seth Hoenig	44481b35b6	Merge pull request #12524 from hashicorp/docs-cleanup-up-docs docs: fixup title formatting in upgrade guide	2022-04-08 11:58:49 -05:00
Seth Hoenig	a75bc27601	docs: fixup title formatting in upgrade guide	2022-04-08 11:50:54 -05:00
Luiz Aoqui	0190f378a7	docs: fix upgrade specific broken link and conflict tag (#12521)	2022-04-08 12:36:47 -04:00
Luiz Aoqui	5e642a4742	add Nomad v1.3.0-beta.1 download box (#12517)	2022-04-08 12:04:14 -04:00
James Rasell	6ac5fd9768	docs: add nomad services template jobspec example. (#12514)	2022-04-08 17:29:19 +02:00
Luiz Aoqui	45ab5d6308	ci: add semgrep rule to catch usage of invalid string extensions (#12509)	2022-04-08 10:58:32 -04:00
Seth Hoenig	79d11e6f87	Merge pull request #12508 from twunderlich-grapl/custom-variable-validation Add custom variable validation to docs	2022-04-08 08:53:03 -05:00
Seth Hoenig	e7aa81d3cb	docs: tweak hcl2 validation example	2022-04-08 08:43:42 -05:00
Thomas Wunderlich	3f6465f078	Add custom variable validation to docs Custom variable validation is a useful feature that is supported by Nomad and not just Terraform. As such it should be documented on the input variable page. I've cribbed the content from the terraform docs so this should be consistent across projects	2022-04-07 19:06:06 -04:00
Luiz Aoqui	5c15cafc89	remove generated files and prepare for next release	2022-04-07 18:51:18 -04:00
Luiz Aoqui	d96ffb065f	Merge remote-tracking branch 'origin/release/1.3.0-beta.1' into merge-release-1.3.0-beta.1-branch	2022-04-07 18:46:18 -04:00
Jasmine Dahilig	386f2fac3a	docs: add token_last_renewal and token_next_renewal to server metrics and key metrics #12435 (#12505)	2022-04-07 15:12:41 -07:00
hc-github-team-nomad-core	07c6d10c86	Generate files for release	2022-04-07 20:21:26 +00:00
Luiz Aoqui	43991dc868	update ci.hcl, version.go and CHANGELOG to v1.3.0-beta.1	2022-04-07 16:13:49 -04:00
Luiz Aoqui	cd15e3386c	ci: skip prerelease if triggered by the generate assets workflow (#12504)	2022-04-07 16:04:53 -04:00
Phil Renaud	311a6d82c9	Importing string methods directly from @ember/string (#12499) * Capitalize methods * Let ESLint yell at us again * Dasherize	2022-04-07 15:51:41 -04:00
Tim Gross	09b5e8d388	Fix flaky `operator debug` test (#12501) We introduced a `pprof-interval` argument to `operator debug` in #11938, and unfortunately this has resulted in a lot of test flakes. The actual command in use is mostly fine (although I've fixed some quirks here), so what's really happened is that the change has revealed some existing issues in the tests. Summary of changes:
* Make first pprof collection synchronous to preserve the existing behavior for the common case where the pprof interval matches the duration.
* Clamp `operator debug` pprof timing to that of the command. The `pprof-duration` should be no more than `duration` and the `pprof-interval` should be no more than `pprof-duration`. Clamp the values rather than throwing errors, which could change the commands that existing users might already have in debugging scripts.
* Testing: remove test parallelism. The `operator debug` tests that stand up servers can't be run in parallel, because we don't have a way of canceling the API calls for pprof. The agent will still be running the last pprof when we exit, and that breaks the next test that talks to that same agent. (Because you can only run one pprof at a time on any process!) We could split off each subtest into its own server, but this test suite is already very slow. In future work we should fix this "for real" by making the API call cancelable.
* Testing: assert against unexpected errors in `operator debug` tests. If we assert there are no unexpected error outputs, it's easier for the developer to debug when something is going wrong with the tests because the error output will be presented as a failing test, rather than just a failing exit code check. Or worse, no failing exit code check! This also forces us to be explicit about which tests will return 0 exit codes but still emit (presumably ignorable) error outputs.
Additional minor bug fixes (mostly in tests) and test refactorings:
* Fix text alignment on pprof Duration in `operator debug` output
* Remove "done" channel from `operator debug` event stream test. The goroutine we're blocking for here already tells us it's done by sending a value, so block on that instead of an extraneous channel
* Event stream test timer should start at current time, not zero
* Remove noise from `operator debug` test log output. The `t.Logf` calls already are picked out from the rest of the test output by being prefixed with the filename.
* Remove explicit pprof args so we use the defaults clamped from duration/interval	2022-04-07 15:00:07 -04:00
Seth Hoenig	839bc21bd1	Merge pull request #12496 from hashicorp/f-cores-env client: set environment variable indicating set of reserved cpu cores	2022-04-07 12:07:57 -05:00
James Rasell	dbf28a06c1	e2e: fix eventual consistency failure within consultemplate suite. (#12494)	2022-04-07 17:03:10 +02:00
Seth Hoenig	9236fe3904	docs: update cl	2022-04-07 10:02:00 -05:00
Lars Lehtonen	df1edf5cf4	nomad/state: fix dropped test errors (#12406)	2022-04-07 10:48:10 -04:00
Seth Hoenig	0870aa31dc	client: set environment variable indicating set of reserved cpu cores This PR injects the 'NOMAD_CPU_CORES' environment variable into tasks that have been allocated reserved cpu cores. The value uses normal cpuset notation, as found in cpuset.cpus cgroup interface files. Note this value is not necessarily the same as the content of the actual cpuset.cpus interface file, which will also include shared cpu cores when using cgroups v2. This variable is a workaround for users who used to be able to read the reserved cgroup cpuset file, but lose the information about distinct reserved cores when using cgroups v2. Side discussion in: https://github.com/hashicorp/nomad/issues/12374	2022-04-07 09:09:35 -05:00
Derek Strickland	065b3ed886	plan_apply: Add missing unit test for validating plans for disconnected clients (#12495)	2022-04-07 09:58:09 -04:00
Tim Gross	1724765096	api: use `cleanhttp.DefaultPooledTransport` for default API client (#12492) We expect every Nomad API client to use a single connection to any given agent, so take advantage of keep-alive by switching the default transport to `DefaultPooledTransport`. Provide a facility to close idle connections for testing purposes. Restores the previously reverted #12409 Co-authored-by: Ben Buzbee <bbuzbee@cloudflare.com>	2022-04-06 16:14:53 -04:00
Luiz Aoqui	0b13ea6920	changelog: make breaking change note for raft v3 (#12493)	2022-04-06 16:00:38 -04:00
Luiz Aoqui	697e82a665	changelog: add entry for #12435 (#12491)	2022-04-06 14:22:09 -04:00
Seth Hoenig	42f094c311	Merge pull request #12484 from hashicorp/tests-handler-exec-failure exec: fix exec handler test	2022-04-06 13:13:07 -05:00
Luiz Aoqui	111af0a936	changelog: minor fixes (#12487)	2022-04-06 14:05:10 -04:00
James Rasell	9bc16b1333	client: account for service provider namespace updates in hooks. (#12479) When a service is updated, the service hooks update a number of internal fields which helps generate the new workload. This also needs to update the namespace for the service provider. It is possible for these to be different, and in the case of Nomad and Consul running OSS, this is to be expected.	2022-04-06 19:26:22 +02:00
James Rasell	431c153cd9	client: add Nomad template service functionality to runner. (#12458) This change modifies the template task runner to utilise the new consul-template which includes Nomad service lookup template funcs. In order to provide security and auth to consul-template, we use a custom HTTP dialer which is passed to consul-template when setting up the runner. This method follows Vault implementation. Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-04-06 19:17:05 +02:00
Seth Hoenig	bae42fad7c	exec: fix exec handler test Fixup this test to handle cgroups v2, as well as the :misc: cgroup	2022-04-06 12:11:37 -05:00
Jasmine Dahilig	38efb3c8d8	metrics: emit stats for vault token next_renewal & last_renewal #5222 (#12435)	2022-04-06 10:03:11 -07:00
Jasmine Dahilig	f67b108f9f	docs: update vault-token note in job run command #8040 (#12385)	2022-04-06 10:01:38 -07:00
Tim Gross	92ae1e9c81	Revert "Use cleanhttp.DefaultPooledTransport for the default API client (#12409)" (#12480) This reverts commit 6e1270dd08e513bdbb6fbb7378f207f1afef9fc3.	2022-04-06 12:58:51 -04:00


23129 commits