open-nomad

Commit Graph

Author	SHA1	Message	Date
Lars Lehtonen	81bb1ef030	command/agent: check err before close (#12574 )	2022-04-15 08:54:03 -04:00
Michael Schurter	70a04dd106	docs: add plan for node rejected details and more (#12564 ) - Moved federation docs to the bottom since everyone is potentially affected by the other sections on the page, but only users of federation are affected by it. - Added section on the plan for node rejected bug since it is fairly easy to diagnose and removing affected nodes is a fairly reliable workaround. - Mention 5s cliff for wait_for_index. - Remove the lie that we do not have job status metrics! How old was that?! - Reinforce the importance of monitoring basic system resources	2022-04-14 16:09:33 -07:00
Tim Gross	d62dd5b3fe	E2E: add debugging outputs for disconnected clients test (#12572 ) This test has a failure that's happening only occasionally and not very reproducibly. Print out the allocation status on test failure so that we can do some post-mortem debugging of the test on nightly.	2022-04-14 17:03:57 -04:00
Tim Gross	267c056e0e	ui: remove beta tag from gutter menu for CSI (#12570 )	2022-04-14 14:56:04 -04:00
Tim Gross	82b65899a1	fix data race in dynamic plugin registry tests (#12554 ) These tests have a data race where the test assertion is reading a value that's being set in the `listenFunc` goroutines that are subscribing to registry update events. Move the assertion into the subscribing goroutine to remove the race. This bug was discovered in #12098 but does not impact production Nomad code.	2022-04-14 14:55:56 -04:00
Seth Hoenig	6d042340b4	Merge pull request #12543 from idrennanvmware/add-allocid-to-sidecar Add alloc_id to sidecar bootstrap	2022-04-14 13:27:09 -05:00
Luiz Aoqui	8b2ea6b61b	ci: fix backport target branch pattern (#12571 )	2022-04-14 14:12:41 -04:00
Seth Hoenig	a1c4f16cf1	connect: prefix tag with nomad.; merge into envoy_stats_tags; update docs This PR expands on the work done in #12543 to - prefix the tag, so it is now "nomad.alloc_id" to be more consistent with Consul tags - merge into pre-existing envoy_stats_tags fields - update the upgrade guide docs - update changelog	2022-04-14 12:52:52 -05:00
Ian Drennan	70bd32df83	Add alloc_id to sidecar bootstrap	2022-04-14 11:46:06 -05:00
Michael Schurter	7351f45672	test: test the buffered pipe used by nsd (#12563 ) Nomad Service Discovery uses an in-memory buffered pipe implementation to connect consul-template to the Nomad API. This adds a basic test for that helper functionality.	2022-04-14 08:38:25 -07:00
James Rasell	5a67866ae1	jobspec: add max_client_disconnect to hcl1 group parsing. (#12568 )	2022-04-14 14:56:58 +02:00
Derek Strickland	3f871973f9	Update E2E terraform output command (#12561 )	2022-04-13 16:46:09 -04:00
James Rasell	4cdc46ae75	service discovery: add pagination and filtering support to info requests (#12552 ) * services: add pagination and filter support to info RPC. * cli: add filter flag to service info command. * docs: add pagination and filter details to services info API. * paginator: minor updates to comment and func signature.	2022-04-13 07:41:44 +02:00
claire labry	d2a3fa1921	updates for backport assistant (#12311 )	2022-04-12 14:01:19 -04:00
Tim Gross	a135d9b260	CSI: fix data race in plugin manager (#12553 ) The plugin manager for CSI hands out instances of a plugin for callers that need to mount a volume. The `MounterForPlugin` method accesses the internal instances map without a lock, and can be called concurrently from outside the plugin manager's main run-loop. The original commit for the instances map included a warning that it needed to be accessed only from the main loop but that comment was unfortunately ignored shortly thereafter, so this bug has existed in the code for a couple years without being detected until we ran tests with `-race` in #12098. Lesson learned here: comments make for lousy enforcement of invariants!	2022-04-12 12:18:04 -04:00
Luiz Aoqui	82027edb2f	add some godocs for the API pagination tokenizer options (#12547 )	2022-04-12 10:27:22 -04:00
Tim Gross	4078e6ea0e	scripts: fix interpreter for bash (#12549 ) Many of our scripts have a non-portable interpreter line for bash and use bash-specific variables like `BASH_SOURCE`. Update the interpreter line to be portable between various Linuxes and macOS without complaint from posix shell users.	2022-04-12 10:08:21 -04:00
Tim Gross	31e72e93ff	E2E: fix flaky event stream test (#12548 ) This changeset fixes two sources of flakiness in the event stream test. First, the stream request gets the event closest to the index, not the exact match. Although events are written before raft entries they're written asynchronously, so it's possible to race and get a raft index from this query higher than the current head of the event buffer. Ensure the job is running before we try to get the index, so that we've given the event enough time to land in the buffer. Second, the assertion that the found index is greater than the start index is only true if the `PlanResult` event manages to land before we do the second registration. Although it should now with the first fix above, it's not a correct assertion for what we're testing.	2022-04-12 08:35:39 -04:00
Luiz Aoqui	bc78c8617f	ci: change notification channel to feed-nomad-releases (#12550 )	2022-04-11 19:12:58 -04:00
claire labry	76fc79ce46	move nomad.service out of etc (#12541 )	2022-04-11 18:26:10 -04:00
Seth Hoenig	f59488bda6	Merge pull request #12532 from greut/feat/remove-consul-lib feat: remove dependency to consul/lib	2022-04-11 13:52:05 -05:00
Karan Sharma	37c907a8d2	feat: add nomctx and nomad-events-sink (#12542 )	2022-04-11 14:47:03 -04:00
Yoan Blanc	3e79d58e4a	fix: use NewSafeTimer Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2022-04-11 19:37:14 +02:00
Tim Gross	77ab8d92f1	E2E: oversubscription assertion needs to wait for stats (#12540 ) The oversubscription test expects an output that requires the client has polled the task for stats at least once. Wait long enough to ensure that we've polled the stats before failing the test.	2022-04-11 11:40:51 -04:00
Tim Gross	c9c3cbd878	E2E: test for nodes disconnected by netsplit (#12407 )	2022-04-11 11:34:27 -04:00
Tim Gross	57b3a0028f	allocs without max_client_disconnect should be lost on disconnect (#12529 ) In the reconciler's filtering for tainted nodes, we use whether the server supports disconnected clients as a gate to a bunch of our logic, but this doesn't account for cases where the job doesn't have `max_client_disconnect`. The only real consequence of this appears to be that allocs on disconnected nodes are marked "complete" instead of "lost".	2022-04-11 11:24:49 -04:00
Seth Hoenig	fecf4b46eb	Merge pull request #12527 from fynxiu/plugins/drivers/ctxdone fix(plugins): should return when ctx.Done	2022-04-11 07:46:39 -05:00
James Rasell	bc800a18d1	e2e: add initial service discovery tests. (#12512 ) Some tests may choose to deregister jobs to check Nomad cleanup logic, however, it is still possible for the test to fail and exit before this is hit. This therefore adds a cancellable cleanup func which can be deferred, using context to control whether it gets run or not.	2022-04-11 11:12:24 +02:00
Yoan Blanc	5e8254beda	feat: remove dependency to consul/lib Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2022-04-09 13:22:44 +02:00
Tim Gross	9e53906782	set minimum version for disconnected client mode to 1.3.0 (#12530 )	2022-04-08 16:48:37 -04:00
Luiz Aoqui	16e3a1028e	changelog: update #12476 entry to highlight the feature (#12528 )	2022-04-08 13:28:23 -04:00
Luiz Aoqui	b829957f52	Merge pull request #12506 from hashicorp/merge-release-1.3.0-beta.1-branch	2022-04-08 13:21:33 -04:00
fyn	1174bc2052	fix(plugins): should return when ctx.Done	2022-04-09 01:04:29 +08:00
Seth Hoenig	44481b35b6	Merge pull request #12524 from hashicorp/docs-cleanup-up-docs docs: fixup title formatting in upgrade guide	2022-04-08 11:58:49 -05:00
Seth Hoenig	a75bc27601	docs: fixup title formatting in upgrade guide	2022-04-08 11:50:54 -05:00
Luiz Aoqui	0190f378a7	docs: fix upgrade specific broken link and conflict tag (#12521 )	2022-04-08 12:36:47 -04:00
Luiz Aoqui	5e642a4742	add Nomad v1.3.0-beta.1 download box (#12517 )	2022-04-08 12:04:14 -04:00
James Rasell	6ac5fd9768	docs: add nomad services template jobspec example. (#12514 )	2022-04-08 17:29:19 +02:00
Luiz Aoqui	45ab5d6308	ci: add semgrep rule to catch usage of invalid string extensions (#12509 )	2022-04-08 10:58:32 -04:00
Seth Hoenig	79d11e6f87	Merge pull request #12508 from twunderlich-grapl/custom-variable-validation Add custom variable validation to docs	2022-04-08 08:53:03 -05:00
Seth Hoenig	e7aa81d3cb	docs: tweak hcl2 validation example	2022-04-08 08:43:42 -05:00
Thomas Wunderlich	3f6465f078	Add custom variable validation to docs Custom variable validation is a useful feature that is supported by Nomad and not just Terraform. As such it should be documented on the input variable page. I've cribbed the content from the terraform docs so this should be consistent across projects	2022-04-07 19:06:06 -04:00
Luiz Aoqui	5c15cafc89	remove generated files and prepare for next release	2022-04-07 18:51:18 -04:00
Luiz Aoqui	d96ffb065f	Merge remote-tracking branch 'origin/release/1.3.0-beta.1' into merge-release-1.3.0-beta.1-branch	2022-04-07 18:46:18 -04:00
Jasmine Dahilig	386f2fac3a	docs: add token_last_renewal and token_next_renewal to server metrics and key metrics #12435 (#12505 )	2022-04-07 15:12:41 -07:00
hc-github-team-nomad-core	07c6d10c86	Generate files for release	2022-04-07 20:21:26 +00:00
Luiz Aoqui	43991dc868	update ci.hcl, version.go and CHANGELOG to v1.3.0-beta.1	2022-04-07 16:13:49 -04:00
Luiz Aoqui	cd15e3386c	ci: skip prerelease if triggered by the generate assets workflow (#12504 )	2022-04-07 16:04:53 -04:00
Phil Renaud	311a6d82c9	Importing string methods directly from @ember/string (#12499 ) * Capitalize methods * Let ESLint yell at us again * Dasherize	2022-04-07 15:51:41 -04:00
Tim Gross	09b5e8d388	Fix flaky `operator debug` test (#12501 ) We introduced a `pprof-interval` argument to `operator debug` in #11938, and unfortunately this has resulted in a lot of test flakes. The actual command in use is mostly fine (although I've fixed some quirks here), so what's really happened is that the change has revealed some existing issues in the tests. Summary of changes: * Make first pprof collection synchronous to preserve the existing behavior for the common case where the pprof interval matches the duration. * Clamp `operator debug` pprof timing to that of the command. The `pprof-duration` should be no more than `duration` and the `pprof-interval` should be no more than `pprof-duration`. Clamp the values rather than throwing errors, which could change the commands that existing users might already have in debugging scripts * Testing: remove test parallelism The `operator debug` tests that stand up servers can't be run in parallel, because we don't have a way of canceling the API calls for pprof. The agent will still be running the last pprof when we exit, and that breaks the next test that talks to that same agent. (Because you can only run one pprof at a time on any process!) We could split off each subtest into its own server, but this test suite is already very slow. In future work we should fix this "for real" by making the API call cancelable. * Testing: assert against unexpected errors in `operator debug` tests. If we assert there are no unexpected error outputs, it's easier for the developer to debug when something is going wrong with the tests because the error output will be presented as a failing test, rather than just a failing exit code check. Or worse, no failing exit code check! This also forces us to be explicit about which tests will return 0 exit codes but still emit (presumably ignorable) error outputs. 
Additional minor bug fixes (mostly in tests) and test refactorings: * Fix text alignment on pprof Duration in `operator debug` output * Remove "done" channel from `operator debug` event stream test. The goroutine we're blocking for here already tells us it's done by sending a value, so block on that instead of an extraneous channel * Event stream test timer should start at current time, not zero * Remove noise from `operator debug` test log output. The `t.Logf` calls already are picked out from the rest of the test output by being prefixed with the filename. * Remove explicit pprof args so we use the defaults clamped from duration/interval	2022-04-07 15:00:07 -04:00

22896 Commits

All Branches