Commit Graph

23107 Commits

Author SHA1 Message Date
Seth Hoenig df587d8263 docs: update documentation with connect acls changes
This PR updates the changelog, adds notes the 1.3 upgrade guide, and
updates the connect integration docs with documentation about the new
requirement on Consul ACL policies of Consul agent default anonymous ACL
tokens.
2022-04-18 08:22:33 -05:00
Jorge Marey 707c7f3a11 Change consul SI tokens to be local 2022-04-18 08:22:33 -05:00
Shishir f5121d261e
Add os to NodeListStub struct. (#12497)
* Add os to NodeListStub struct.

Signed-off-by: Shishir Mahajan <smahajan@roblox.com>

* Add os as a query param to /v1/nodes.

Signed-off-by: Shishir Mahajan <smahajan@roblox.com>

* Add test: os as a query param to /v1/nodes.

Signed-off-by: Shishir Mahajan <smahajan@roblox.com>
2022-04-15 17:22:45 -07:00
Tim Gross 826d9d47f9
CSI: replace structs->api with serialization extension (#12583)
The CSI HTTP API has to transform the CSI volume to redact secrets,
remove the claims fields, and to consolidate the allocation stubs into
a single slice of alloc stubs. This was done manually in #8590 but
this is a large amount of code and has proven both very bug prone
(see #8659, #8666, #8699, #8735, and #12150) and requires updating
lots of code every time we add a field to volumes or plugins.

In #10202 we introduce encoding improvements for the `Node` struct
that allow a more minimal transformation. Apply this same approach to
serializing `structs.CSIVolume` to API responses.

Also, the original reasoning behind #8590 for plugins no longer holds
because the counts are now denormalized within the state store, so we
can simply remove this transformation entirely.
2022-04-15 14:29:34 -04:00
Tim Gross b14e53e446
CSI: fix volume status prefix matching in CLI (#12584)
The API for `CSIVolume.List` sorts by created index and not by ID,
which breaks the logic for prefix matching in the `volume status`
output when the prefix is also an exact match. Ensure that we're
handling this case correctly.
2022-04-15 14:16:30 -04:00
Kevin Wang c74c06746b
chore: redirects (#12560)
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-04-15 13:13:40 -04:00
Derek Strickland 4d3a0aae6d
heartbeat: Handle transitioning from disconnected to down (#12559) 2022-04-15 09:47:45 -04:00
Derek Strickland 0891218ee9
system_scheduler: support disconnected clients (#12555)
* structs: Add helper method for checking if alloc is configured to disconnect
* system_scheduler: Add support for disconnected clients
2022-04-15 09:31:32 -04:00
Tim Gross f5d8c636c7
CSI: handle per-alloc volumes in `alloc status -verbose` CLI (#12573)
The Nomad client's `csi_hook` interpolates the alloc suffix with the
volume request's name for CSI volumes with `per_alloc = true`, turning
`example` into `example[1]`. We need to do this same behavior in the
`alloc status` output so that we show the correct volume.
2022-04-15 09:26:19 -04:00
Seth Hoenig e8b0b91418
Merge pull request #12579 from hashicorp/ci-missing-packages-oss
ci: ensure package coverage of test-core
2022-04-15 08:11:41 -05:00
Lars Lehtonen 81bb1ef030
command/agent: check err before close (#12574) 2022-04-15 08:54:03 -04:00
Seth Hoenig 47040391bb ci: ensure package coverage of test-core 2022-04-14 19:04:06 -05:00
Michael Schurter 70a04dd106
docs: add plan for node rejected details and more (#12564)
- Moved federation docs to the bottom since *everyone* is potentially
  affected by the other sections on the page, but only users of
  federation are affected by it.
- Added section on the plan for node rejected bug since it is fairly
  easy to diagnose and removing affected nodes is a fairly reliable
  workaround.
- Mention 5s cliff for wait_for_index.
- Remove the lie that we do not have job status metrics! How old was
  that?!
- Reinforce the importance of monitoring basic system resources
2022-04-14 16:09:33 -07:00
Tim Gross d62dd5b3fe
E2E: add debugging outputs for disconnected clients test (#12572)
This test has a failure that's happening only occassionally and not
very reproducibly. Print out the allocation status on test failure so
that we can do some post-mortum debugging of the test on nightly.
2022-04-14 17:03:57 -04:00
Tim Gross 267c056e0e
ui: remove beta tag from gutter menu for CSI (#12570) 2022-04-14 14:56:04 -04:00
Tim Gross 82b65899a1
fix data race in dynamic plugin registry tests (#12554)
These tests have a data race where the test assertion is reading a
value that's being set in the `listenFunc` goroutines that are
subscribing to registry update events. Move the assertion into the
subscribing goroutine to remove the race. This bug was discovered
in #12098 but does not impact production Nomad code.
2022-04-14 14:55:56 -04:00
Seth Hoenig 6d042340b4
Merge pull request #12543 from idrennanvmware/add-allocid-to-sidecar
Add alloc_id to sidecar bootstrap
2022-04-14 13:27:09 -05:00
Luiz Aoqui 8b2ea6b61b
ci: fix backport target branch pattern (#12571) 2022-04-14 14:12:41 -04:00
Seth Hoenig a1c4f16cf1 connect: prefix tag with nomad.; merge into envoy_stats_tags; update docs
This PR expands on the work done in #12543 to
- prefix the tag, so it is now "nomad.alloc_id" to be more consistent with Consul tags
- merge into pre-existing envoy_stats_tags fields
- update the upgrade guide docs
- update changelog
2022-04-14 12:52:52 -05:00
Ian Drennan 70bd32df83 Add alloc_id to sidecar bootstrap 2022-04-14 11:46:06 -05:00
Michael Schurter 7351f45672
test: test the buffered pipe used by nsd (#12563)
Nomad Service Discovery uses an in-memory buffered pipe implementation
to connect consul-template to the Nomad API.

This adds a basic test for that helper functionality.
2022-04-14 08:38:25 -07:00
James Rasell 5a67866ae1
jobspec: add max_client_disconnect to hcl1 group parsing. (#12568) 2022-04-14 14:56:58 +02:00
Derek Strickland 3f871973f9
Update E2E terraform output command (#12561) 2022-04-13 16:46:09 -04:00
James Rasell 4cdc46ae75
service discovery: add pagination and filtering support to info requests (#12552)
* services: add pagination and filter support to info RPC.
* cli: add filter flag to service info command.
* docs: add pagination and filter details to services info API.
* paginator: minor updates to comment and func signature.
2022-04-13 07:41:44 +02:00
claire labry d2a3fa1921
updates for backport assistant (#12311) 2022-04-12 14:01:19 -04:00
Tim Gross a135d9b260
CSI: fix data race in plugin manager (#12553)
The plugin manager for CSI hands out instances of a plugin for callers
that need to mount a volume. The `MounterForPlugin` method accesses
the internal instances map without a lock, and can be called
concurrently from outside the plugin manager's main run-loop.

The original commit for the instances map included a warning that it
needed to be accessed only from the main loop but that comment was
unfortunately ignored shortly thereafter, so this bug has existed in
the code for a couple years without being detected until we ran tests
with `-race` in #12098. Lesson learned here: comments make for lousy
enforcement of invariants!
2022-04-12 12:18:04 -04:00
Luiz Aoqui 82027edb2f
add some godocs for the API pagination tokenizer options (#12547) 2022-04-12 10:27:22 -04:00
Tim Gross 4078e6ea0e
scripts: fix interpreter for bash (#12549)
Many of our scripts have a non-portable interpreter line for bash and
use bash-specific variables like `BASH_SOURCE`. Update the interpreter
line to be portable between various Linuxes and macOS without
complaint from posix shell users.
2022-04-12 10:08:21 -04:00
Tim Gross 31e72e93ff
E2E: fix flaky event stream test (#12548)
This changeset fixes two sources of flakiness in the event stream test.

First, the stream request gets the event *closest* to the index, not
the exact match. Although events are written before raft entries
they're written asynchronously, so it's possible to race and get a
raft index from this query higher than the current head of the event
buffer. Ensure the job is running before we try to get the index, so
that we've given the event enough time to land in the buffer.

Second, the assertion that the found index is greater than the start
index is only true if the `PlanResult` event manages to land before we
do the second registration. Although it should now with the first fix
above, it's not a correct assertion for what we're testing.
2022-04-12 08:35:39 -04:00
Luiz Aoqui bc78c8617f
ci: change notification channel to feed-nomad-releases (#12550) 2022-04-11 19:12:58 -04:00
claire labry 76fc79ce46
move nomad.service out of etc (#12541) 2022-04-11 18:26:10 -04:00
Seth Hoenig f59488bda6
Merge pull request #12532 from greut/feat/remove-consul-lib
feat: remove dependency to consul/lib
2022-04-11 13:52:05 -05:00
Karan Sharma 37c907a8d2
feat: add nomctx and nomad-events-sink (#12542) 2022-04-11 14:47:03 -04:00
Yoan Blanc 3e79d58e4a
fix: use NewSafeTimer
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2022-04-11 19:37:14 +02:00
Tim Gross 77ab8d92f1
E2E: oversubscription assertion needs to wait for stats (#12540)
The oversubscription test expects an output that requires the client
has polled the task for stats at least once. Wait long enough to
ensure that we've polled the stats before failing the test.
2022-04-11 11:40:51 -04:00
Tim Gross c9c3cbd878
E2E: test for nodes disconnected by netsplit (#12407) 2022-04-11 11:34:27 -04:00
Tim Gross 57b3a0028f
allocs without max_client_disconnect should be lost on disconnect (#12529)
In the reconciler's filtering for tainted nodes, we use whether the
server supports disconnected clients as a gate to a bunch of our
logic, but this doesn't account for cases where the job doesn't have
`max_client_disconnect`. The only real consequence of this appears to
be that allocs on disconnected nodes are marked "complete" instead of
"lost".
2022-04-11 11:24:49 -04:00
Seth Hoenig fecf4b46eb
Merge pull request #12527 from fynxiu/plugins/drivers/ctxdone
fix(plugins): should return when ctx.Done
2022-04-11 07:46:39 -05:00
James Rasell bc800a18d1
e2e: add initial service discovery tests. (#12512)
Some tests may chose to deregister jobs to check Nomad cleanup
logic, however, it is still possible for the test to fail and exit
before this is hit. This therefore adds a cancellable cleanup func
which can be deferred, using context to control whether it gets
run or not.
2022-04-11 11:12:24 +02:00
Yoan Blanc 5e8254beda
feat: remove dependency to consul/lib
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2022-04-09 13:22:44 +02:00
Tim Gross 9e53906782
set minimum version for disconnected client mode to 1.3.0 (#12530) 2022-04-08 16:48:37 -04:00
Luiz Aoqui 16e3a1028e
changelog: update #12476 entry to highlight the feature (#12528) 2022-04-08 13:28:23 -04:00
Luiz Aoqui b829957f52
Merge pull request #12506 from hashicorp/merge-release-1.3.0-beta.1-branch 2022-04-08 13:21:33 -04:00
fyn 1174bc2052
fix(plugins): should return when ctx.Done 2022-04-09 01:04:29 +08:00
Seth Hoenig 44481b35b6
Merge pull request #12524 from hashicorp/docs-cleanup-up-docs
docs: fixup title formatting in upgrade guide
2022-04-08 11:58:49 -05:00
Seth Hoenig a75bc27601 docs: fixup title formatting in upgrade guide 2022-04-08 11:50:54 -05:00
Luiz Aoqui 0190f378a7
docs: fix upgrade specific broken link and conflict tag (#12521) 2022-04-08 12:36:47 -04:00
Luiz Aoqui 5e642a4742
add Nomad v1.3.0-beta.1 download box (#12517) 2022-04-08 12:04:14 -04:00
James Rasell 6ac5fd9768
docs: add nomad services template jobspec example. (#12514) 2022-04-08 17:29:19 +02:00
Luiz Aoqui 45ab5d6308
ci: add semgrep rule to catch usage of invalid string extensions (#12509) 2022-04-08 10:58:32 -04:00