Commit graph

22750 commits

Author SHA1 Message Date
Tim Gross 19703e3316
E2E: test exercising node drain behavior for CSI volumes (#12384) 2022-03-29 11:19:23 -04:00
dependabot[bot] 2df08852bf
build(deps): bump github.com/mitchellh/hashstructure from 1.0.0 to 1.1.0 (#12399)
Bumps [github.com/mitchellh/hashstructure](https://github.com/mitchellh/hashstructure) from 1.0.0 to 1.1.0.
- [Release notes](https://github.com/mitchellh/hashstructure/releases)
- [Commits](https://github.com/mitchellh/hashstructure/compare/v1.0.0...v1.1.0)

---
updated-dependencies:
- dependency-name: github.com/mitchellh/hashstructure
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-29 11:17:09 -04:00
Tim Gross a6652bffad
CSI: reorder controller volume detachment (#12387)
In #12112 and #12113 we solved for the problem of races in releasing
volume claims, but there was a case that we missed. During a node
drain with a controller attach/detach, we can hit a race where we call
controller publish before the unpublish has completed. This is
discouraged in the spec but plugins are supposed to handle it
safely. But if the storage provider's API is slow enough and the
plugin doesn't handle the case safely, the volume can get "locked"
into a state where the provider's API won't detach it cleanly.

Check the claim before making any external controller publish RPC
calls so that Nomad is responsible for the canonical information about
whether a volume is currently claimed.

This has a couple side-effects that also had to get fixed here:

* Changing the order means that the volume will have a past claim
  without a valid external node ID because it came from the client, and
  this uncovered a separate bug where we didn't assert the external node
  ID was valid before returning it. Fallthrough to getting the ID from
  the plugins in the state store in this case. We avoided this
  originally because of concerns around plugins getting lost during node
  drain but now that we've fixed that we may want to revisit it in
  future work.
* We should make sure we're handling `FailedPrecondition` cases from
  the controller plugin the same way we handle other retryable cases.
* Several tests had to be updated because they were assuming we fail
  in a particular order that we're no longer doing.
2022-03-29 09:44:00 -04:00
Ryo Nakao e11894a0cb
Ensure to close StreamFrame channel (#12248) 2022-03-28 10:28:23 -04:00
Tim Gross bc455fc69c
docs: changelog entry (#12393) 2022-03-28 09:44:58 -04:00
Shishir afcce3eea5
Display OS name in nomad node status command. (#12388)
Signed-off-by: Shishir Mahajan <smahajan@roblox.com>
2022-03-28 09:28:14 -04:00
Seth Hoenig e3c8a86e2e
Merge pull request #12381 from hashicorp/ci-gha-off
ci: set test log level off in gha
2022-03-25 15:13:42 -05:00
Tim Gross 5c7f2bad0b
E2E: namespace HCP vault and consul policies to avoid collisions (#12386)
Concurrent E2E runs can collide when provisioning policies on HCP
Consul and HCP Vault. Namespace these by the test run name, as we do
for most everything else.
2022-03-25 16:05:59 -04:00
Tim Gross 3c15236fd5
E2E: move example test to use golangs stdlib test runner (#12383)
Our E2E "framework" has a bunch of features around test discovery and
standing up infra that were never completed or fully used, and we
ended up building out a large test suite that ignored all that in lieu
of Terraform-provided infrastructure for the last couple years.

This changeset is a proposal (and demonstration) for gradually
migrating our E2E tests off the framework code so that developers can
write fairly ordinary golang stdlib testing tests.
2022-03-25 14:44:16 -04:00
Seth Hoenig e256afdfee ci: set test log level off in gha 2022-03-25 13:43:33 -05:00
Seth Hoenig 4b895a436a ci: set count to bypass caching 2022-03-25 13:43:33 -05:00
James Rasell 67b467983e
Merge pull request #12368 from hashicorp/f-1.3-boogie-nights
service discovery: add initial MVP implementation
2022-03-25 18:04:47 +01:00
Tim Gross 67b87e46f1
e2e: test for allocations replacement on disconnected clients (#12375)
This test exercises the behavior of clients that become disconnected
and have their allocations replaced. Future test cases will exercise
the `max_client_disconnect` field on the job spec.
2022-03-25 12:26:43 -04:00
Luiz Aoqui c387e2d97e
ci: fix semgrep rule for RPC authentication 2022-03-25 12:00:48 -04:00
Hunter Morris dcaf99dcc1
client: Add AWS EC2 instance-life-cycle from metadata to client fingerprint (#12371) 2022-03-25 11:50:52 -04:00
James Rasell 9449e1c3e2
Merge branch 'main' into f-1.3-boogie-nights 2022-03-25 16:40:32 +01:00
Luiz Aoqui 848a3b271f
docs: fix link and add note about Nomad v1.3.0 on raft v3 upgrade (#12378) 2022-03-25 10:11:46 -04:00
Seth Hoenig 42ccdf6db3
Merge pull request #12380 from hashicorp/ci-gha-verbose
ci: cleanup verbose mode and enable for gha
2022-03-25 08:01:57 -05:00
James Rasell 1604f46026
Merge pull request #12357 from hashicorp/f-update-cli-namespace-wildcard-support-wording
cli: update namespace wildcard help to be non-specific.
2022-03-25 11:24:39 +01:00
dgotlieb f53f61c6ce
Add grpc and http2 listeners to gateway docs (#12367)
Stating at Nomad version 1.2.0 `grpc` and `http2` [protocols are supported](https://github.com/hashicorp/nomad/pull/11187)
2022-03-24 17:09:19 -04:00
Karthick Ramachandran 122115c0ba
make stop job message clearer (#12252) 2022-03-24 16:38:43 -04:00
Seth Hoenig e85fbaf0ac ci: cleanup verbose mode and enable for gha
test_checks.sh was removed in 2019 and now just breaks if VERBOSE is
set when running tests via make targets

in GHA, use verbose mode to display what tests are running
2022-03-24 15:15:05 -05:00
Luiz Aoqui e7382a6b45
tests: fix rpc limit tests (#12364) 2022-03-24 15:31:47 -04:00
Seth Hoenig 987dda3092
Merge pull request #12274 from hashicorp/f-cgroupsv2
client: enable cpuset support for cgroups.v2
2022-03-24 14:22:54 -05:00
Michael Schurter 654d458960
core: add deprecated mvn tag to serf (#12327)
Revert a small part of #11600 after @lgfa29 discovered it would break
compatibility with Nomad <= v1.2!

Nomad <= v1.2 expects the `vsn` tag to exist in Serf. It has always been
`1`. It has no functional purpose. However it causes a parsing error if
it is not set:

https://github.com/hashicorp/nomad/blob/v1.2.6/nomad/util.go#L103-L108

This means Nomad servers at version v1.2 or older will not allow servers
without this tag to join.

The `mvn` minor version tag is also checked, but soft fails. I'm not
setting that because I want as much of this cruft gone as possible.
2022-03-24 14:44:21 -04:00
Luiz Aoqui 64b558c14c
core: store and check for Raft version changes (#12362)
Downgrading the Raft version protocol is not a supported operation.
Checking for a downgrade is hard since this information is not stored in
any persistent place. When a server re-joins a cluster with a prior Raft
version, the Serf tag is updated so Nomad can't tell that the version
changed.

Mixed version clusters must be supported to allow for zero-downtime
rolling upgrades. During this it's expected that the cluster will have
mixed Raft versions. Enforcing consistency strong version consistency
would disrupt this flow.

The approach taken here is to store the Raft version on disk. When the
server starts the `raft_protocol` value is written to the file
`data_dir/raft/version`. If that file already exists, its content is
checked against the current `raft_protocol` value to detect downgrades
and prevent the server from starting.

Any other types of errors are ignore to prevent disruptions that are
outside the control of operators. The only option in cases of an invalid
or corrupt file would be to delete it, making this check useless. So
just overwrite its content with the new version and provide guidance on
how to check that their cluster is an expected state.
2022-03-24 14:42:00 -04:00
Seth Hoenig 113b7eb727 client: cgroups v2 code review followup 2022-03-24 13:40:42 -05:00
Tim Gross ff1bed38cd
csi: add -secret and -parameter flag to volume snapshot create (#12360)
Pass-through the `-secret` and `-parameter` flags to allow setting
parameters for the snapshot and overriding the secrets we've stored on
the CSI volume in the state store.
2022-03-24 10:29:50 -04:00
Seth Hoenig 65c950baf4
Merge pull request #12369 from hashicorp/b-peers-perms
core: write peers.json file with correct permissions
2022-03-24 09:18:24 -05:00
Seth Hoenig a6c905616d core: write peers.json file with correct permissions 2022-03-24 08:26:31 -05:00
James Rasell 16b1f19ffe
api: move serviceregistration client to servics to match CLI.
The service registration client name was used to provide a
distinction between the service block and the service client. This
however creates new wording to understand and does not match the
CLI, therefore this change fixes that so we have a Services
client.

Consul specific objects within the service file have been moved to
the consul location to create a clearer separation.
2022-03-24 09:08:45 +01:00
James Rasell 96d8512c85
test: move remaining tests to use ci.Parallel. 2022-03-24 08:45:13 +01:00
dependabot[bot] 92021045b6
build(deps): bump github.com/stretchr/testify from 1.7.0 to 1.7.1 (#12306) 2022-03-23 19:12:51 -04:00
Seth Hoenig 2e5c6de820 client: enable support for cgroups v2
This PR introduces support for using Nomad on systems with cgroups v2 [1]
enabled as the cgroups controller mounted on /sys/fs/cgroups. Newer Linux
distros like Ubuntu 21.10 are shipping with cgroups v2 only, causing problems
for Nomad users.

Nomad mostly "just works" with cgroups v2 due to the indirection via libcontainer,
but not so for managing cpuset cgroups. Before, Nomad has been making use of
a feature in v1 where a PID could be a member of more than one cgroup. In v2
this is no longer possible, and so the logic around computing cpuset values
must be modified. When Nomad detects v2, it manages cpuset values in-process,
rather than making use of cgroup heirarchy inheritence via shared/reserved
parents.

Nomad will only activate the v2 logic when it detects cgroups2 is mounted at
/sys/fs/cgroups. This means on systems running in hybrid mode with cgroups2
mounted at /sys/fs/cgroups/unified (as is typical) Nomad will continue to
use the v1 logic, and should operate as before. Systems that do not support
cgroups v2 are also not affected.

When v2 is activated, Nomad will create a parent called nomad.slice (unless
otherwise configured in Client conifg), and create cgroups for tasks using
naming convention <allocID>-<task>.scope. These follow the naming convention
set by systemd and also used by Docker when cgroups v2 is detected.

Client nodes now export a new fingerprint attribute, unique.cgroups.version
which will be set to 'v1' or 'v2' to indicate the cgroups regime in use by
Nomad.

The new cpuset management strategy fixes #11705, where docker tasks that
spawned processes on startup would "leak". In cgroups v2, the PIDs are
started in the cgroup they will always live in, and thus the cause of
the leak is eliminated.

[1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html

Closes #11289
Fixes #11705 #11773 #11933
2022-03-23 11:35:27 -05:00
Tim Gross 5c91bc877c
csi: set gRPC authority header for unix domain socket (#12359)
The go-grpc library used by most CSI plugins doesn't require the
authority header to be set, which violates the HTTP2 spec but doesn't
impact Nomad because both sides of the connection are using the same
library. But plugins written in other languages (`democratic-csi` for
example) may have more strictly conforming gRPC server libraries and
we need to set the authority header manually.
2022-03-23 12:01:08 -04:00
James Rasell cd42572f0d
Merge pull request #12330 from hashicorp/f-gh-263
cli: add service commands for list, info, and delete.
2022-03-23 15:49:29 +01:00
Tim Gross 1743648901
CSI: fix timestamp from volume snapshot responses (#12352)
Listing snapshots was incorrectly returning nanoseconds instead of
seconds, and formatting of timestamps both list and create snapshot
was treating the timestamp as though it were nanoseconds instead of
seconds. This resulted in create timestamps always being displayed as
zero values.

Fix the unit conversion error in the command line and the incorrect
extraction in the CSI plugin client code. Beef up the unit tests to
make sure this code is actually exercised.
2022-03-23 10:39:28 -04:00
Tim Gross b7075f04fd
CSI: enforce single access mode at validation time (#12337)
A volume that has single-use access mode is feasibility checked during
scheduling to ensure that only a single reader or writer claim
exists. However, because feasibility checking is done one alloc at a
time before the plan is written, a job that's misconfigured to have
count > 1 that mounts one of these volumes will pass feasibility
checking.

Enforce the check at validation time instead to prevent us from even
trying to evaluation a job that's misconfigured this way.
2022-03-23 09:21:26 -04:00
James Rasell 0a9f54c525
cli: update namespace wildcard help to be non-specific.
A number of commands support namespace wildcard querying, so it
should be up to the sub-command to detail support, rather than
keeping this list up to date.
2022-03-23 12:56:48 +01:00
James Rasell bb8514fc75
core: remove node service registrations when node is down.
When a node fails its heart beating a number of actions are taken
to ensure state is cleaned. Service registrations a loosely tied
to nodes, therefore we should remove these from state when a node
is considered terminally down.
2022-03-23 09:42:46 +01:00
James Rasell a646333263
Merge branch 'main' into f-1.3-boogie-nights 2022-03-23 09:41:25 +01:00
Tim Gross 33558cb51e
csi: fix handling of garbage collected node in node unpublish (#12350)
When a node is garbage collected, we assume that the volume is no
longer attached to it and ignore the `ErrUnknownNode` error. But we
used `errors.Is` to check for a wrapped error, and RPC flattens the
errors during serialization. This results in an error check that works
in automated testing but not in real clusters. Use a string contains
check instead.
2022-03-22 15:40:24 -04:00
Luiz Aoqui f8973d364e
core: use the new Raft API when removing peers (#12340)
Raft v3 introduced a new API for adding and removing peers that takes
the peer ID instead of the address.

Prior to this change, Nomad would use the remote peer Raft version for
deciding which API to use, but this would not work in the scenario where
a Raft v3 server tries to remove a Raft v2 server; the code running uses
v3 so it's unable to call the v2 API.

This change uses the Raft version of the server running the code to
decide which API to use. If the remote peer is a Raft v2, it uses the
server address as the ID.
2022-03-22 15:07:31 -04:00
Luiz Aoqui b5a42cd55d
set raft v3 as the default config (#12341) 2022-03-22 15:06:25 -04:00
Tim Gross 60cfeacd76
drainer: defer CSI plugins until last (#12324)
When a node is drained, system jobs are left until last so that
operators can rely on things like log shippers running even as their
applications are getting drained off. Include CSI plugins in this set
so that Controller plugins deployed as services can be handled as
gracefully as Node plugins that are running as system jobs.
2022-03-22 10:26:56 -04:00
Jonathan Tey 4635be07ab
demo: add missing file for Kadalu CSI demo (#12336) 2022-03-22 09:50:50 -04:00
Tim Gross 2a2ebd0537
CSI: presentation improvements (#12325)
* Fix plugin capability sorting.
  The `sort.StringSlice` method in the stdlib doesn't actually sort, but
  instead constructs a sorting type which you call `Sort()` on.
* Sort allocations for plugins by modify index.
  Present allocations in modify index order so that newest allocations
  show up at the top of the list. This results in sorted allocs in
  `nomad plugin status :id`, just like `nomad job status :id`.
* Sort allocations for volumes in HTTP response.
  Present allocations in modify index order so that newest allocations
  show up at the top of the list. This results in sorted allocs in
  `nomad volume status :id`, just like `nomad job status :id`.
  This is implemented in the HTTP response and not in the state store
  because the state store maintains two separate lists of allocs that
  are merged before sending over the API.
* Fix length of alloc IDs in `nomad volume status` output
2022-03-22 09:48:38 -04:00
James Rasell f1e4db70d2
Merge pull request #12332 from hashicorp/b-node-fixup-drainupdate-msg-spelling
core: fixup node drain update message spelling.
2022-03-22 10:44:03 +01:00
James Rasell 93f4f6f09c
Merge pull request #12329 from hashicorp/f-gh-266-touchdown
client: hookup service wrapper for use in clients
2022-03-22 10:42:05 +01:00
Tim Gross e687a21da9
CSI: set plugin CSI_ENDPOINT env var only if unset by user (#12257)
* Use unix:// prefix for CSI_ENDPOINT variable by default
* Some plugins have strict validation over the format of the
  `CSI_ENDPOINT` variable, and unfortunately not all plugins
  agree. Allow the user to override the `CSI_ENDPOINT` to workaround
  those cases.
* Update all demos and tests with CSI_ENDPOINT
2022-03-21 11:48:47 -04:00