The service registration client name was used to provide a
distinction between the service block and the service client. This
however creates new wording to understand and does not match the
CLI, therefore this change fixes that so we have a Services
client.
Consul specific objects within the service file have been moved to
the consul location to create a clearer separation.
This PR introduces support for using Nomad on systems with cgroups v2 [1]
enabled as the cgroups controller mounted on /sys/fs/cgroups. Newer Linux
distros like Ubuntu 21.10 are shipping with cgroups v2 only, causing problems
for Nomad users.
Nomad mostly "just works" with cgroups v2 due to the indirection via libcontainer,
but not so for managing cpuset cgroups. Before, Nomad has been making use of
a feature in v1 where a PID could be a member of more than one cgroup. In v2
this is no longer possible, and so the logic around computing cpuset values
must be modified. When Nomad detects v2, it manages cpuset values in-process,
rather than making use of cgroup heirarchy inheritence via shared/reserved
parents.
Nomad will only activate the v2 logic when it detects cgroups2 is mounted at
/sys/fs/cgroups. This means on systems running in hybrid mode with cgroups2
mounted at /sys/fs/cgroups/unified (as is typical) Nomad will continue to
use the v1 logic, and should operate as before. Systems that do not support
cgroups v2 are also not affected.
When v2 is activated, Nomad will create a parent called nomad.slice (unless
otherwise configured in Client conifg), and create cgroups for tasks using
naming convention <allocID>-<task>.scope. These follow the naming convention
set by systemd and also used by Docker when cgroups v2 is detected.
Client nodes now export a new fingerprint attribute, unique.cgroups.version
which will be set to 'v1' or 'v2' to indicate the cgroups regime in use by
Nomad.
The new cpuset management strategy fixes#11705, where docker tasks that
spawned processes on startup would "leak". In cgroups v2, the PIDs are
started in the cgroup they will always live in, and thus the cause of
the leak is eliminated.
[1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.htmlCloses#11289Fixes#11705#11773#11933
The go-grpc library used by most CSI plugins doesn't require the
authority header to be set, which violates the HTTP2 spec but doesn't
impact Nomad because both sides of the connection are using the same
library. But plugins written in other languages (`democratic-csi` for
example) may have more strictly conforming gRPC server libraries and
we need to set the authority header manually.
Listing snapshots was incorrectly returning nanoseconds instead of
seconds, and formatting of timestamps both list and create snapshot
was treating the timestamp as though it were nanoseconds instead of
seconds. This resulted in create timestamps always being displayed as
zero values.
Fix the unit conversion error in the command line and the incorrect
extraction in the CSI plugin client code. Beef up the unit tests to
make sure this code is actually exercised.
A volume that has single-use access mode is feasibility checked during
scheduling to ensure that only a single reader or writer claim
exists. However, because feasibility checking is done one alloc at a
time before the plan is written, a job that's misconfigured to have
count > 1 that mounts one of these volumes will pass feasibility
checking.
Enforce the check at validation time instead to prevent us from even
trying to evaluation a job that's misconfigured this way.
A number of commands support namespace wildcard querying, so it
should be up to the sub-command to detail support, rather than
keeping this list up to date.
When a node fails its heart beating a number of actions are taken
to ensure state is cleaned. Service registrations a loosely tied
to nodes, therefore we should remove these from state when a node
is considered terminally down.
When a node is garbage collected, we assume that the volume is no
longer attached to it and ignore the `ErrUnknownNode` error. But we
used `errors.Is` to check for a wrapped error, and RPC flattens the
errors during serialization. This results in an error check that works
in automated testing but not in real clusters. Use a string contains
check instead.
Raft v3 introduced a new API for adding and removing peers that takes
the peer ID instead of the address.
Prior to this change, Nomad would use the remote peer Raft version for
deciding which API to use, but this would not work in the scenario where
a Raft v3 server tries to remove a Raft v2 server; the code running uses
v3 so it's unable to call the v2 API.
This change uses the Raft version of the server running the code to
decide which API to use. If the remote peer is a Raft v2, it uses the
server address as the ID.
When a node is drained, system jobs are left until last so that
operators can rely on things like log shippers running even as their
applications are getting drained off. Include CSI plugins in this set
so that Controller plugins deployed as services can be handled as
gracefully as Node plugins that are running as system jobs.
* Fix plugin capability sorting.
The `sort.StringSlice` method in the stdlib doesn't actually sort, but
instead constructs a sorting type which you call `Sort()` on.
* Sort allocations for plugins by modify index.
Present allocations in modify index order so that newest allocations
show up at the top of the list. This results in sorted allocs in
`nomad plugin status :id`, just like `nomad job status :id`.
* Sort allocations for volumes in HTTP response.
Present allocations in modify index order so that newest allocations
show up at the top of the list. This results in sorted allocs in
`nomad volume status :id`, just like `nomad job status :id`.
This is implemented in the HTTP response and not in the state store
because the state store maintains two separate lists of allocs that
are merged before sending over the API.
* Fix length of alloc IDs in `nomad volume status` output
* Use unix:// prefix for CSI_ENDPOINT variable by default
* Some plugins have strict validation over the format of the
`CSI_ENDPOINT` variable, and unfortunately not all plugins
agree. Allow the user to override the `CSI_ENDPOINT` to workaround
those cases.
* Update all demos and tests with CSI_ENDPOINT
The `ConnectACLsE2ETest` checks that the SI tokens have been properly
cleaned up between tests, but following the change to use HCP the
previous `Connect` test suite will often have SI tokens that haven't
been cleaned up by the time this test suite runs. Wait for the SI
tokens to be cleaned up at the start of the test to ensure we have a
clean state.
Part 2 of breaking up https://github.com/hashicorp/nomad/pull/12255
This PR makes it so gotestsum is invoked only in CircleCI. Also the
HCLogger(t) is plumbed more correctly in TestServer and TestAgent so
that they respect NOMAD_TEST_LOG_LEVEL.
The reason for these is we'll want to disable logging in GHA,
where spamming the disk with logs really drags performance.
Use HCP Consul and HCP Vault for the Consul and Vault clusters used in E2E testing. This has the following benefits:
* Without the need to support mTLS bootstrapping for Consul and Vault, we can simplify the mTLS configuration by leaning on Terraform instead of janky bash shell scripting.
* Vault bootstrapping is no longer required, so we can eliminate even more janky shell scripting
* Our E2E exercises HCP, which is important to us as an organization
* With the reduction in configurability, we can simplify the Terraform configuration and drop the complicated `provision.sh`/`provision.ps1` scripts we were using previously. We can template Nomad configuration files and upload them with the `file` provisioner.
* Packer builds for Linux and Windows become much simpler.
tl;dr way less janky shell scripting!
The previous output of the `nomad server members` command would output a
column named `Protocol` that displayed the Serf protocol being currently
used by servers.
This is not a configurable option, so it holds very little value to
operators. It is also easy to confuse it with the Raft Protocol version,
which is configurable and highly relevant to operators.
This commit replaces the previous `Protocol` column with the new `Raft
Version`. It also updates the `-detailed` flag to be called `-verbose`
so it matches other commands. The detailed output now also outputs the
same information as the standard output with the addition of the
previous `Protocol` column and `Tags`.
The `related` query param is used to indicate that the request should
return a list of related (next, previous, and blocked) evaluations.
Co-authored-by: Jasmine Dahilig <jasmine@hashicorp.com>
When a Nomad server becomes the Raft leader, it must perform several
actions defined in the establishLeadership function. If any of these
actions fail, Raft will think the node is the leader, but it will not
actually be able to act as a Nomad leader.
In this scenario, leadership must be revoked and transferred to another
server if possible, or the node should retry the establishLeadership
steps.
This is a followup to having tests run in serial in CI.
The e2e package isn't in CI, but lets use the helper anyway
so we can setup semgrep rules covering the entire repository.
This is a followup to running tests in serial in CI.
Since the API package cannot import anything outside of api/,
copy the ci.Parallel function into api/internal/testutil, and
have api tests use that.