CSI `CreateVolume` RPC is idempotent given that the topology,
capabilities, and parameters are unchanged. CSI volumes have many
user-defined fields that are immutable once set, and many fields that
are not user-settable.
Update the `Register` RPC so that updating a volume via the API merges
onto any existing volume without touching Nomad-controlled fields,
while validating it with the same strict requirements expected for
idempotent `CreateVolume` RPCs.
Also, clarify that this state store method is used for everything, not just
for the `Register` RPC.
The RPC for listing volume snapshots requires a plugin ID. Update the
`volume snapshot list` command to find the specific plugin from the
provided prefix.
The `CreateSnapshot` RPC expects a plugin ID to be set by the API, but
in the common case of the `nomad volume snapshot create` command, we
don't ask the user for the plugin ID because it's available from the
volume we're snapshotting.
Change the order of the RPC so that we get the volume first and then
use the volume's plugin ID for the plugin if the API didn't set the
value.
The `CSIPlugin.List` RPC was intended to accept a prefix to filter the
list of plugins being listed. This was being accidentally being done
in the state store instead, which contributed to incorrect filtering
behavior for plugins in the `volume plugin status` command.
Move the prefix matching into the RPC so that it calls the
prefix-matching method in the state store if we're looking for a
prefix.
Update the `plugin status command` to accept a prefix for the plugin
ID argument so that it matches the expected behavior of other commands.
If any E2E test hangs, it'll eventually timeout and panic, causing the
all the remaining tests to fail. External commands should use a short
context whenever possible so we can fail the test quickly and move on
to the next test.
The `TestRescheduleProgressDeadlineFail` E2E test failed during test
cleanup because the error message "progress deadline expired" that it
emits when we stop the job does not match the one expected from
monitoring the `job stop` command. Update the `StopJob` helper to
tolerate this use case as well.
The `Metrics` suite uses prometheus to scrape Nomad metrics so that
we're testing the full user experience of extracting metrics from
Nomad. With the addition of mTLS, we need to make sure prometheus also
has mTLS configuration because the metrics endpoint is protected.
Update the Nomad client configuration and prometheus job to bind-mount
the client's certs into the task so that the job can use these certs
to scrape the server. This is a temporary solution that gets the job
passing; we should give the job its own certificates (issued by
Vault?) when we've done some of the infrastructure rework we'd like.
When we set the headers for CSI secrets in the `WriteOptions`, it
turns out that we're not always passing a non-nil object. In that
case, instanstiate it on demand in the API.
When using a prefix value and the * wildcard for namespace, the endpoint
would not take the prefix value into consideration due to the order in
which the checks were executed but also the logic for retrieving volumes
from the state store.
This commit changes the order to check for a prefix first and wraps the
result iterator of the state store query in a filter to apply the
prefix.
The AWS EBS plugin appears to use the name field of the volume as an
idempotency token that persists across the entire AWS account, not
just the plugin lifespan.
Also fix the regex for the volume ID, which was originally taken from
the job ID regex but isn't actually the same. This hasn't failed tests
for us because we've always passed in the same volume ID.
With mTLS enabled, using `curl` in a bash script for validation
involves having to configure arguments to `curl` based on whether or
not the test infrastructure is using mTLS, whether ACLs are enabled,
etc. Use the new `operator api` command instead to pick up the client
configuration from the test environment automatically.
The HTTP endpoint for CSI manually serializes the internal struct to
the API struct for purposes of redaction (see also #10470). Add fields
that were missing from this serialization so they don't show up as
always empty in the API response.
The paginator logic was built when go-memdb iterators would return items
ordered lexicographically by their ID prefixes, but #12054 added the
option for some tables to return results ordered by their `CreateIndex`
instead, which invalidated the previous paginator assumption.
The iterator used for pagination must still return results in some order
so that the paginator can properly handle requests where the next_token
value is not present in the results anymore (e.g., the eval was GC'ed).
In these situations, the paginator will start the returned page in the
first element right after where the requested token should've been.
This commit moves the logic to generate pagination tokens from the
elements being paginated to the iterator itself so that callers can have
more control over the token format to make sure they are properly
ordered and stable.
It also allows configuring the paginator as being ordered in ascending
or descending order, which is relevant when looking for a token that may
not be present anymore.
The Prestart hook for task runner hooks doesn't get called when we
restore a task, because the task is already running. The Postrun hook
for CSI plugin supervisors needs the socket path to have been
populated so that the client has a valid path.
The `volume status` command and associated API redacts the entire
mount options instead of just the `MountFlags` field that can contain
sensitive data. Return a redacted value so that the return value makes
sense to operators who have set this field.
The advertise.rpc config option is not intuitive. At first glance you'd
assume it works like advertise.http or advertise.serf, but it does not.
The current behavior is working as intended, but the documentation is
very hard to parse and doesn't draw a clear picture of what the setting
actually does.
Closes https://github.com/hashicorp/nomad/issues/11075
This PR
- upgrades the serf library
- has the test start the join process using the un-joined server first
- disables schedulers on the servers
- uses the WaitForLeader and wantPeers helpers
Not sure which, if any of these actually improves the flakiness of this test.
* chore: bump to latest docs-page
* fix: bump to react-consent-manager patch
* chore: bump to consent-manager with events dep
* chore: bump to stable consent-manager release
In PR #12108 we added missing fields to the plugin response, but we
didn't include the manual serialization steps that we need until
issue #10470 is resolved.
The behaviors of CSI plugins are governed by their capabilities as
defined by the CSI specification. When debugging plugin issues, it's
useful to know which behaviors are expected so they can be matched
against RPC calls made to the plugin allocations.
Expose the plugin capabilities as named in the CSI spec in the `nomad
plugin status -verbose` output.