The paginator logic was built when go-memdb iterators would return items
ordered lexicographically by their ID prefixes, but #12054 added the
option for some tables to return results ordered by their `CreateIndex`
instead, which invalidated the previous paginator assumption.
The iterator used for pagination must still return results in some order
so that the paginator can properly handle requests where the next_token
value is not present in the results anymore (e.g., the eval was GC'ed).
In these situations, the paginator will start the returned page in the
first element right after where the requested token should've been.
This commit moves the logic to generate pagination tokens from the
elements being paginated to the iterator itself so that callers can have
more control over the token format to make sure they are properly
ordered and stable.
It also allows configuring the paginator as being ordered in ascending
or descending order, which is relevant when looking for a token that may
not be present anymore.
The Prestart hook for task runner hooks doesn't get called when we
restore a task, because the task is already running. The Postrun hook
for CSI plugin supervisors needs the socket path to have been
populated so that the client has a valid path.
The `volume status` command and associated API redacts the entire
mount options instead of just the `MountFlags` field that can contain
sensitive data. Return a redacted value so that the return value makes
sense to operators who have set this field.
The advertise.rpc config option is not intuitive. At first glance you'd
assume it works like advertise.http or advertise.serf, but it does not.
The current behavior is working as intended, but the documentation is
very hard to parse and doesn't draw a clear picture of what the setting
actually does.
Closes https://github.com/hashicorp/nomad/issues/11075
This PR
- upgrades the serf library
- has the test start the join process using the un-joined server first
- disables schedulers on the servers
- uses the WaitForLeader and wantPeers helpers
Not sure which, if any of these actually improves the flakiness of this test.
* chore: bump to latest docs-page
* fix: bump to react-consent-manager patch
* chore: bump to consent-manager with events dep
* chore: bump to stable consent-manager release
In PR #12108 we added missing fields to the plugin response, but we
didn't include the manual serialization steps that we need until
issue #10470 is resolved.
The behaviors of CSI plugins are governed by their capabilities as
defined by the CSI specification. When debugging plugin issues, it's
useful to know which behaviors are expected so they can be matched
against RPC calls made to the plugin allocations.
Expose the plugin capabilities as named in the CSI spec in the `nomad
plugin status -verbose` output.
When the alloc runner claims a volume, an allocation for a previous
version of the job may still have the volume claimed because it's
still shutting down. In this case we'll receive an error from the
server. Retry this error until we succeed or until a very long timeout
expires, to give operators a chance to recover broken plugins.
Make the alloc runner hook tolerant of temporary RPC failures.
* Remove redundant schedulable check in `FreeWriteClaims`. If a volume
has been created but not yet claimed, its capabilities will be checked
in `WriteSchedulable` at both scheduling time and claim time. We don't
need to also check them in the `FreeWriteClaims` method.
* Enforce maximum volume claims for writers.
When the scheduler checks feasibility for CSI volumes, the check is
fairly loose: earlier versions of the same job are not counted as
active claims. This allows the scheduler to place new allocations
for the new version of a job, under the assumption that we'll replace
the existing allocations and their volume claims.
But when the alloc runner claims the volume, we need to enforce the
active claims even if they're for allocations of an earlier version of
the job. Otherwise we'll try to mount a volume that's currently being
unmounted, and this will cause replacement allocations to frequently
fail.
* Enforce single-node reader check for read-only volumes. When the
alloc runner makes a claim for a read-only volume, we only check that
the volume is potentially schedulable and not that it actually has
free read claims.
If a plugin job fails before successfully fingerprinting the plugins,
the plugin will not exist when we try to delete the job. Tolerate
missing plugins.