The signature of the `raftApply` function requires that the caller unwrap the
first returned value (the response from `FSM.Apply`) to see if it's an
error. This puts the burden on the caller to remember to check two different
places for errors, and we've done so inconsistently.
Update `raftApply` to do the unwrapping for us and return any `FSM.Apply` error
as the error value. Similar work was done in Consul in
https://github.com/hashicorp/consul/pull/9991. This eliminates some boilerplate
and surfaces a few minor bugs in the process:
* job deregistrations of already-GC'd jobs were still emitting evals
* reconcile job summaries does not return scheduler errors
* node updates did not report errors associated with inconsistent service
discovery or CSI plugin states
Note that although _most_ of the `FSM.Apply` functions return only errors (which
makes it tempting to remove the first return value entirely), there are few that
return `bool` for some reason and Variables relies on the response value for
proper CAS checking.
This changeset configures the RPC rate metrics that were added in #15515 to all
the RPCs that support authenticated HTTP API requests. These endpoints already
configured with pre-forwarding authentication in #15870, and a handful of others
were done already as part of the proof-of-concept work. So this changeset is
entirely copy-and-pasting one method call into a whole mess of handlers.
Upcoming PRs will wire up pre-forwarding auth and rate metrics for the remaining
set of RPCs that have no API consumers or aren't authenticated, in smaller
chunks that can be more thoughtfully reviewed.
This changeset allows Workload Identities to authenticate to all the RPCs that
support HTTP API endpoints, for use with PR #15864.
* Extends the work done for pre-forwarding authentication to all RPCs that
support a HTTP API endpoint.
* Consolidates the auth helpers used by the CSI, Service Registration, and Node
endpoints that are currently used to support both tokens and client secrets.
Intentionally excluded from this changeset:
* The Variables endpoint still has custom handling because of the implicit
policies. Ideally we'll figure out an efficient way to resolve those into real
policies and then we can get rid of that custom handling.
* The RPCs that don't currently support auth tokens (i.e. those that don't
support HTTP endpoints) have not been updated with the new pre-forwarding auth
We'll be doing this under a separate PR to support RPC rate metrics.
Upcoming work to instrument the rate of RPC requests by consumer (and eventually
rate limit) requires that we thread the `RPCContext` through all RPC
handlers so that we can access the underlying connection. This changeset adds
the context to everywhere we intend to initially support it and intentionally
excludes streaming RPCs and client RPCs.
To improve the ergonomics of adding the context everywhere its needed and to
clarify the requirements of dynamic vs static handlers, I've also done a good
bit of refactoring here:
* canonicalized the RPC handler fields so they're as close to identical as
possible without introducing unused fields (i.e. I didn't add loggers if the
handler doesn't use them already).
* canonicalized the imports in the handler files.
* added a `NewExampleEndpoint` function for each handler that ensures we're
constructing the handlers with the required arguments.
* reordered the registration in server.go to match the order of the files (to
make it easier to see if we've missed one), and added a bunch of commentary
there as to what the difference between static and dynamic handlers is.
* One-time tokens are not replicated between regions, so we don't want to enforce
that the version check across all of serf, just members in the same region.
* Scheduler: Disconnected clients handling is specific to a single region, so we
don't want to enforce that the version check across all of serf, just members in
the same region.
* Variables: enforce version check in Apply RPC
* Cleans up a bunch of legacy checks.
This changeset is specific to 1.4.x and the changes for previous versions of
Nomad will be manually backported in a separate PR.
Nomad's original autopilot was importing from a private package in Consul. It
has been moved out to a shared library. Switch Nomad to use this library so that
we can eliminate the import of Consul, which is necessary to build Nomad ENT
with the current version of the Consul SDK. This also will let us pick up
autopilot improvements shared with Consul more easily.
* core: allow pause/un-pause of eval broker on region leader.
* agent: add ability to pause eval broker via scheduler config.
* cli: add operator scheduler commands to interact with config.
* api: add ability to pause eval broker via scheduler config
* e2e: add operator scheduler test for eval broker pause.
* docs: include new opertor scheduler CLI and pause eval API info.