API and RPC endpoints for ACLAuthMethods and ACLBindingRules should allow users
to send incomplete objects in order to, e.g., update single fields. This PR
provides "merging" functionality for these endpoints.
This change add the RPC ACL binding rule handlers. These handlers
are responsible for the creation, updating, reading, and deletion
of binding rules.
The write handlers are feature gated so that they can only be used
when all federated servers are running the required version.
The HTTP API handlers and API SDK have also been added where
required. This allows the endpoints to be called from the API by users
and clients.
Upcoming work to instrument the rate of RPC requests by consumer (and eventually
rate limit) require that we authenticate a RPC request before forwarding. Add a
new top-level `Authenticate` method to the server and have it return an
`AuthenticatedIdentity` struct. RPC handlers will use the relevant fields of
this identity for performing authorization.
This changeset includes:
* The main implementation of `Authenticate`
* Provide a new RPC `ACL.WhoAmI` for debugging authentication. This endpoint
returns the same `AuthenticatedIdentity` that will be used by RPC handlers. At
some point we might want to give this an equivalent HTTP endpoint but I didn't
want to add that to our public API until some of the other Workload Identity
work is solidified, especially if we don't need it yet.
* A full coverage test of the `Authenticate` method. This sets up two server
nodes with mTLS and ACLs, some tokens, and some allocations with workload
identities.
* Wire up an example of using `Authenticate` in the `Namespace.Upsert` RPC and
see how authorization happens after forwarding.
* A new semgrep rule for `Authenticate`, which we'll need to update once we're
ready to wire up more RPC endpoints with authorization steps.
Upcoming work to instrument the rate of RPC requests by consumer (and eventually
rate limit) requires that we thread the `RPCContext` through all RPC
handlers so that we can access the underlying connection. This changeset adds
the context to everywhere we intend to initially support it and intentionally
excludes streaming RPCs and client RPCs.
To improve the ergonomics of adding the context everywhere its needed and to
clarify the requirements of dynamic vs static handlers, I've also done a good
bit of refactoring here:
* canonicalized the RPC handler fields so they're as close to identical as
possible without introducing unused fields (i.e. I didn't add loggers if the
handler doesn't use them already).
* canonicalized the imports in the handler files.
* added a `NewExampleEndpoint` function for each handler that ensures we're
constructing the handlers with the required arguments.
* reordered the registration in server.go to match the order of the files (to
make it easier to see if we've missed one), and added a bunch of commentary
there as to what the difference between static and dynamic handlers is.
Currently CRUD code that operates on SSO auth methods does not return created or updated object upon creation/update. This is bad UX and inconsistent behavior compared to other ACL objects like roles, policies or tokens.
This PR fixes it.
Relates to #13120
ACL tokens are granted permissions either by direct policy links
or via ACL role links. Callers should therefore be able to read
policies directly assigned to the caller token or indirectly by
ACL role links.
This changes adds ACL role creation and deletion to the event
stream. It is exposed as a single topic with two types; the filter
is primarily the role ID but also includes the role name.
While conducting this work it was also discovered that the events
stream has its own ACL resolution logic. This did not account for
ACL tokens which included role links, or tokens with expiry times.
ACL role links are now resolved to their policies and tokens are
checked for expiry correctly.
* One-time tokens are not replicated between regions, so we don't want to enforce
that the version check across all of serf, just members in the same region.
* Scheduler: Disconnected clients handling is specific to a single region, so we
don't want to enforce that the version check across all of serf, just members in
the same region.
* Variables: enforce version check in Apply RPC
* Cleans up a bunch of legacy checks.
This changeset is specific to 1.4.x and the changes for previous versions of
Nomad will be manually backported in a separate PR.
An ACL roles name must be unique, however, a bug meant multiple
roles of the same same could be created. This fixes that problem
with checks in the RPC handler and state store.
Making the ACL Role listing return object a stub future-proofs the
endpoint. In the event the role object grows, we are not bound by
having to return all fields within the list endpoint or change the
signature of the endpoint to reduce the list return size.
ACL Roles along with policies and global token will be replicated
from the authoritative region to all federated regions. This
involves a new replication loop running on the federated leader.
Policies and roles may be replicated at different times, meaning
the policies and role references may not be present within the
local state upon replication upsert. In order to bypass the RPC
and state check, a new RPC request parameter has been added. This
is used by the replication process; all other callers will trigger
the ACL role policy validation check.
There is a new ACL RPC endpoint to allow the reading of a set of
ACL Roles which is required by the replication process and matches
ACL Policies and Tokens. A bug within the ACL Role listing RPC has
also been fixed which returned incorrect data during blocking
queries where a deletion had occurred.
ACL tokens can now utilize ACL roles in order to provide API
authorization. Each ACL token can be created and linked to an
array of policies as well as an array of ACL role links. The link
can be provided via the role name or ID, but internally, is always
resolved to the ID as this is immutable whereas the name can be
changed by operators.
When resolving an ACL token, the policies linked from an ACL role
are unpacked and combined with the policy array to form the
complete auth set for the token.
The ACL token creation endpoint handles deduplicating ACL role
links as well as ensuring they exist within state.
When reading a token, Nomad will also ensure the ACL role link is
current. This handles ACL roles being deleted from under a token
from a UX standpoint.
New ACL Role RPC endpoints have been created to allow the creation,
update, read, and deletion of ACL roles. All endpoints require a
management token; in the future readers will also be allowed to
view roles associated to their ACL token.
The create endpoint in particular is responsible for deduplicating
ACL policy links and ensuring named policies are found within
state. This is done within the RPC handler so we perform a single
loop through the links for slight efficiency.
When applying a raft log to expire ACL tokens, we need to use a
timestamp provided by the leader so that the result is deterministic
across servers. Use leader's timestamp from RPC call
The ACL token state schema has been updated to utilise two new
indexes which track expiration of tokens that are configured with
an expiration TTL or time. A new state function allows listing
ACL expired tokens which will be used by internal garbage
collection.
The ACL endpoint has been modified so that all validation happens
within a single function call. This is easier to understand and
see at a glance. The ACL token validation now also includes logic
for expiry TTL and times. The ACL endpoint upsert tests have been
condensed into a single, table driven test.
There is a new token canonicalize which provides a single place
for token canonicalization, rather than logic spread in the RPC
handler.
Fix a panic in handling one-time auth tokens, used to support `nomad ui
--authenticate`.
If the nomad leader is a 1.1.x with some servers running as 1.0.x, the
pre-1.1.0 servers risk crashing and the cluster may lose quorum. That
can happen when `nomad authenticate -ui` command is issued, or when the
leader scans for expired tokens every 10 minutes.
Fixed#10943 .
RPC endpoints for the user-driven APIs (`UpsertOneTimeToken` and
`ExchangeOneTimeToken`) and token expiration (`ExpireOneTimeTokens`).
Includes adding expiration to the periodic core GC job.
If ACL Request is unauthenticated, we should honor the anonymous token.
This PR makes few changes:
* `GetPolicy` endpoints may return policy if anonymous policy allows it,
or return permission denied otherwise.
* `ListPolicies` returns an empty policy list, or one with anonymous
policy if one exists.
Without this PR, the we return an incomprehensible error.
Before:
```
$ curl http://localhost:4646/v1/acl/policy/doesntexist; echo
acl token lookup failed: index error: UUID must be 36 characters
$ curl http://localhost:4646/v1/acl/policies; echo
acl token lookup failed: index error: UUID must be 36 characters
```
After:
```
$ curl http://localhost:4646/v1/acl/policy/doesntexist; echo
Permission denied
$ curl http://localhost:4646/v1/acl/policies; echo
[]
```
Noticed that ACL endpoints return 500 status code for user errors. This
is confusing and can lead to false monitoring alerts.
Here, I introduce a concept of RPCCoded errors to be returned by RPC
that signal a code in addition to error message. Codes for now match
HTTP codes to ease reasoning.
```
$ nomad acl bootstrap
Error bootstrapping: Unexpected response code: 500 (ACL bootstrap already done (reset index: 9))
$ nomad acl bootstrap
Error bootstrapping: Unexpected response code: 400 (ACL bootstrap already done (reset index: 9))
```