Main Changes:
• method signature updates everywhere to account for passing around enterprise meta.
• populate the EnterpriseAuthorizerContext for all ACL related authorizations.
• ACL resource listings now operate like the catalog or kv listings in that the returned entries are filtered down to what the token is allowed to see. With Namespaces its no longer all or nothing.
• Modified the acl.Policy parsing to abstract away basic decoding so that enterprise can do it slightly differently. Also updated method signatures so that when parsing a policy it can take extra ent metadata to use during rules validation and policy creation.
Secondary Changes:
• Moved protobuf encoding functions out of the agentpb package to eliminate circular dependencies.
• Added custom JSON unmarshalers for a few ACL resource types (to support snake case and to get rid of mapstructure)
• AuthMethod validator cache is now an interface as these will be cached per-namespace for Consul Enterprise.
• Added checks for policy/role link existence at the RPC API so we don’t push the request through raft to have it fail internally.
• Forward ACL token delete request to the primary datacenter when the secondary DC doesn’t have the token.
• Added a bunch of ACL test helpers for inserting ACL resource test data.
* ACL Authorizer overhaul
To account for upcoming features every Authorization function can now take an extra *acl.EnterpriseAuthorizerContext. These are unused in OSS and will always be nil.
Additionally the acl package has received some thorough refactoring to enable all of the extra Consul Enterprise specific authorizations including moving sentinel enforcement into the stubbed structs. The Authorizer funcs now return an acl.EnforcementDecision instead of a boolean. This improves the overall interface as it makes multiple Authorizers easily chainable as they now indicate whether they had an authoritative decision or should use some other defaults. A ChainedAuthorizer was added to handle this Authorizer enforcement chain and will never itself return a non-authoritative decision.
* Include stub for extra enterprise rules in the global management policy
* Allow for an upgrade of the global-management policy
This PR is almost a complete rewrite of the ACL system within Consul. It brings the features more in line with other HashiCorp products. Obviously there is quite a bit left to do here but most of it is related docs, testing and finishing the last few commands in the CLI. I will update the PR description and check off the todos as I finish them over the next few days/week.
Description
At a high level this PR is mainly to split ACL tokens from Policies and to split the concepts of Authorization from Identities. A lot of this PR is mostly just to support CRUD operations on ACLTokens and ACLPolicies. These in and of themselves are not particularly interesting. The bigger conceptual changes are in how tokens get resolved, how backwards compatibility is handled and the separation of policy from identity which could lead the way to allowing for alternative identity providers.
On the surface and with a new cluster the ACL system will look very similar to that of Nomads. Both have tokens and policies. Both have local tokens. The ACL management APIs for both are very similar. I even ripped off Nomad's ACL bootstrap resetting procedure. There are a few key differences though.
Nomad requires token and policy replication where Consul only requires policy replication with token replication being opt-in. In Consul local tokens only work with token replication being enabled though.
All policies in Nomad are globally applicable. In Consul all policies are stored and replicated globally but can be scoped to a subset of the datacenters. This allows for more granular access management.
Unlike Nomad, Consul has legacy baggage in the form of the original ACL system. The ramifications of this are:
A server running the new system must still support other clients using the legacy system.
A client running the new system must be able to use the legacy RPCs when the servers in its datacenter are running the legacy system.
The primary ACL DC's servers running in legacy mode needs to be a gate that keeps everything else in the entire multi-DC cluster running in legacy mode.
So not only does this PR implement the new ACL system but has a legacy mode built in for when the cluster isn't ready for new ACLs. Also detecting that new ACLs can be used is automatic and requires no configuration on the part of administrators. This process is detailed more in the "Transitioning from Legacy to New ACL Mode" section below.
Prior to this change, prepared queries had the following behavior for
ACLs, which will need to change to support templates:
1. A management token, or a token with read access to the service being
queried needed to be provided in order to create a prepared query.
2. The token used to create the prepared query was stored with the query
in the state store and used to execute the query.
3. A management token, or the token used to create the query needed to be
supplied to perform and CRUD operations on an existing prepared query.
This was pretty subtle and complicated behavior, and won't work for
templates since the service name is computed at execution time. To solve
this, we introduce a new "prepared-query" ACL type, where the prefix
applies to the query name for static prepared query types and to the
prefix for template prepared query types.
With this change, the new behavior is:
1. A management token, or a token with "prepared-query" write access to
the query name or (soon) the given template prefix is required to do
any CRUD operations on a prepared query, or to list prepared queries
(the list is filtered by this ACL).
2. You will no longer need a management token to list prepared queries,
but you will only be able to see prepared queries that you have access
to (you get an empty list instead of permission denied).
3. When listing or getting a query, because it was easy to capture
management tokens given the past behavior, this will always blank out
the "Token" field (replacing the contents as <hidden>) for all tokens
unless a management token is supplied. Going forward, we should
discourage people from binding tokens for execution unless strictly
necessary.
4. No token will be captured by default when a prepared query is created.
If the user wishes to supply an execution token then can pass it in via
the "Token" field in the prepared query definition. Otherwise, this
field will default to empty.
5. At execution time, we will use the captured token if it exists with the
prepared query definition, otherwise we will use the token that's passed
in with the request, just like we do for other RPCs (or you can use the
agent's configured token for DNS).
6. Prepared queries with no name (accessible only by ID) will not require
ACLs to create or modify (execution time will depend on the service ACL
configuration). Our argument here is that these are designed to be
ephemeral and the IDs are as good as an ACL. Management tokens will be
able to list all of these.
These changes enable templates, but also enable delegation of authority to
manage the prepared query namespace.