Commit Graph

67 Commits

Author SHA1 Message Date
Daniel Nephin b73b68d696 acl: remove ACL.GetPolicy endpoint and resolve legacy acls
And all code that was no longer used once those two were removed.
2021-09-29 14:33:19 -04:00
Daniel Nephin 33a5448604
Merge pull request #11136 from hashicorp/dnephin/acl-resolver-fix-default-authz
acl: fix default Authorizer for down_policy extend-cache/async-cache
2021-09-29 13:45:12 -04:00
Daniel Nephin 30fe14eed3 acl: fix default authorizer for down_policy
This was causing a nil panic because a nil authorizer is no longer valid after the cleanup done
in https://github.com/hashicorp/consul/pull/10632.
2021-09-23 18:12:22 -04:00
Daniel Nephin a6a7069ecf Remove t.Parallel from TestACLResolver_DownPolicy
These tests run in under 10ms, t.Parallel does nothing but slow them down and
make failures harder to debug when one panics.
2021-09-23 18:12:22 -04:00
Daniel Nephin b40bdc9e98 acl: remove remaining tests that use ACL.Apply
In preparation for removing ACL.Apply.

Tests for ACL.Apply, ACL.GetPolicy, and ACL upgrades were removed
because all 3 of those will be removed shortly.

The forth test appears to be for the ACLResolver cache, so the test was moved to the correct
test file, and the name was updated to make it obvious what is being tested.
2021-09-21 19:35:26 -04:00
Daniel Nephin ad9748adc3 acl: remove two commented out tests for legacy ACL replication
They were commented out in 2018.
2021-09-21 17:57:29 -04:00
R.B. Boyer 7fa8f19077
acl: ensure the global management policy grants all necessary partition privileges (#11072) 2021-09-16 15:53:10 -05:00
R.B. Boyer 4e7b6888e3
acl: fix intention:*:write checks (#11061)
This is a partial revert of #10793
2021-09-16 11:08:45 -05:00
Freddy 494764ee2d
acl: small resolver changes to account for partitions (#11052)
Also refactoring the enterprise side of a test to make it easier to reason about.
2021-09-16 09:17:02 -05:00
R.B. Boyer 4206f585f0 acl: adding a new mesh resource 2021-09-03 09:12:03 -04:00
Daniel Nephin 5a82859ee1 acl: small improvements to ACLResolver disable due to RPC error
Remove the error return, so that not handling is not reported as an
error by errcheck. It was returning the error passed as an arg
unmodified so there is no reason to return the same value that was
passed in.

Remove the term upstreams to remove any confusion with the term used in
service mesh.

Remove the AutoDisable field, and replace it with the TTL value, using 0
to indicate the setting is turned off.

Replace "not Before" with "After".

Add some test coverage to show the behaviour is still correct.
2021-08-17 13:34:18 -04:00
Daniel Nephin 75baa22e64 acl: remove ACLResolver config fields from consul.Config 2021-08-17 13:32:52 -04:00
Daniel Nephin 454f62eacc acl: replace ACLResolver.Config with its own struct
This is step toward decoupling ACLResolver from the agent/consul
package.
2021-08-17 13:32:52 -04:00
Daniel Nephin 047abdd73c acl: remove ACLDatacenter
This field has been unnecessary for a while now. It was always set to the same value
as PrimaryDatacenter. So we can remove the duplicate field and use PrimaryDatacenter
directly.

This change was made by GoLand refactor, which did most of the work for me.
2021-08-06 18:27:00 -04:00
Daniel Nephin 695963acb7 acl: recouple acl filtering from ACLResolver
ACL filtering only needs an authorizer and a logger. We can decouple filtering from
the ACLResolver by passing in the necessary logger.

This change is being made in preparation for moving the ACLResolver into an acl package
2021-08-05 15:20:13 -04:00
Daniel Nephin c8eedabc7c acl: move vetRegisterWithACL and vetDeregisterWithACL
These functions are used in only one place. Move the functions next to their one caller
to improve code locality.

This change is being made in preparation for moving the ACLResolver into an
acl package. The moved functions were previously in the same file as the ACLResolver.
By moving them out of that file we may be able to move the entire file
with fewer modifications.
2021-08-05 15:17:54 -04:00
Daniel Nephin 9b41e7287f acl: use acl.ManangeAll when ACLs are disabled
Instead of returning nil and checking for nilness

Removes a bunch of nil checks, and fixes one test failures.
2021-07-30 12:58:24 -04:00
Daniel Nephin db29c51cd2 acl: use SetHash consistently in testPolicyForID
A previous commit used SetHash on two of the cases to fix a data race. This commit applies
that change to all cases. Using SetHash in this test helper should ensure that the
test helper behaves closer to production.
2021-07-16 17:59:56 -04:00
Daniel Nephin 27871498f0 Fix a data race in TestACLResolver_Client
By setting the hash when we create the policy.

```
WARNING: DATA RACE
Read at 0x00c0028b4b10 by goroutine 1182:
  github.com/hashicorp/consul/agent/structs.(*ACLPolicy).SetHash()
      /home/daniel/pers/code/consul/agent/structs/acl.go:701 +0x40d
  github.com/hashicorp/consul/agent/structs.ACLPolicies.resolveWithCache()
      /home/daniel/pers/code/consul/agent/structs/acl.go:779 +0xfe
  github.com/hashicorp/consul/agent/structs.ACLPolicies.Compile()
      /home/daniel/pers/code/consul/agent/structs/acl.go:809 +0xf1
  github.com/hashicorp/consul/agent/consul.(*ACLResolver).ResolveTokenToIdentityAndAuthorizer()
      /home/daniel/pers/code/consul/agent/consul/acl.go:1226 +0x6ef
  github.com/hashicorp/consul/agent/consul.resolveTokenAsync()
      /home/daniel/pers/code/consul/agent/consul/acl_test.go:66 +0x5c

Previous write at 0x00c0028b4b10 by goroutine 1509:
  github.com/hashicorp/consul/agent/structs.(*ACLPolicy).SetHash()
      /home/daniel/pers/code/consul/agent/structs/acl.go:730 +0x3a8
  github.com/hashicorp/consul/agent/structs.ACLPolicies.resolveWithCache()
      /home/daniel/pers/code/consul/agent/structs/acl.go:779 +0xfe
  github.com/hashicorp/consul/agent/structs.ACLPolicies.Compile()
      /home/daniel/pers/code/consul/agent/structs/acl.go:809 +0xf1
  github.com/hashicorp/consul/agent/consul.(*ACLResolver).ResolveTokenToIdentityAndAuthorizer()
      /home/daniel/pers/code/consul/agent/consul/acl.go:1226 +0x6ef
  github.com/hashicorp/consul/agent/consul.resolveTokenAsync()
      /home/daniel/pers/code/consul/agent/consul/acl_test.go:66 +0x5c

Goroutine 1182 (running) created at:
  github.com/hashicorp/consul/agent/consul.TestACLResolver_Client.func4()
      /home/daniel/pers/code/consul/agent/consul/acl_test.go:1669 +0x459
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1193 +0x202

Goroutine 1509 (running) created at:
  github.com/hashicorp/consul/agent/consul.TestACLResolver_Client.func4()
      /home/daniel/pers/code/consul/agent/consul/acl_test.go:1668 +0x415
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1193 +0x202
```
2021-07-14 18:58:16 -04:00
Matt Keeler aa0eb60f57
Move static token resolution into the ACLResolver (#10013) 2021-04-14 12:39:35 -04:00
Daniel Nephin 27c38bfebb
Merge pull request #9213 from hashicorp/dnephin/resolve-tokens-take-2
acl: Remove some unused things and document delegate method
2021-01-06 18:51:51 -05:00
Daniel Nephin ef0999547a testing: skip slow tests with -short
Add a skip condition to all tests slower than 100ms.

This change was made using `gotestsum tool slowest` with data from the
last 3 CI runs of master.
See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests

With this change:

```
$ time go test -count=1 -short ./agent
ok      github.com/hashicorp/consul/agent       0.743s

real    0m4.791s

$ time go test -count=1 -short ./agent/consul
ok      github.com/hashicorp/consul/agent/consul        4.229s

real    0m8.769s
```
2020-12-07 13:42:55 -05:00
Daniel Nephin b9306d8827 acl: remove a test-only method 2020-11-17 18:16:34 -05:00
Matt Keeler 1f40f51a58
Fix a bunch of linter warnings 2020-11-09 09:22:12 -05:00
freddygv 21c4708fe9 Add topology ACL filter 2020-10-05 10:24:50 -06:00
André Cruz 673bd69f36
Decrease test flakiness
Fix flaky TestACLResolver_Client/Concurrent-Token-Resolve and TestCacheNotifyPolling
2020-08-24 20:30:02 +01:00
Daniel Nephin 80ff174880 testutil: NewLogBuffer - buffer logs until a test fails
Replaces #7559

Running tests in parallel, with background goroutines, results in test output not being associated with the correct test. `go test` does not make any guarantees about output from goroutines being attributed to the correct test case.

Attaching log output from background goroutines also cause data races.  If the goroutine outlives the test, it will race with the test being marked done. Previously this was noticed as a panic when logging, but with the race detector enabled it is shown as a data race.

The previous solution did not address the problem of correct test attribution because test output could still be hidden when it was associated with a test that did not fail. You would have to look at all of the log output to find the relevant lines. It also made debugging test failures more difficult because each log line was very long.

This commit attempts a new approach. Instead of printing all the logs, only print when a test fails. This should work well when there are a small number of failures, but may not work well when there are many test failures at the same time. In those cases the failures are unlikely a result of a specific test, and the log output is likely less useful.

All of the logs are printed from the test goroutine, so they should be associated with the correct test.

Also removes some test helpers that were not used, or only had a single caller. Packages which expose many functions with similar names can be difficult to use correctly.

Related:
https://github.com/golang/go/issues/38458 (may be fixed in go1.15)
https://github.com/golang/go/issues/38382#issuecomment-612940030
2020-07-21 12:50:40 -04:00
Daniel Nephin 8b6036c077 Remove ACLsEnabled from delegate interface
In all cases (oss/ent, client/server) this method was returning a value from config. Since the
value is consistent, it doesn't need to be part of the delegate interface.
2020-07-03 17:00:20 -04:00
Daniel Nephin 1ef8279ac9
Merge pull request #8034 from hashicorp/dnephin/add-linter-staticcheck-4
ci: enable SA4006 staticcheck check and add ineffassign
2020-06-17 12:16:02 -04:00
Daniel Nephin 89d95561df Enable gofmt simplify
Code changes done automatically with 'gofmt -s -w'
2020-06-16 13:21:11 -04:00
Daniel Nephin 5f24171f13 ci: enable SA4006 staticcheck check
And fix the 'value not used' issues.

Many of these are not bugs, but a few are tests not checking errors, and
one appears to be a missed error in non-test code.
2020-06-16 13:10:11 -04:00
Matt Keeler cdc4b20afa
ACL Node Identities (#7970)
A Node Identity is very similar to a service identity. Its main targeted use is to allow creating tokens for use by Consul agents that will grant the necessary permissions for all the typical agent operations (node registration, coordinate updates, anti-entropy).

Half of this commit is for golden file based tests of the acl token and role cli output. Another big updates was to refactor many of the tests in agent/consul/acl_endpoint_test.go to use the same style of tests and the same helpers. Besides being less boiler plate in the tests it also uses a common way of starting a test server with ACLs that should operate without any warnings regarding deprecated non-uuid master tokens etc.
2020-06-16 12:54:27 -04:00
Hans Hasselberg 7404712854
acl: do not resolve local tokens from remote dcs (#8068) 2020-06-09 21:13:09 +02:00
R.B. Boyer 16db20b1f3
acl: remove the deprecated `acl_enforce_version_8` option (#7991)
Fixes #7292
2020-05-29 16:16:03 -05:00
Matt Keeler 849eedd142
Fix identity resolution on clients and in secondary dcs (#7862)
Previously this happened to be using the method on the Server/Client that was meant to allow the ACLResolver to locally resolve tokens. On Servers that had tokens (primary or secondary dc + token replication) this function would lookup the token from raft and return the ACLIdentity. On clients this was always a noop. We inadvertently used this function instead of creating a new one when we added logging accessor ids for permission denied RPC requests. 

With this commit, a new method is used for resolving the identity properly via the ACLResolver which may still resolve locally in the case of being on a server with tokens but also supports remote token resolution.
2020-05-13 13:00:08 -04:00
R.B. Boyer a7fb26f50f
wan federation via mesh gateways (#6884)
This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch:

There are several distinct chunks of code that are affected:

* new flags and config options for the server

* retry join WAN is slightly different

* retry join code is shared to discover primary mesh gateways from secondary datacenters

* because retry join logic runs in the *agent* and the results of that
  operation for primary mesh gateways are needed in the *server* there are
  some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur
  at multiple layers of abstraction just to pass the data down to the right
  layer.

* new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers

* the function signature for RPC dialing picked up a new required field (the
  node name of the destination)

* several new RPCs for manipulating a FederationState object:
  `FederationState:{Apply,Get,List,ListMeshGateways}`

* 3 read-only internal APIs for debugging use to invoke those RPCs from curl

* raft and fsm changes to persist these FederationStates

* replication for FederationStates as they are canonically stored in the
  Primary and replicated to the Secondaries.

* a special derivative of anti-entropy that runs in secondaries to snapshot
  their local mesh gateway `CheckServiceNodes` and sync them into their upstream
  FederationState in the primary (this works in conjunction with the
  replication to distribute addresses for all mesh gateways in all DCs to all
  other DCs)

* a "gateway locator" convenience object to make use of this data to choose
  the addresses of gateways to use for any given RPC or gossip operation to a
  remote DC. This gets data from the "retry join" logic in the agent and also
  directly calls into the FSM.

* RPC (`:8300`) on the server sniffs the first byte of a new connection to
  determine if it's actually doing native TLS. If so it checks the ALPN header
  for protocol determination (just like how the existing system uses the
  type-byte marker).

* 2 new kinds of protocols are exclusively decoded via this native TLS
  mechanism: one for ferrying "packet" operations (udp-like) from the gossip
  layer and one for "stream" operations (tcp-like). The packet operations
  re-use sockets (using length-prefixing) to cut down on TLS re-negotiation
  overhead.

* the server instances specially wrap the `memberlist.NetTransport` when running
  with gateway federation enabled (in a `wanfed.Transport`). The general gist is
  that if it tries to dial a node in the SAME datacenter (deduced by looking
  at the suffix of the node name) there is no change. If dialing a DIFFERENT
  datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh
  gateways to eventually end up in a server's :8300 port.

* a new flag when launching a mesh gateway via `consul connect envoy` to
  indicate that the servers are to be exposed. This sets a special service
  meta when registering the gateway into the catalog.

* `proxycfg/xds` notice this metadata blob to activate additional watches for
  the FederationState objects as well as the location of all of the consul
  servers in that datacenter.

* `xds:` if the extra metadata is in place additional clusters are defined in a
  DC to bulk sink all traffic to another DC's gateways. For the current
  datacenter we listen on a wildcard name (`server.<dc>.consul`) that load
  balances all servers as well as one mini-cluster per node
  (`<node>.server.<dc>.consul`)

* the `consul tls cert create` command got a new flag (`-node`) to help create
  an additional SAN in certs that can be used with this flavor of federation.
2020-03-09 15:59:02 -05:00
Chris Piraino 3dd0b59793
Allow users to configure either unstructured or JSON logging (#7130)
* hclog Allow users to choose between unstructured and JSON logging
2020-01-28 17:50:41 -06:00
Matt Keeler 442924c35a
Sync of OSS changes to support namespaces (#6909) 2019-12-09 21:26:41 -05:00
Matt Keeler 87c44a3b8d
Ensure that cache entries for tokens are prefixed “token-secret… (#6688)
This will be necessary once we store other types of identities in here.
2019-10-25 13:05:43 -04:00
Matt Keeler a688ea952d
Update the ACL Resolver to allow for Consul Enterprise specific hooks. (#6687) 2019-10-25 11:06:16 -04:00
Matt Keeler 1270a93274
Updates to allow for Namespacing ACL resources in Consul Enterp… (#6675)
Main Changes:

• method signature updates everywhere to account for passing around enterprise meta.
• populate the EnterpriseAuthorizerContext for all ACL related authorizations.
• ACL resource listings now operate like the catalog or kv listings in that the returned entries are filtered down to what the token is allowed to see. With Namespaces its no longer all or nothing.
• Modified the acl.Policy parsing to abstract away basic decoding so that enterprise can do it slightly differently. Also updated method signatures so that when parsing a policy it can take extra ent metadata to use during rules validation and policy creation.

Secondary Changes:

• Moved protobuf encoding functions out of the agentpb package to eliminate circular dependencies.
• Added custom JSON unmarshalers for a few ACL resource types (to support snake case and to get rid of mapstructure)
• AuthMethod validator cache is now an interface as these will be cached per-namespace for Consul Enterprise.
• Added checks for policy/role link existence at the RPC API so we don’t push the request through raft to have it fail internally.
• Forward ACL token delete request to the primary datacenter when the secondary DC doesn’t have the token.
• Added a bunch of ACL test helpers for inserting ACL resource test data.
2019-10-24 14:38:09 -04:00
Matt Keeler f9a43a1e2d
ACL Authorizer overhaul (#6620)
* ACL Authorizer overhaul

To account for upcoming features every Authorization function can now take an extra *acl.EnterpriseAuthorizerContext. These are unused in OSS and will always be nil.

Additionally the acl package has received some thorough refactoring to enable all of the extra Consul Enterprise specific authorizations including moving sentinel enforcement into the stubbed structs. The Authorizer funcs now return an acl.EnforcementDecision instead of a boolean. This improves the overall interface as it makes multiple Authorizers easily chainable as they now indicate whether they had an authoritative decision or should use some other defaults. A ChainedAuthorizer was added to handle this Authorizer enforcement chain and will never itself return a non-authoritative decision.

* Include stub for extra enterprise rules in the global management policy

* Allow for an upgrade of the global-management policy
2019-10-15 16:58:50 -04:00
Aestek 04a52a967b acl: allow service deregistration with node write permission (#5217)
With ACLs enabled if an agent is wiped and restarted without a leave
it can no longer deregister the services it had previously registered
because it no longer has the tokens the services were registered with.
To remedy that we allow service deregistration from tokens with node
write permission.
2019-06-27 14:24:34 +02:00
R.B. Boyer 5a505c5b3a acl: adding support for kubernetes auth provider login (#5600)
* auth providers
* binding rules
* auth provider for kubernetes
* login/logout
2019-04-26 14:49:25 -05:00
R.B. Boyer 9542fdc9bc acl: adding Roles to Tokens (#5514)
Roles are named and can express the same bundle of permissions that can
currently be assigned to a Token (lists of Policies and Service
Identities). The difference with a Role is that it not itself a bearer
token, but just another entity that can be tied to a Token.

This lets an operator potentially curate a set of smaller reusable
Policies and compose them together into reusable Roles, rather than
always exploding that same list of Policies on any Token that needs
similar permissions.

This also refactors the acl replication code to be semi-generic to avoid
3x copypasta.
2019-04-26 14:49:12 -05:00
R.B. Boyer b3956e511c acl: ACL Tokens can now be assigned an optional set of service identities (#5390)
These act like a special cased version of a Policy Template for granting
a token the privileges necessary to register a service and its connect
proxy, and read upstreams from the catalog.
2019-04-26 14:48:04 -05:00
Jeff Mitchell d3c7d57209
Move internal/ to sdk/ (#5568)
* Move internal/ to sdk/

* Add a readme to the SDK folder
2019-03-27 08:54:56 -04:00
Jeff Mitchell a41c865059
Convert to Go Modules (#5517)
* First conversion

* Use serf 0.8.2 tag and associated updated deps

* * Move freeport and testutil into internal/

* Make internal/ its own module

* Update imports

* Add replace statements so API and normal Consul code are
self-referencing for ease of development

* Adapt to newer goe/values

* Bump to new cleanhttp

* Fix ban nonprintable chars test

* Update lock bad args test

The error message when the duration cannot be parsed changed in Go 1.12
(ae0c435877d3aacb9af5e706c40f9dddde5d3e67). This updates that test.

* Update another test as well

* Bump travis

* Bump circleci

* Bump go-discover and godo to get rid of launchpad dep

* Bump dockerfile go version

* fix tar command

* Bump go-cleanhttp
2019-03-26 17:04:58 -04:00
R.B. Boyer d65008700a
acl: reduce complexity of token resolution process with alternative singleflighting (#5480)
acl: reduce complexity of token resolution process with alternative singleflighting

Switches acl resolution to use golang.org/x/sync/singleflight. For the
identity/legacy lookups this is a drop-in replacement with the same
overall approach to request coalescing.

For policies this is technically a change in behavior, but when
considered holistically is approximately performance neutral (with the
benefit of less code).

There are two goals with this blob of code (speaking specifically of
policy resolution here):

  1) Minimize cross-DC requests.
  2) Minimize client-to-server LAN requests.

The previous iteration of this code was optimizing for the case of many
possibly different tokens being resolved concurrently that have a
significant overlap in linked policies such that deduplication would be
worth the complexity. While this is laudable there are some things to
consider that can help to adjust expectations:

  1) For v1.4+ policies are always replicated, and once a single policy
  shows up in a secondary DC the replicated data is considered
  authoritative for requests made in that DC. This means that our
  earlier concerns about minimizing cross-DC requests are irrelevant
  because there will be no cross-DC policy reads that occur.

  2) For Server nodes the in-memory ACL policy cache is capped at zero,
  meaning it has no caching. Only Client nodes run with a cache. This
  means that instead of having an entire DC's worth of tokens (what a
  Server might see) that can have policy resolutions coalesced these
  nodes will only ever be seeing node-local token resolutions. In a
  reasonable worst-case scenario where a scheduler like Kubernetes has
  "filled" a node with Connect services, even that will only schedule
  ~100 connect services per node. If every service has a unique token
  there will only be 100 tokens to coalesce and even then those requests
  have to occur concurrently AND be hitting an empty consul cache.

Instead of seeing a great coalescing opportunity for cutting down on
redundant Policy resolutions, in practice it's far more likely given
node densities that you'd see requests for the same token concurrently
than you would for two tokens sharing a policy concurrently (to a degree
that would warrant the overhead of the current variation of
singleflighting.

Given that, this patch switches the Policy resolution process to only
singleflight by requesting token (but keeps the cache as by-policy).
2019-03-14 09:35:34 -05:00
R.B. Boyer e9614ee92f
acl: correctly extend the cache for acl identities during resolution (#5475) 2019-03-12 10:23:43 -05:00