Commit Graph

1129 Commits

Author SHA1 Message Date
Freddy e4e306210a
Require operator:write to get Connect CA config (#9240)
A vulnerability was identified in Consul and Consul Enterprise (“Consul”) such that operators with `operator:read` ACL permissions are able to read the Consul Connect CA configuration when explicitly configured with the `/v1/connect/ca/configuration` endpoint, including the private key. This allows the user to effectively privilege escalate by enabling the ability to mint certificates for any Consul Connect services. This would potentially allow them to masquerade (receive/send traffic) as any service in the mesh.

--

This PR increases the permissions required to read the Connect CA's private key when it was configured via the `/connect/ca/configuration` endpoint. They are now `operator:write`.
2020-11-19 10:14:48 -07:00
Kyle Havlovitz c8d4a40a87 connect: update some function comments in CA manager 2020-11-17 16:00:19 -08:00
Daniel Nephin b9306d8827 acl: remove a test-only method 2020-11-17 18:16:34 -05:00
Daniel Nephin 9e7c8dd19d Remove two unused delegate methods 2020-11-17 18:16:26 -05:00
Matt Keeler 4bca029be9
Refactor to call non-voting servers read replicas (#9191)
Co-authored-by: Kit Patella <kit@jepsen.io>
2020-11-17 10:53:57 -05:00
Kit Patella 4dfcdbab26
Merge pull request #9198 from hashicorp/mkcp/telemetry/add-all-metric-definitions
Add metric definitions for all metrics known at Consul start
2020-11-16 15:54:50 -08:00
Matt Keeler 197a37a860
Prevent panic if autopilot health is requested prior to leader establishment finishing. (#9204) 2020-11-16 17:08:17 -05:00
Daniel Nephin de88ceed1c
Merge pull request #9114 from hashicorp/dnephin/filtering-in-stream
stream: improve naming of Payload methods
2020-11-16 14:20:07 -05:00
Kit Patella 0b18f5612e trim help strings to save a few bytes 2020-11-16 11:02:11 -08:00
Kit Patella 374748dafc merge master 2020-11-16 10:46:53 -08:00
Kit Patella af719981f3 finish adding static server metrics 2020-11-13 16:26:08 -08:00
Kyle Havlovitz 0a86533e20 Reorganize some CA manager code for correctness/readability 2020-11-13 14:46:01 -08:00
Kyle Havlovitz 5de81c1375 connect: Add CAManager for synchronizing CA operations 2020-11-13 14:33:44 -08:00
Kyle Havlovitz 0b4876f906 connect: Add logic for updating secondary DC intermediate on config set 2020-11-13 14:33:44 -08:00
R.B. Boyer db1184c094
server: intentions CRUD requires connect to be enabled (#9194)
Fixes #9123
2020-11-13 16:19:12 -06:00
Kit Patella b486c1bce8 add the service name in the agent rather than in the definitions themselves 2020-11-13 13:18:04 -08:00
R.B. Boyer e323014faf
server: remove config entry CAS in legacy intention API bridge code (#9151)
Change so line-item intention edits via the API are handled via the state store instead of via CAS operations.

Fixes #9143
2020-11-13 14:42:21 -06:00
R.B. Boyer 6300abed18
server: skip deleted and deleting namespaces when migrating intentions to config entries (#9186) 2020-11-13 13:56:41 -06:00
Mike Morris a343365da7
ci: update to Go 1.15.4 and alpine:3.12 (#9036)
* ci: stop building darwin/386 binaries

Go 1.15 drops support for 32-bit binaries on Darwin https://golang.org/doc/go1.15#darwin

* tls: ConnectionState::NegotiatedProtocolIsMutual is deprecated in Go 1.15, this value is always true

* correct error messages that changed slightly

* Completely regenerate some TLS test data

Co-authored-by: R.B. Boyer <rb@hashicorp.com>
2020-11-13 13:02:59 -05:00
R.B. Boyer 758384893d
server: break up Intention.Apply monolithic method (#9007)
The Intention.Apply RPC is quite large, so this PR attempts to break it down into smaller functions and dissolves the pre-config-entry approach to the breakdown as it only confused things.
2020-11-13 09:15:39 -06:00
Kit Patella 9533372ded first pass on agent-configured prometheusDefs and adding defs for every consul metric 2020-11-12 18:12:12 -08:00
R.B. Boyer a5bd1ba323
agent: return the default ACL policy to callers as a header (#9101)
Header is: X-Consul-Default-ACL-Policy=<allow|deny>

This is of particular utility when fetching matching intentions, as the
fallthrough for a request that doesn't match any intentions is to
enforce using the default acl policy.
2020-11-12 10:38:32 -06:00
Matt Keeler 2badb01d30
Add a paramter in state store methods to indicate whether a resource insertion is from a snapshot restoration (#9156)
The Catalog, Config Entry, KV and Session resources potentially re-validate the input as its coming in. We need to prevent snapshot restoration failures due to missing namespaces or namespaces that are being deleted in enterprise.
2020-11-11 11:21:42 -05:00
Matt Keeler 1f40f51a58
Fix a bunch of linter warnings 2020-11-09 09:22:12 -05:00
Matt Keeler 755fb72994
Switch to using the external autopilot module 2020-11-09 09:22:11 -05:00
Daniel Nephin e4a78c977d stream: document that Payload must be immutable
If they are sent to EventPublisher.Publish.

Also document that PayloadEvents is expected to come from a subscription and that it is
not immutable.
2020-11-06 13:00:33 -05:00
Daniel Nephin 4fc073b1f4 stream: rename FilterByKey 2020-11-05 19:21:16 -05:00
Daniel Nephin d4cd2fa6a8 stream: Add HasReadPermission to Payload
Required now that filter is a method on PayloadEvents instead of Event
2020-11-05 19:17:18 -05:00
Daniel Nephin 8a26bca020 stream: move event filtering to PayloadEvents
Removes the weirdness around PayloadEvents.FilterByKey
2020-11-05 17:50:17 -05:00
Daniel Nephin dcacfd3548 stream: Remove unused method 2020-11-05 16:49:59 -05:00
Daniel Nephin 621f1db766
Merge pull request #9073 from hashicorp/dnephin/backport-streaming-namespaces
streaming: backport namespace changes
2020-11-05 14:19:10 -05:00
Daniel Nephin cd220e5d6c
Merge pull request #9061 from hashicorp/dnephin/event-fields
stream: support filtering by namespace
2020-11-05 14:18:35 -05:00
Daniel Nephin f6b629852f state: test EventPayloadCheckServiceNode.FilterByKey
Also fix a bug in that function when only one of key or namespace were the empty string.
2020-10-30 14:35:57 -04:00
Daniel Nephin 60df44df4f stream: Add tests for filterByKey with namespace
And fix a bug where a request with a Namespace but no Key would not be properly filtered
2020-10-30 14:35:42 -04:00
Daniel Nephin 318dfbe6e4 stream: Move FilterByKey events to a table
In preparation for adding new tests.
2020-10-30 14:35:28 -04:00
Daniel Nephin 2d0030da39 state: use enterprise meta for creating events 2020-10-30 14:34:04 -04:00
Daniel Nephin b57c7afcbb stream: include the namespace in the snap cache key
Otherwise the wrong snapshot could be returned when the same key is used in different namespaces
2020-10-30 14:34:04 -04:00
Daniel Nephin 8da30fcb9a subscribe: set the request namespace 2020-10-30 14:34:04 -04:00
R.B. Boyer 67a0d0c426
state: ensure we unblock intentions queries upon the upgrade to config entries (#9062)
1. do a state store query to list intentions as the agent would do over in `agent/proxycfg` backing `agent/xds`
2. upgrade the database and do a fresh `service-intentions` config entry write
3. the blocking query inside of the agent cache in (1) doesn't notice (2)
2020-10-29 15:28:31 -05:00
R.B. Boyer 78014653b3 restore prior signature of test helper so enterprise compiles 2020-10-29 13:52:15 -05:00
Daniel Nephin 61ce0964a4 stream: remove Event.Key
Makes Payload a type with FilterByKey so that Payloads can implement
filtering by key. With this approach we don't need to expose a Namespace
field on Event, and we don't need to invest micro formats or require a
bunch of code to be aware of exactly how the key field is encoded.
2020-10-28 16:48:04 -04:00
Daniel Nephin 8ef4c0fcc5 state: use go-cmp for comparison
The output of the previous assertions made it impossible to debug the tests without code changes.

With go-cmp comparing the entire slice we can see the full diffs making it easier to debug failures.
2020-10-28 16:33:00 -04:00
Daniel Nephin 44da869ed4 stream: Use a no-op event publisher if streaming is disabled 2020-10-28 13:54:19 -04:00
Daniel Nephin eea87e1acf store: use a ReadDB for snapshots
to remove the cyclic dependency between the snapshot handlers and the state.Store
2020-10-28 13:07:42 -04:00
Daniel Nephin cfe0ffde15
Merge pull request #9026 from hashicorp/dnephin/streaming-without-cache-query-param
streaming: rename config and remove requirement for cache=1
2020-10-28 12:33:25 -04:00
Daniel Nephin 03d2be03e7
Merge pull request #8618 from hashicorp/dnephin/remove-txn-readtxn
state: Use ReadTxn everywhere
2020-10-28 12:32:47 -04:00
Daniel Nephin abd8cfcfe9 state: disable streaming connect topic 2020-10-26 11:49:47 -04:00
R.B. Boyer 0a80e82f21
server: config entry replication now correctly uses namespaces in comparisons (#9024)
Previously config entries sharing a kind & name but in different
namespaces could occasionally cause "stuck states" in replication
because the namespace fields were ignored during the differential
comparison phase.

Example:

Two config entries written to the primary:

    kind=A,name=web,namespace=bar
    kind=A,name=web,namespace=foo

Under the covers these both get saved to memdb, so they are sorted by
all 3 components (kind,name,namespace) during natural iteration. This
means that before the replication code does it's own incomplete sort,
the underlying data IS sorted by namespace ascending (bar comes before
foo).

After one pass of replication the primary and secondary datacenters have
the same set of config entries present. If
"kind=A,name=web,namespace=bar" were to be deleted, then things get
weird. Before replication the two sides look like:

primary: [
    kind=A,name=web,namespace=foo
]
secondary: [
    kind=A,name=web,namespace=bar
    kind=A,name=web,namespace=foo
]

The differential comparison phase walks these two lists in sorted order
and first compares "kind=A,name=web,namespace=foo" vs
"kind=A,name=web,namespace=bar" and falsely determines they are the SAME
and are thus cause an update of "kind=A,name=web,namespace=foo". Then it
compares "<nothing>" with "kind=A,name=web,namespace=foo" and falsely
determines that the latter should be DELETED.

During reconciliation the deletes are processed before updates, and so
for a brief moment in the secondary "kind=A,name=web,namespace=foo" is
erroneously deleted and then immediately restored.

Unfortunately after this replication phase the final state is identical
to the initial state, so when it loops around again (rate limited) it
repeats the same set of operations indefinitely.
2020-10-23 13:41:54 -05:00
Daniel Nephin f9b2834171 state: convert the remaining functions to ReadTxn
Required also converting some of the transaction functions to WriteTxn
because TxnRO() called the same helper as TxnRW.

This change allows us to return a memdb.Txn for read-only txn instead of
wrapping them with state.txn.
2020-10-23 14:29:22 -04:00
Daniel Nephin 26387cdc0e
Merge pull request #8975 from hashicorp/dnephin/stream-close-on-unsub
stream: close the subscription on Unsubscribe
2020-10-23 12:58:12 -04:00
Freddy d23038f94f
Add HasExact to topology endpoint (#9010) 2020-10-23 10:45:41 -06:00
Daniel Nephin fb8b68a6ec stream: close the subscription on Unsubscribe 2020-10-22 13:39:27 -04:00
Pierre Souchay 54f9f247f8
Consul Service meta wrongly computes and exposes non_voter meta (#8731)
* Consul Service meta wrongly computes and exposes non_voter meta

In Serf Tags, entreprise members being non-voters use the tag
`nonvoter=1`, not `non_voter = false`, so non-voters in members
were wrongly displayed as voter.

Demonstration:

```
consul members -detailed|grep voter
consul20-hk5 10.200.100.110:8301   alive   acls=1,build=1.8.4+ent,dc=hk5,expect=3,ft_fs=1,ft_ns=1,id=xxxxxxxx-5629-08f2-3a79-10a1ab3849d5,nonvoter=1,port=8300,raft_vsn=3,role=consul,segment=<all>,use_tls=1,vsn=2,vsn_max=3,vsn_min=2,wan_join_port=8302
```

* Added changelog

* Added changelog entry
2020-10-09 17:18:24 -04:00
s-christoff a62705101f
Enhance the output of consul snapshot inspect (#8787) 2020-10-09 14:57:29 -05:00
Kyle Havlovitz 707f4a8d26 Stop intermediate renew routine on leader stop 2020-10-09 12:30:57 -07:00
Kyle Havlovitz 926a393a5c
Merge pull request #8784 from hashicorp/renew-intermediate-primary
connect: Enable renewing the intermediate cert in the primary DC
2020-10-09 12:18:59 -07:00
Daniel Nephin dd0e8d42c4
Merge pull request #8825 from hashicorp/streaming/add-config
streaming: add config and docs
2020-10-09 14:33:58 -04:00
Chris Piraino 4f77f87065
Emit service usage metrics with correct labeling strategy (#8856)
Previously, we would emit service usage metrics both with and without a
namespace label attached. This is problematic in the case when you want
to aggregate metrics together, i.e. "sum(consul.state.services)". This
would cause services to be counted twice in that aggregate, once via the
metric emitted with a namespace label, and once in the metric emited
without any namespace label.
2020-10-09 11:01:45 -05:00
Kyle Havlovitz 50543d678e Fix intermediate refresh test comments 2020-10-09 08:53:33 -07:00
R.B. Boyer d2f09ca306
upstream some differences from enterprise (#8902) 2020-10-09 09:42:53 -05:00
Kyle Havlovitz 968fd8660d Update CI for leader renew CA test using Vault 2020-10-09 05:48:15 -07:00
Kyle Havlovitz 62270c3f9a
Merge branch 'master' into renew-intermediate-primary 2020-10-09 04:40:34 -07:00
Kyle Havlovitz b78f618beb connect: Check for expired root cert when cross-signing 2020-10-09 04:35:56 -07:00
Freddy 89d52f41c4
Add protocol to the topology endpoint response (#8868) 2020-10-08 17:31:54 -06:00
Matt Keeler 141eb60f06
Add per-agent reconnect timeouts (#8781)
This allows for client agent to be run in a more stateless manner where they may be abruptly terminated and not expected to come back. If advertising a per-agent reconnect timeout using the advertise_reconnect_timeout configuration when that agent leaves, other agents will wait only that amount of time for the agent to come back before reaping it.

This has the advantageous side effect of causing servers to deregister the node/services/checks for that agent sooner than if the global reconnect_timeout was used.
2020-10-08 15:02:19 -04:00
Daniel Nephin 05df7b18a9 config: add field for enabling streaming RPC endpoint 2020-10-08 12:11:20 -04:00
Freddy de4af766f3
Support ingress gateways in mesh viz endpoint (#8864)
Co-authored-by: R.B. Boyer <rb@hashicorp.com>
2020-10-08 09:47:09 -06:00
Daniel Nephin a94fe054f0
Merge pull request #8809 from hashicorp/streaming/materialize-view
Add StreamingHealthServices cache-type
2020-10-07 21:26:38 -04:00
Daniel Nephin e0236b5a9f
Merge pull request #8818 from hashicorp/streaming/add-subscribe-service-batch-events
stream: handle batch events as a special case of Event
2020-10-07 21:25:32 -04:00
Daniel Nephin 783627aeef
Merge pull request #8768 from hashicorp/streaming/add-subscribe-service
subscribe: add subscribe service for streaming change events
2020-10-07 21:24:03 -04:00
Freddy 7d1f50d2e6
Return intention info in svc topology endpoint (#8853) 2020-10-07 18:35:34 -06:00
R.B. Boyer 35c4efd220
connect: support defining intentions using layer 7 criteria (#8839)
Extend Consul’s intentions model to allow for request-based access control enforcement for HTTP-like protocols in addition to the existing connection-based enforcement for unspecified protocols (e.g. tcp).
2020-10-06 17:09:13 -05:00
R.B. Boyer d6dce2332a
connect: intentions are now managed as a new config entry kind "service-intentions" (#8834)
- Upgrade the ConfigEntry.ListAll RPC to be kind-aware so that older
copies of consul will not see new config entries it doesn't understand
replicate down.

- Add shim conversion code so that the old API/CLI method of interacting
with intentions will continue to work so long as none of these are
edited via config entry endpoints. Almost all of the read-only APIs will
continue to function indefinitely.

- Add new APIs that operate on individual intentions without IDs so that
the UI doesn't need to implement CAS operations.

- Add a new serf feature flag indicating support for
intentions-as-config-entries.

- The old line-item intentions way of interacting with the state store
will transparently flip between the legacy memdb table and the config
entry representations so that readers will never see a hiccup during
migration where the results are incomplete. It uses a piece of system
metadata to control the flip.

- The primary datacenter will begin migrating intentions into config
entries on startup once all servers in the datacenter are on a version
of Consul with the intentions-as-config-entries feature flag. When it is
complete the old state store representations will be cleared. We also
record a piece of system metadata indicating this has occurred. We use
this metadata to skip ALL of this code the next time the leader starts
up.

- The secondary datacenters continue to run the old intentions
replicator until all servers in the secondary DC and primary DC support
intentions-as-config-entries (via serf flag). Once this condition it met
the old intentions replicator ceases.

- The secondary datacenters replicate the new config entries as they are
migrated in the primary. When they detect that the primary has zeroed
it's old state store table it waits until all config entries up to that
point are replicated and then zeroes its own copy of the old state store
table. We also record a piece of system metadata indicating this has
occurred. We use this metadata to skip ALL of this code the next time
the leader starts up.
2020-10-06 13:24:05 -05:00
Daniel Nephin 83401194ab streaming: improve godoc for cache-type
And fix a bug where any error that implemented the temporary interface was considered
a temporary error, even when the method would return false.
2020-10-06 13:52:02 -04:00
Daniel Nephin f857aef4a8 submatview: add a test for handling of NewSnapshotToFollow
Also add some godoc
Rename some vars and functions
Fix a data race in the new cache test for entry closing.
2020-10-06 13:22:02 -04:00
Daniel Nephin ad29cf4f94 stream: Return a single event from a subscription.Next
Handle batch events as a single event
2020-10-06 13:18:20 -04:00
Daniel Nephin fa115c6249 Move agent/subscribe -> agent/rpc/subscribe 2020-10-06 12:49:35 -04:00
Daniel Nephin 011109a6f6 subscirbe: extract streamID and logging from Subscribe
By extracting all of the tracing logic the core logic of the Subscribe
endpoint is much easier to read.
2020-10-06 12:49:35 -04:00
Daniel Nephin 4c4441997a subscribe: add integration test for acl token updates 2020-10-06 12:49:35 -04:00
Daniel Nephin 371ec2d70a subscribe: add a stateless subscribe service for the gRPC server
With a Backend that provides access to the necessary dependencies.
2020-10-06 12:49:35 -04:00
Daniel Nephin ae433947a4
Merge pull request #8799 from hashicorp/streaming/rename-framing-events
stream: remove EndOfEmptySnapshot, add NewSnapshotToFollow
2020-10-06 12:42:58 -04:00
R.B. Boyer a77b518542
server: create new memdb table for storing system metadata (#8703)
This adds a new very tiny memdb table and corresponding raft operation
for updating a very small effective map[string]string collection of
"system metadata". This can persistently record a fact about the Consul
state machine itself.

The first use of this feature will come in a later PR.
2020-10-06 10:08:37 -05:00
Daniel Nephin 2706cf9b2a
Merge pull request #8802 from hashicorp/dnephin/extract-lib-retry
lib/retry - extract a new package from lib/retry.go
2020-10-05 14:22:37 -04:00
freddygv 82a17ccee6 Do not evaluate discovery chain for topology upstreams 2020-10-05 10:24:50 -06:00
freddygv 63c50e15bc Single DB txn for ServiceTopology and other PR comments 2020-10-05 10:24:50 -06:00
freddygv 263bd9dd92 Add topology HTTP endpoint 2020-10-05 10:24:50 -06:00
freddygv 7c11580e93 Add topology RPC endpoint 2020-10-05 10:24:50 -06:00
freddygv 21c4708fe9 Add topology ACL filter 2020-10-05 10:24:50 -06:00
freddygv ac54bf99b3 Add func to combine up+downstream queries 2020-10-05 10:24:50 -06:00
freddygv 160a6539d1 factor in discovery chain when querying up/downstreams 2020-10-05 10:24:50 -06:00
freddygv 214b25919f support querying upstreams/downstreams from registrations 2020-10-05 10:24:50 -06:00
freddygv 3653045cb0 Add method for downstreams from disco chain 2020-10-05 10:24:50 -06:00
Daniel Nephin 40aac46cf4 lib/retry: Refactor to reduce the interface surface
Reduce Jitter to one function

Rename NewRetryWaiter

Fix a bug in calculateWait where maxWait was applied before jitter, which would make it
possible to wait longer than maxWait.
2020-10-04 18:12:42 -04:00
Daniel Nephin 0c7f9c72d7 lib/retry: extract a new package from lib 2020-10-04 17:43:01 -04:00
Daniel Nephin 9c5181c897 stream: full test coverage for EventPublisher.Subscribe 2020-10-02 13:46:24 -04:00
Daniel Nephin 0769f54fe1 stream: refactor to support change in framing events
Removing EndOfEmptySnapshot, add NewSnapshotToFollow
2020-10-02 13:41:31 -04:00
Daniel Nephin 5ef630f664
Merge pull request #8769 from hashicorp/streaming/prep-for-subscribe-service
state: use protobuf Topic and and export payload type
2020-10-02 13:30:06 -04:00
R.B. Boyer e84d52ba3a
ensure these tests work fine with namespaces in enterprise (#8794) 2020-10-01 09:54:46 -05:00
R.B. Boyer ccd0200bd9
server: ensure that we also shutdown network segment serf instances on server shutdown (#8786)
This really only matters for unit tests, since typically if an agent shuts down its server, it follows that up by exiting the process, which would also clean up all of the networking anyway.
2020-09-30 16:23:43 -05:00
Kyle Havlovitz 2956313f2d connect: Enable renewing the intermediate cert in the primary DC 2020-09-30 12:31:21 -07:00