Commit Graph

642 Commits

Author SHA1 Message Date
R.B. Boyer 9a51ecc98b
agent: clients should only attempt to remove pruned nodes once per call (#6591) 2019-10-07 16:15:23 -05:00
Sarah Christoff 9b93dd93c9
Prune Unhealthy Agents (#6571)
* Add -prune flag to ForceLeave
2019-10-04 16:10:02 -05:00
Matt Keeler b0b57588d1
Implement Leader Routine Management (#6580)
* Implement leader routine manager

Switch over the following to use it for go routine management:

• Config entry Replication
• ACL replication - tokens, policies, roles and legacy tokens
• ACL legacy token upgrade
• ACL token reaping
• Intention Replication
• Secondary CA Roots Watching
• CA Root Pruning

Also added the StopAll call into the Server Shutdown method to ensure all leader routines get killed off when shutting down.

This should be mostly unnecessary as `revokeLeadership` should manually stop each one but just in case we really want these to go away (eventually).
2019-10-04 13:08:45 -04:00
Matt Keeler 9bd378a95c
Add EnterpriseConfig stubs (#6566) 2019-10-01 14:34:55 -04:00
R.B. Boyer 8433ef02a8
connect: connect CA Roots in secondary datacenters should use a SigningKeyID derived from their local intermediate (#6513)
This fixes an issue where leaf certificates issued in secondary
datacenters would be reissued very frequently (every ~20 seconds)
because the logic meant to detect root rotation was errantly triggering
because a hash of the ultimate root (in the primary) was being compared
against a hash of the local intermediate root (in the secondary) and
always failing.
2019-09-26 11:54:14 -05:00
Matt Keeler 5b83f589da
Expand the QueryOptions and QueryMeta interfaces (#6545)
In a previous PR I made it so that we had interfaces that would work enough to allow blockingQueries to work. However to complete this we need all fields to be settable and gettable.

Notes:
   • If Go ever gets contracts/generics then we could get rid of all the Getters/Setters
   • protoc / protoc-gen-gogo are going to generate all the getters for us.
   • I copied all the getters/setters from the protobuf funcs into agent/structs/protobuf_compat.go
   • Also added JSON marshaling funcs that use jsonpb for protobuf types.
2019-09-26 09:55:02 -04:00
Freddy 5eace88ce2
Expose HTTP-based paths through Connect proxy (#6446)
Fixes: #5396

This PR adds a proxy configuration stanza called expose. These flags register
listeners in Connect sidecar proxies to allow requests to specific HTTP paths from outside of the node. This allows services to protect themselves by only
listening on the loopback interface, while still accepting traffic from non
Connect-enabled services.

Under expose there is a boolean checks flag that would automatically expose all
registered HTTP and gRPC check paths.

This stanza also accepts a paths list to expose individual paths. The primary
use case for this functionality would be to expose paths for third parties like
Prometheus or the kubelet.

Listeners for requests to exposed paths are be configured dynamically at run
time. Any time a proxy, or check can be registered, a listener can also be
created.

In this initial implementation requests to these paths are not
authenticated/encrypted.
2019-09-25 20:55:52 -06:00
Matt Keeler 8885c8d318
Allow for enterprise only leader routines (#6533)
Eventually I am thinking we may need a way to register these at different priority levels but for now sticking this here is fine
2019-09-23 20:09:56 -04:00
R.B. Boyer cc889443a5
connect: don't colon-hex-encode the AuthorityKeyId and SubjectKeyId fields in connect certs (#6492)
The fields in the certs are meant to hold the original binary
representation of this data, not some ascii-encoded version.

The only time we should be colon-hex-encoding fields is for display
purposes or marshaling through non-TLS mediums (like RPC).
2019-09-23 12:52:35 -05:00
Matt Keeler 8431c5f533
Add support for implementing new requests with protobufs instea… (#6502)
* Add build system support for protobuf generation

This is done generically so that we don’t have to keep updating the makefile to add another proto generation.

Note: anything not in the vendor directory and with a .proto extension will be run through protoc if the corresponding namespace.pb.go file is not up to date.

If you want to rebuild just a single proto file you can do so with: make proto-rebuild PROTOFILES=<list of proto files to rebuild>

Providing the PROTOFILES var will override the default behavior of finding all the .proto files.

* Start adding types to the agent/proto package

These will be needed for some other work and are by no means comprehensive.

* Add ability to resolve/fixup the agentpb.ACLLinks structure in the state store.

* Use protobuf marshalling of raft requests instead of msgpack for protoc generated types.

This does not change any encoding of existing types.

* Removed structs package automatically encoding with protobuf marshalling

Instead the caller of raftApply that wants to opt-in to protobuf encoding will have to call `raftApplyProtobuf`

* Run update-vendor to fixup modules.txt

Nothing changed as far as dependencies go but the ordering of modules in that file depends on the time they are first seen and its not alphabetical.

* Rename some things and implement the structs.RPCInfo interface bits

agentpb.QueryOptions and agentpb.WriteRequest implement 3 of the 4 RPCInfo funcs and the new TargetDatacenter message type implements the fourth.

* Use the right encoding function.

* Renamed agent/proto package to agent/agentpb to prevent package name conflicts

* Update modules.txt to fix ordering

* Change blockingQuery to take in interfaces for the query options and meta

* Add %T to error output.

* Add/Update some comments
2019-09-20 14:37:22 -04:00
R.B. Boyer 5c5f21088c sdk: add freelist tracking and ephemeral port range skipping to freeport
This should cut down on test flakiness.

Problems handled:

- If you had enough parallel test cases running, the former circular
approach to handling the port block could hand out the same port to
multiple cases before they each had a chance to bind them, leading to
one of the two tests to fail.

- The freeport library would allocate out of the ephemeral port range.
This has been corrected for Linux (which should cover CI).

- The library now waits until a formerly-in-use port is verified to be
free before putting it back into circulation.
2019-09-17 14:30:43 -05:00
R.B. Boyer edf5347d3c fix typo of 'unknown' in log messages 2019-09-13 15:59:49 -05:00
Hans Hasselberg f025a7440d
agent: handleEnterpriseLeave (#6453) 2019-09-11 11:01:37 +02:00
Pierre Souchay 6d13efa828 Distinguish between DC not existing and not being available (#6399) 2019-09-03 09:46:24 -06:00
Matt Keeler 31d9d2e557
Store primaries root in secondary after intermediate signature (#6333)
* Store primaries root in secondary after intermediate signature

This ensures that the intermediate exists within the CA root stored in raft and not just in the CA provider state. This has the very nice benefit of actually outputting the intermediate cert within the ca roots HTTP/RPC endpoints.

This change means that if signing the intermediate fails it will not set the root within raft. So far I have not come up with a reason why that is bad. The secondary CA roots watch will pull the root again and go through all the motions. So as soon as getting an intermediate CA works the root will get set.

* Make TestAgentAntiEntropy_Check_DeferSync less flaky

I am not sure this is the full fix but it seems to help for me.
2019-08-30 11:38:46 -04:00
Pierre Souchay 35d90fc899 Display IPs of machines when node names conflict to ease troubleshooting
When there is an node name conflicts, such messages are displayed within Consul:

`consul.fsm: EnsureRegistration failed: failed inserting node: Error while renaming Node ID: "e1d456bc-f72d-98e5-ebb3-26ae80d785cf": Node name node001 is reserved by node 05f10209-1b9c-b90c-e3e2-059e64556d4a with name node001`

While it is easy to find the node that has reserved the name, it is hard to find
the node trying to aquire the name since it is not registered, because it
is not part of `consul members` output

This PR will display the IP of the offender and solve far more easily those issues.
2019-08-28 15:57:05 -04:00
Alvin Huang e4e9381851
revert commits on master (#6413) 2019-08-27 17:45:58 -04:00
tradel 2838a1550a update tests to match new method signatures 2019-08-27 14:16:39 -07:00
tradel 93c839b76c confi\gure providers with DC and domain 2019-08-27 14:16:25 -07:00
tradel 1acde6e30a create a common name for autoTLS agent certs 2019-08-27 14:15:53 -07:00
Alvin Huang 9662b7c01a
add nil pointer check for pointer to ACLToken struct (#6407) 2019-08-27 11:23:28 -04:00
Hans Hasselberg 4f7a3e8fa8 make sure auto_encrypt has private key type and bits 2019-08-26 13:09:50 +02:00
R.B. Boyer 2d4a3b51d0
Merge pull request #6388 from hashicorp/release/1-6
merging release/1-6 into master
2019-08-23 13:44:46 -05:00
Matt Keeler 89ac998e8b
Secondary CA `establishLeadership` fix (#6383)
This prevents ACL issues (or other issues) during intermediate CA cert signing from failing leader establishment.
2019-08-23 11:32:37 -04:00
Hans Hasselberg aada537d87
auto_encrypt: use server-port (#6287)
AutoEncrypt needs the server-port because it wants to talk via RPC. Information from gossip might not be available at that point and thats why the server-port is being used.
2019-08-23 10:18:46 +02:00
Matt Keeler 8cb0560f52
Ensure that config entry writes are forwarded to the primary DC (#6339) 2019-08-20 12:01:13 -04:00
R.B. Boyer 0675e0606e
connect: generate the full SNI names for discovery targets in the compiler rather than in the xds package (#6340) 2019-08-19 13:03:03 -05:00
R.B. Boyer d6456fddeb
connect: introduce ExternalSNI field on service-defaults (#6324)
Compiling this will set an optional SNI field on each DiscoveryTarget.
When set this value should be used for TLS connections to the instances
of the target. If not set the default should be used.

Setting ExternalSNI will disable mesh gateway use for that target. It also 
disables several service-resolver features that do not make sense for an 
external service.
2019-08-19 12:19:44 -05:00
R.B. Boyer f84f509ce4
connect: updating a service-defaults config entry should leave an unset protocol alone (#6342)
If the entry is updated for reasons other than protocol it is surprising
that the value is explicitly persisted as 'tcp' rather than leaving it
empty and letting it fall back dynamically on the proxy-defaults value.
2019-08-19 10:44:06 -05:00
Matt Keeler 73888eed36
Filter out left/leaving serf members when determining if new AC… (#6332) 2019-08-16 10:34:18 -04:00
R.B. Boyer 22ee60d1ba
agent: blocking central config RPCs iterations should not interfere with each other (#6316) 2019-08-14 09:08:46 -05:00
hashicorp-ci 29767157ed Merge Consul OSS branch 'master' at commit 8f7586b339dbb518eff3a2eec27d7b8eae7a3fbb 2019-08-13 02:00:43 +00:00
Sarah Adams 2f7a90bc52
add flag to allow /operator/keyring requests to only hit local servers (#6279)
Add parameter local-only to operator keyring list requests to force queries to only hit local servers (no WAN traffic).

HTTP API: GET /operator/keyring?local-only=true
CLI: consul keyring -list --local-only

Sending the local-only flag with any non-GET/list request will result in an error.
2019-08-12 11:11:11 -07:00
Mike Morris 88df658243
connect: remove managed proxies (#6220)
* connect: remove managed proxies implementation and all supporting config options and structs

* connect: remove deprecated ProxyDestination

* command: remove CONNECT_PROXY_TOKEN env var

* agent: remove entire proxyprocess proxy manager

* test: remove all managed proxy tests

* test: remove irrelevant managed proxy note from TestService_ServerTLSConfig

* test: update ContentHash to reflect managed proxy removal

* test: remove deprecated ProxyDestination test

* telemetry: remove managed proxy note

* http: remove /v1/agent/connect/proxy endpoint

* ci: remove deprecated test exclusion

* website: update managed proxies deprecation page to note removal

* website: remove managed proxy configuration API docs

* website: remove managed proxy note from built-in proxy config

* website: add note on removing proxy subdirectory of data_dir
2019-08-09 15:19:30 -04:00
R.B. Boyer 357ca39868
connect: ensure intention replication continues to work when the replication ACL token changes (#6288) 2019-08-07 11:34:09 -05:00
hashicorp-ci 3ac803da5e Merge Consul OSS branch 'master' at commit d84863799deca45ccf4bec5ab9f645ccae6b3aeb 2019-08-06 02:00:30 +00:00
Sarah Adams 9ed3e64510
fallback to proxy config global protocol when upstream services' protocol is unset (#6277)
fallback to proxy config global protocol when upstream services' protocol is unset

Fixes #5857
2019-08-05 12:52:35 -07:00
R.B. Boyer 64fc002e03
connect: fix failover through a mesh gateway to a remote datacenter (#6259)
Failover is pushed entirely down to the data plane by creating envoy
clusters and putting each successive destination in a different load
assignment priority band. For example this shows that normally requests
go to 1.2.3.4:8080 but when that fails they go to 6.7.8.9:8080:

- name: foo
  load_assignment:
    cluster_name: foo
    policy:
      overprovisioning_factor: 100000
    endpoints:
    - priority: 0
      lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 1.2.3.4
              port_value: 8080
    - priority: 1
      lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 6.7.8.9
              port_value: 8080

Mesh gateways route requests based solely on the SNI header tacked onto
the TLS layer. Envoy currently only lets you configure the outbound SNI
header at the cluster layer.

If you try to failover through a mesh gateway you ideally would
configure the SNI value per endpoint, but that's not possible in envoy
today.

This PR introduces a simpler way around the problem for now:

1. We identify any target of failover that will use mesh gateway mode local or
   remote and then further isolate any resolver node in the compiled discovery
   chain that has a failover destination set to one of those targets.

2. For each of these resolvers we will perform a small measurement of
   comparative healths of the endpoints that come back from the health API for the
   set of primary target and serial failover targets. We walk the list of targets
   in order and if any endpoint is healthy we return that target, otherwise we
   move on to the next target.

3. The CDS and EDS endpoints both perform the measurements in (2) for the
   affected resolver nodes.

4. For CDS this measurement selects which TLS SNI field to use for the cluster
   (note the cluster is always going to be named for the primary target)

5. For EDS this measurement selects which set of endpoints will populate the
   cluster. Priority tiered failover is ignored.

One of the big downsides to this approach to failover is that the failover
detection and correction is going to be controlled by consul rather than
deferring that entirely to the data plane as with the prior version. This also
means that we are bound to only failover using official health signals and
cannot make use of data plane signals like outlier detection to affect
failover.

In this specific scenario the lack of data plane signals is ok because the
effectiveness is already muted by the fact that the ultimate destination
endpoints will have their data plane signals scrambled when they pass through
the mesh gateway wrapper anyway so we're not losing much.

Another related fix is that we now use the endpoint health from the
underlying service, not the health of the gateway (regardless of
failover mode).
2019-08-05 13:30:35 -05:00
R.B. Boyer 0165e93517
connect: expose an API endpoint to compile the discovery chain (#6248)
In addition to exposing compilation over the API cleaned up the structures that would be exchanged to be cleaner and easier to support and understand.

Also removed ability to configure the envoy OverprovisioningFactor.
2019-08-02 15:34:54 -05:00
Todd Radel 295abd82c3
connect: generate intermediate at same time as root (#6272)
Generate intermediate at same time as root
Co-Authored-By: Freddy <freddygv@users.noreply.github.com>
2019-08-02 15:36:03 -04:00
R.B. Boyer 4e2fb5730c
connect: detect and prevent circular discovery chain references (#6246) 2019-08-02 09:18:45 -05:00
R.B. Boyer 6c9edb17c2
server: if inserting bootstrap config entries fails don't silence the errors (#6256) 2019-08-01 23:07:11 -05:00
R.B. Boyer 782c647bf4
connect: simplify the compiled discovery chain data structures (#6242)
This should make them better for sending over RPC or the API.

Instead of a chain implemented explicitly like a linked list (nodes
holding pointers to other nodes) instead switch to a flat map of named
nodes with nodes linking other other nodes by name. The shipped
structure is just a map and a string to indicate which key to start
from.

Other changes:

* inline the compiler option InferDefaults as true

* introduce compiled target config to avoid needing to send back
  additional maps of Resolvers; future target-specific compiled state
  can go here

* move compiled MeshGateway out of the Resolver and into the
  TargetConfig where it makes more sense.
2019-08-01 22:44:05 -05:00
R.B. Boyer 4666599e18
connect: reconcile how upstream configuration works with discovery chains (#6225)
* connect: reconcile how upstream configuration works with discovery chains

The following upstream config fields for connect sidecars sanely
integrate into discovery chain resolution:

- Destination Namespace/Datacenter: Compilation occurs locally but using
different default values for namespaces and datacenters. The xDS
clusters that are created are named as they normally would be.

- Mesh Gateway Mode (single upstream): If set this value overrides any
value computed for any resolver for the entire discovery chain. The xDS
clusters that are created may be named differently (see below).

- Mesh Gateway Mode (whole sidecar): If set this value overrides any
value computed for any resolver for the entire discovery chain. If this
is specifically overridden for a single upstream this value is ignored
in that case. The xDS clusters that are created may be named differently
(see below).

- Protocol (in opaque config): If set this value overrides the value
computed when evaluating the entire discovery chain. If the normal chain
would be TCP or if this override is set to TCP then the result is that
we explicitly disable L7 Routing and Splitting. The xDS clusters that
are created may be named differently (see below).

- Connect Timeout (in opaque config): If set this value overrides the
value for any resolver in the entire discovery chain. The xDS clusters
that are created may be named differently (see below).

If any of the above overrides affect the actual result of compiling the
discovery chain (i.e. "tcp" becomes "grpc" instead of being a no-op
override to "tcp") then the relevant parameters are hashed and provided
to the xDS layer as a prefix for use in naming the Clusters. This is to
ensure that if one Upstream discovery chain has no overrides and
tangentially needs a cluster named "api.default.XXX", and another
Upstream does have overrides for "api.default.XXX" that they won't
cross-pollinate against the operator's wishes.

Fixes #6159
2019-08-01 22:03:34 -05:00
Paul Banks a5c70d79d0 Revert "connect: support AWS PCA as a CA provider" (#6251)
This reverts commit 3497b7c00d49c4acbbf951d84f2bba93f3da7510.
2019-07-31 09:08:10 -04:00
Todd Radel d3b7fd83fe
connect: support AWS PCA as a CA provider (#6189)
Port AWS PCA provider from consul-ent
2019-07-30 22:57:51 -04:00
Todd Radel 1b14d6595e
connect: Support RSA keys in addition to ECDSA (#6055)
Support RSA keys in addition to ECDSA
2019-07-30 17:47:39 -04:00
Matt Keeler a7c4b7af7c
Fix CA Replication when ACLs are enabled (#6201)
Secondary CA initialization steps are:

• Wait until the primary will be capable of signing intermediate certs. We use serf metadata to check the versions of servers in the primary which avoids needing a token like the previous implementation that used RPCs. We require at least one alive server in the primary and the all alive servers meet the version requirement.
• Initialize the secondary CA by getting the primary to sign an intermediate

When a primary dc is configured, if no existing CA is initialized and for whatever reason we cannot initialize a secondary CA the secondary DC will remain without a CA. As soon as it can it will initialize the secondary CA by pulling the primaries roots and getting the primary to sign an intermediate.

This also fixes a segfault that can happen during leadership revocation. There was a spot in the secondaryCARootsWatch that was getting the CA Provider and executing methods on it without nil checking. Under normal circumstances it wont be nil but during leadership revocation it gets nil'ed out. Therefore there is a period of time between closing the stop chan and when the go routine is actually stopped where it could read a nil provider and cause a segfault.
2019-07-26 15:57:57 -04:00
R.B. Boyer 1b95d2e5e3 Merge Consul OSS branch master at commit b3541c4f34d43ab92fe52256420759f17ea0ed73 2019-07-26 10:34:24 -05:00
Matt Keeler c4a34602b6
Allow forwarding of some status RPCs (#6198)
* Allow forwarding of some status RPCs

* Update docs

* add comments about not using the regular forward
2019-07-25 14:26:22 -04:00