Commit Graph

87 Commits

Author SHA1 Message Date
Matt Keeler a7c4b7af7c
Fix CA Replication when ACLs are enabled (#6201)
Secondary CA initialization steps are:

• Wait until the primary will be capable of signing intermediate certs. We use serf metadata to check the versions of servers in the primary which avoids needing a token like the previous implementation that used RPCs. We require at least one alive server in the primary and the all alive servers meet the version requirement.
• Initialize the secondary CA by getting the primary to sign an intermediate

When a primary dc is configured, if no existing CA is initialized and for whatever reason we cannot initialize a secondary CA the secondary DC will remain without a CA. As soon as it can it will initialize the secondary CA by pulling the primaries roots and getting the primary to sign an intermediate.

This also fixes a segfault that can happen during leadership revocation. There was a spot in the secondaryCARootsWatch that was getting the CA Provider and executing methods on it without nil checking. Under normal circumstances it wont be nil but during leadership revocation it gets nil'ed out. Therefore there is a period of time between closing the stop chan and when the go routine is actually stopped where it could read a nil provider and cause a segfault.
2019-07-26 15:57:57 -04:00
R.B. Boyer 1b95d2e5e3 Merge Consul OSS branch master at commit b3541c4f34d43ab92fe52256420759f17ea0ed73 2019-07-26 10:34:24 -05:00
Jeff Mitchell e0068431f5 Chunking support (#6172)
* Initial chunk support

This uses the go-raft-middleware library to allow for chunked commits to the KV
2019-07-24 17:06:39 -04:00
Matt Keeler 39bb0e3e77 Implement Mesh Gateways
This includes both ingress and egress functionality.
2019-07-01 16:28:30 -04:00
Matt Keeler 0fc4da6861 Implement intention replication and secondary CA initialization 2019-07-01 16:28:30 -04:00
Sarah Christoff 8a930f7d3a
Remove failed nodes from serfWAN (#6028)
* Prune Servers from WAN and LAN

* cleaned up and fixed LAN to WAN

* moving things around

* force-leave remove from serfWAN, create pruneSerfWAN

* removed serfWAN remove, reduced complexity, fixed comments

* add another place to remove from serfWAN

* add nil check

* Update agent/consul/server.go

Co-Authored-By: Paul Banks <banks@banksco.de>
2019-06-28 12:40:07 -05:00
Hans Hasselberg 73c4e9f07c
tls: auto_encrypt enables automatic RPC cert provisioning for consul clients (#5597) 2019-06-27 22:22:07 +02:00
Hans Hasselberg 0d8d7ae052
agent: transfer leadership when establishLeadership fails (#5247) 2019-06-19 14:50:48 +02:00
Paul Banks e90fab0aec
Add rate limiting to RPCs sent within a server instance too (#5927) 2019-06-13 04:26:27 -05:00
R.B. Boyer 5a505c5b3a acl: adding support for kubernetes auth provider login (#5600)
* auth providers
* binding rules
* auth provider for kubernetes
* login/logout
2019-04-26 14:49:25 -05:00
R.B. Boyer 76321aa952 acl: tokens can be created with an optional expiration time (#5353) 2019-04-26 14:47:51 -05:00
Matt Keeler 3ea9fe3bff
Implement bootstrapping proxy defaults from the config file (#5714) 2019-04-26 14:25:03 -04:00
Matt Keeler 3b5d38fb49
Implement config entry replication (#5706) 2019-04-26 13:38:39 -04:00
Hans Hasselberg d511e86491
agent: enable reloading of tls config (#5419)
This PR introduces reloading tls configuration. Consul will now be able to reload the TLS configuration which previously required a restart. It is not yet possible to turn TLS ON or OFF with these changes. Only when TLS is already turned on, the configuration can be reloaded. Most importantly the certificates and CAs.
2019-03-13 10:29:06 +01:00
Hans Hasselberg 75ababb54f
Centralise tls configuration part 1 (#5366)
In order to be able to reload the TLS configuration, we need one way to generate the different configurations.

This PR introduces a `tlsutil.Configurator` which holds a `tlsutil.Config`. Afterwards it is responsible for rendering every `tls.Config`. In this particular PR I moved `IncomingHTTPSConfig`, `IncomingTLSConfig`, and `OutgoingTLSWrapper` into `tlsutil.Configurator`.

This PR is a pure refactoring - not a single feature added. And not a single test added. I only slightly modified existing tests as necessary.
2019-02-26 16:52:07 +01:00
R.B. Boyer a3e0fb8370 ensure that we plumb our configured logger into all parts of the raft library 2019-02-13 13:02:09 -06:00
Matt Keeler fa2c7059a2
Move autopilot initialization to prevent race (#5322)
`establishLeadership` invoked during leadership monitoring may use autopilot to do promotions etc. There was a race with doing that and having autopilot initialized and this fixes it.
2019-02-11 11:12:24 -05:00
Matt Keeler ec9934b6f8 Remaining ACL Unit Tests (#4852)
* Add leader token upgrade test and fix various ACL enablement bugs

* Update the leader ACL initialization tests.

* Add a StateStore ACL tests for ACLTokenSet and ACLTokenGetBy* functions

* Advertise the agents acl support status with the agent/self endpoint.

* Make batch token upsert CAS’able to prevent consistency issues with token auto-upgrade

* Finish up the ACL state store token tests

* Finish the ACL state store unit tests

Also rename some things to make them more consistent.

* Do as much ACL replication testing as I can.
2018-10-31 13:00:46 -07:00
Matt Keeler 99e0a124cb
New ACLs (#4791)
This PR is almost a complete rewrite of the ACL system within Consul. It brings the features more in line with other HashiCorp products. Obviously there is quite a bit left to do here but most of it is related docs, testing and finishing the last few commands in the CLI. I will update the PR description and check off the todos as I finish them over the next few days/week.
Description

At a high level this PR is mainly to split ACL tokens from Policies and to split the concepts of Authorization from Identities. A lot of this PR is mostly just to support CRUD operations on ACLTokens and ACLPolicies. These in and of themselves are not particularly interesting. The bigger conceptual changes are in how tokens get resolved, how backwards compatibility is handled and the separation of policy from identity which could lead the way to allowing for alternative identity providers.

On the surface and with a new cluster the ACL system will look very similar to that of Nomads. Both have tokens and policies. Both have local tokens. The ACL management APIs for both are very similar. I even ripped off Nomad's ACL bootstrap resetting procedure. There are a few key differences though.

    Nomad requires token and policy replication where Consul only requires policy replication with token replication being opt-in. In Consul local tokens only work with token replication being enabled though.
    All policies in Nomad are globally applicable. In Consul all policies are stored and replicated globally but can be scoped to a subset of the datacenters. This allows for more granular access management.
    Unlike Nomad, Consul has legacy baggage in the form of the original ACL system. The ramifications of this are:
        A server running the new system must still support other clients using the legacy system.
        A client running the new system must be able to use the legacy RPCs when the servers in its datacenter are running the legacy system.
        The primary ACL DC's servers running in legacy mode needs to be a gate that keeps everything else in the entire multi-DC cluster running in legacy mode.

So not only does this PR implement the new ACL system but has a legacy mode built in for when the cluster isn't ready for new ACLs. Also detecting that new ACLs can be used is automatic and requires no configuration on the part of administrators. This process is detailed more in the "Transitioning from Legacy to New ACL Mode" section below.
2018-10-19 12:04:07 -04:00
Kyle Havlovitz 96a35f8abc re-add Connect multi-dc config changes
This reverts commit 8bcfbaffb6588b024cd1a3cf0952e6bfa7d9e900.
2018-10-19 08:41:03 -07:00
Jack Pearkes 847a0a5266 Revert "Connect multi-dc config" (#4784) 2018-10-11 17:32:45 +01:00
Kyle Havlovitz 0cbd176a48 connect/ca: more OSS split for multi-dc 2018-10-10 12:17:59 -07:00
Armon Dadgar a343392f63 consul: Update buffer sizes 2018-08-08 10:26:58 -07:00
MagnumOpus21 0b50b84429 Agent/Proxy: Formatting and test cases fix 2018-07-09 12:46:10 -04:00
Kyle Havlovitz 3c520019e9
connect/ca: add logic for pruning old stale RootCA entries 2018-07-02 10:35:05 -07:00
Matt Keeler 02719c52ff
Move starting enterprise functionality 2018-06-29 17:38:29 -04:00
mkeeler 1da3c42867 Merge remote-tracking branch 'connect/f-connect' 2018-06-25 19:42:51 +00:00
Paul Banks 824a9b4943 Actually return Intermediate certificates bundled with a leaf! 2018-06-25 12:25:40 -07:00
Kyle Havlovitz d1265bc38b
Rename some of the CA structs/files 2018-06-14 09:42:15 -07:00
Kyle Havlovitz c90b353eea
Move connect CA provider to separate package 2018-06-14 09:42:15 -07:00
Kyle Havlovitz aa10fb2f48
Clarify some comments and names around CA bootstrapping 2018-06-14 09:42:04 -07:00
Kyle Havlovitz fc9ef9741b
Hook the CA RPC endpoint into the provider interface 2018-06-14 09:41:59 -07:00
Matt Keeler c5d9c2362f Merge branch 'master' of github.com:hashicorp/consul into rpc-limiting
# Conflicts:
#	agent/agent.go
#	agent/consul/client.go
2018-06-11 16:11:36 -04:00
Matt Keeler c589991452 Apply the limits to the clients rpcLimiter 2018-06-11 15:51:17 -04:00
Matt Keeler 14661a417b Allow for easy enterprise/oss coexistence
Uses struct/interface embedding with the embedded structs/interfaces being empty for oss. Also methods on the server/client types are defaulted to do nothing for OSS
2018-05-24 10:36:42 -04:00
Preetha Appan 17a011b9bd
fix typo and remove comment 2018-03-27 14:28:05 -05:00
Preetha Appan 6d16afc65c
Remove unnecessary nil checks 2018-03-27 10:59:42 -05:00
Preetha Appan c21c2da690
Fix test and remove unused method 2018-03-27 09:44:41 -05:00
Preetha Appan 512f9a50fc
Allows disabling WAN federation by setting serf WAN port to -1 2018-03-26 14:21:06 -05:00
Kyle Havlovitz 6d1dbe6cc4
Move autopilot health loop into leader operations 2018-01-23 11:17:41 -08:00
Kyle Havlovitz dfc165a47b
Move autopilot initializing to oss file 2017-12-18 18:02:44 -08:00
Kyle Havlovitz 044c38aa7b
Move autopilot setup to a separate file 2017-12-18 16:55:51 -08:00
Kyle Havlovitz 9e1ba6fb4e
Make some final tweaks to autopilot package 2017-12-18 12:26:47 -08:00
Kyle Havlovitz f347c8a531
More refactoring to make autopilot consul-agnostic 2017-12-12 17:46:28 -08:00
Kyle Havlovitz 8546a1d3c6
Move autopilot to a standalone package 2017-12-11 16:45:33 -08:00
James Phillips 3e7ea1931c
Moves the FSM into its own package.
This will help make it clearer what happens when we add some registration
plumbing for the different operations and snapshots.
2017-11-29 18:36:53 -08:00
James Phillips 36bb30e67a
Creates a registration mechanism for RPC endpoints. 2017-11-29 18:36:52 -08:00
James Phillips d1ad538345 Makes RPC handling more robust when rolling servers. (#3561)
* Adds client-side retry for no leader errors.

This paves over the case where the client was connected to the leader
when it loses leadership.

* Adds a configurable server RPC drain time and a fail-fast path for RPCs.

When a server leaves it gets removed from the Raft configuration, so it will
never know who the new leader server ends up being. Without this we'd be
doomed to wait out the RPC hold timeout and then fail. This makes things fail
a little quicker while a sever is draining, and since we added a client retry
AND since the server doing this has already shut down and left the Serf LAN,
clients should retry against some other server.

* Makes the RPC hold timeout configurable.

* Reorders struct members.

* Sets the RPC hold timeout default for test servers.

* Bumps the leave drain time up to 5 seconds.

* Robustifies retries with a simpler client-side RPC hold.

* Reverts untended delete.
2017-10-10 15:19:50 -07:00
James Phillips fcaa889116 Bumps default Raft protocol to version 3. (#3477)
* Changes default Raft protocol to 3.

* Changes numPeers() to report only voters.

This should have been there before, but it's more obvious that this
is incorrect now that we default the Raft protocol to 3, which puts
new servers in a read-only state while Autopilot waits for them to
become healthy.

* Fixes TestLeader_RollRaftServer.

* Fixes TestOperator_RaftRemovePeerByAddress.

* Fixes TestServer_*.

Relaxed the check for a given number of voter peers and instead do
a thorough check that all servers see each other in their Raft
configurations.

* Fixes TestACL_*.

These now just check for Raft replication to be set up, and don't
care about the number of voter peers.

* Fixes TestOperator_Raft_ListPeers.

* Fixes TestAutopilot_CleanupDeadServerPeriodic.

* Fixes TestCatalog_ListNodes_ConsistentRead_Fail.

* Fixes TestLeader_ChangeServerID and adjusts the conn pool to throw away
sockets when it sees io.EOF.

* Changes version to 1.0.0 in the options doc.

* Makes metrics test more deterministic with autopilot metrics possible.
2017-09-25 15:27:04 -07:00
Preetha Appan 8394ad08db Introduce Code Policy validation via sentinel, with a noop implementation 2017-09-25 13:44:55 -05:00