Commit Graph

194 Commits

Author SHA1 Message Date
James Phillips 6055a7c0bd
Switches to reliable Raft leader notifications.
This fixes #2896 by switching to the `notifyCh` instead of the `leaderCh`,
so we get all up/down events from Raft regarding leadership. We also wait
for the old leader loop to shut down before we ever consider starting a
new one, which keeps that single-threaded and fixes the panic in that issue.
2017-04-13 14:17:32 -07:00
Kyle Havlovitz c7c2ebaf24
Use the new raft api function for leaving if applicable 2017-03-27 12:31:38 -07:00
James Phillips 97e761f50f
Adds node ID integrity checking for cluster merges. 2017-03-27 00:15:42 -07:00
Kyle Havlovitz 37ea20cb44
Add advanced autopilot features 2017-03-22 15:25:16 -07:00
James Phillips 36a0abe10f Merge pull request #2801 from hashicorp/spoken-hub-oss
Adds support for WAN soft fail and join flooding.
2017-03-20 16:24:07 -07:00
James Phillips 5ee1256137
Converts the stats fetch from serial to parallel and snaps the last index. 2017-03-19 20:48:42 -07:00
James Phillips cfc01419c8
Adds a stats fetcher to make sure we don't block the autopilot loop. 2017-03-17 18:42:28 -07:00
James Phillips 44d3e207ed
Makes the flood goroutine more reusable. 2017-03-16 16:42:19 -07:00
James Phillips 9176e77111
Shuts down flooder when either Serf is shut down. 2017-03-16 16:42:19 -07:00
James Phillips 91a861337b
Adds LAN -> WAN join flooding. 2017-03-16 16:42:19 -07:00
James Phillips 28f8aa5559
Removes remoteConsuls in favor of the new router.
This has the next wave of RTT integration with the router and also
factors some common RTT-related helpers out to lib. While we were
in here we also got rid of the coordinate disable config so we don't
need to deal with the complexity in the router (there was never a
user-visible way to disable coordinates).
2017-03-16 16:42:19 -07:00
James Phillips 0269bd0e41
Cleans up after merge. 2017-03-16 16:42:18 -07:00
James Phillips 82b6fbd844
Adds router into RPC paths with work in progress on coordinates. 2017-03-16 16:42:18 -07:00
Kyle Havlovitz c40279e012
Fix an issue with changing server IDs and add a few UX enhancements around autopilot features 2017-03-15 16:09:55 -07:00
Kyle Havlovitz 8130f9b1c1
Cleaned up and reorganized some autopilot-related code 2017-03-09 18:21:40 -08:00
Kyle Havlovitz a5cbee0e99
Add AutopilotPolicy interface and BasicAutopilot 2017-03-08 12:26:58 -08:00
Kyle Havlovitz 8bcab6c6d7
Add autopilot server health tracking
This adds two goroutines to perform autopilot tasks on the leader - one
to monitor the health of servers and another to periodically clean up
dead servers with a limit on removal count. Also adds a new http endpoint,
`/v1/operator/autopilot/health`, for querying this information through an
operator RPC endpoint.
2017-03-06 16:00:10 -08:00
Kyle Havlovitz f9588b8d7f
Add raft version 2/3 compatibility 2017-02-22 12:53:32 -08:00
Kyle Havlovitz 2c9001a389
Add configurable cleanup of dead servers when a new server joins 2017-02-17 10:49:16 -08:00
James Phillips abe7b33986
Fixes a startup ordering issue between Raft and Serf.
This fixes #2663 and fixes #1899. It's not super related to this PR,
but the startup time changes that this PR brings made this a lot worse
so I was able to track it down.
2017-01-18 15:06:15 -08:00
James Phillips 96bff003b7
Adds basic support for node IDs. 2017-01-17 22:47:59 -08:00
Kyle Havlovitz c6f461aa25 Enable snapshots in dev mode (#2453) 2016-10-31 14:39:47 -04:00
James Phillips bc29610124 Adds support for snapshots and restores. (#2396)
* Updates Raft library to get new snapshot/restore API.

* Basic backup and restore working, but need some cleanup.

* Breaks out a snapshot module and adds a SHA256 integrity check.

* Adds snapshot ACL and fills in some missing comments.

* Require a consistent read for snapshots.

* Make sure snapshot works if ACLs aren't enabled.

* Adds a bit of package documentation.

* Returns an empty response from restore to avoid EOF errors.

* Adds API client support for snapshots.

* Makes internal file names match on-disk file snapshots.

* Adds DC and token coverage for snapshot API test.

* Adds missing documentation.

* Adds a unit test for the snapshot client endpoint.

* Moves the connection pool out of the client for easier testing.

* Fixes an incidental issue in the prepared query unit test.

I realized I had two servers in bootstrap mode so this wasn't a good setup.

* Adds a half close to the TCP stream and fixes panic on error.

* Adds client and endpoint tests for snapshots.

* Moves the pool back into the snapshot RPC client.

* Adds a TLS test and fixes half-closes for TLS connections.

* Tweaks some comments.

* Adds a low-level snapshot test.

This is independent of Consul so we can pull this out into a library
later if we want to.

* Cleans up snapshot and archive and completes archive tests.

* Sends a clear error for snapshot operations in dev mode.

Snapshots require the Raft snapshots to be readable, which isn't supported
in dev mode. Send a clear error instead of a deep-down Raft one.

* Adds docs for the snapshot endpoint.

* Adds a stale mode and index feedback for snapshot saves.

This gives folks a way to extract data even if the cluster has no
leader.

* Changes the internal format of a snapshot from zip to tgz.

* Pulls in Raft fix to cancel inflight before a restore.

* Pulls in new Raft restore interface.

* Adds metadata to snapshot saves and a verify function.

* Adds basic save and restore snapshot CLI commands.

* Gets rid of tarball extensions and adds restore message.

* Fixes an incidental bad link in the KV docs.

* Adds documentation for the snapshot CLI commands.

* Scuttle any request body when a snapshot is saved.

* Fixes archive unit test error message check.

* Allows for nil output writers in snapshot RPC handlers.

* Renames hash list Decode to DecodeAndVerify.

* Closes the client connection for snapshot ops.

* Lowers timeout for restore ops.

* Updates Raft vendor to get new Restore signature and integrates with Consul.

* Bounces the leader's internal state when we do a restore.
2016-10-25 19:20:24 -07:00
James Phillips b423dee303
Fixes port numbers in peers.info. 2016-10-05 18:09:15 -07:00
James Phillips 1b7a16b7d3
Adds new consul operator endpoint, CLI, and ACL and some basic Raft commands. 2016-08-30 00:02:50 -07:00
James Phillips 359587f70e
Removes support for muxado and protocol version 1. 2016-08-09 18:10:04 -07:00
James Phillips ff6d42389c Merge pull request #2222 from hashicorp/f-raft-v2
Integrates Consul with "stage one" of HashiCorp Raft library v2.
2016-08-09 16:04:48 -07:00
James Phillips cce38f9a4b
Moves the peers.info content down into a constant. 2016-08-09 11:56:39 -07:00
James Phillips 06a510a808
Removes unsafe "recover to empty" code.
This isn't safe because it would implicitly commit all outstanding log
entries. The new Raft library already has logic to not start a vote if
the current node isn't in the configuration, so this shoudn't be needed.
2016-08-08 19:19:19 -07:00
James Phillips dd3169b395
Tweaks recovery based on interface changes. 2016-08-08 19:19:18 -07:00
James Phillips 19004e7095
Moves to a safer design where we don't ingest the initial peers.json file. 2016-08-08 19:19:18 -07:00
James Phillips 44c468995f
Touches up Raft integration after latest changes. 2016-08-08 19:19:18 -07:00
James Phillips fc25145e85
Formats log messages to be consistent. 2016-08-08 19:19:18 -07:00
James Phillips 6b157eada0
Adds more comments about the raftSafeFn. 2016-08-08 19:19:18 -07:00
James Phillips 2bf633f206
Adds back "safing" the configuration when a server leaves. 2016-08-08 19:19:18 -07:00
James Phillips 6c8e8271e2
Integrates Consul with new version of Raft library. 2016-08-08 19:19:17 -07:00
James Phillips 4a931ae12e
Adds an ACL replication status endpoint. 2016-08-04 23:30:16 -07:00
James Phillips 796933b45b
Activates fallback to replicated ACLs. 2016-08-03 21:24:09 -07:00
James Phillips 9cece515c0
Adds basic ACL replication plumbing. 2016-08-03 21:24:04 -07:00
James Phillips 8358df599d Merge pull request #2127 from hashicorp/b-remote-consuls-locking
Ensure locking of `Server`'s `remoteConsuls`.
2016-06-21 10:00:04 -07:00
Sean Chittenden ebdb72ce0a
Ensure locking of `Server`'s `remoteConsuls`. 2016-06-20 22:59:49 -07:00
Sean Chittenden 9bf6e61655
Initialize a non-empty number of Consul Datacenters. No functional change. 2016-06-20 15:26:59 -07:00
James Phillips ffcba3df58 Merge pull request #2028 from hashicorp/f-atomic-kv
Adds support for atomic transactions spanning multiple KV entries.
2016-05-15 13:46:05 -07:00
Sean Chittenden 3756fb23a6
Remove unused peers variable from setupRaft(). 2016-05-15 06:40:46 -07:00
James Phillips d980cbcd9d Moves txn code into a new endpoint, not specific to KV. 2016-05-10 21:58:02 -07:00
Sean Chittenden c16b1ca178 Add the list of Raft peers to Consul's Stats
```
% consul info
[snip]
raft:
[snip]
	raft_peers = 127.0.0.1:8300
[snip]
```

Poached from: Nomad Project
2016-04-28 15:08:48 -07:00
Sean Chittenden 5ae7835988 Rename server_details package to agent 2016-03-29 17:39:19 -07:00
Sean Chittenden f3a69c939d Refactor consul.serverParts into server_details.ServerDetails
This may be short-lived, but it also seems like this is going to lead us down a path where ServerDetails is going to evolve into a more powerful package that will encapsulate more behavior behind a coherent API.
2016-03-23 16:15:47 -07:00
James Phillips e4ca18089f Removes leader from members and changes name since it's an address. 2016-03-18 17:07:11 -07:00
Sergey Romanov 11b73bb1a5 #735 add information about leader to consul members 2016-03-18 17:05:40 -07:00