open-consul

Commit Graph

Author	SHA1	Message	Date
James Phillips	97e761f50f	Adds node ID integrity checking for cluster merges.	2017-03-27 00:15:42 -07:00
James Phillips	1e5a442420	Walks back the changes to change pool address interface into strings.	2017-03-16 16:42:19 -07:00
James Phillips	28f8aa5559	Removes remoteConsuls in favor of the new router. This has the next wave of RTT integration with the router and also factors some common RTT-related helpers out to lib. While we were in here we also got rid of the coordinate disable config so we don't need to deal with the complexity in the router (there was never a user-visible way to disable coordinates).	2017-03-16 16:42:19 -07:00
James Phillips	838d85b7ae	Changes pool's dial address to a string and adds a timeout.	2017-03-16 16:42:18 -07:00
James Phillips	96bff003b7	Adds basic support for node IDs.	2017-01-17 22:47:59 -08:00
James Phillips	bc29610124	Adds support for snapshots and restores. (#2396) * Updates Raft library to get new snapshot/restore API. * Basic backup and restore working, but need some cleanup. * Breaks out a snapshot module and adds a SHA256 integrity check. * Adds snapshot ACL and fills in some missing comments. * Require a consistent read for snapshots. * Make sure snapshot works if ACLs aren't enabled. * Adds a bit of package documentation. * Returns an empty response from restore to avoid EOF errors. * Adds API client support for snapshots. * Makes internal file names match on-disk file snapshots. * Adds DC and token coverage for snapshot API test. * Adds missing documentation. * Adds a unit test for the snapshot client endpoint. * Moves the connection pool out of the client for easier testing. * Fixes an incidental issue in the prepared query unit test. I realized I had two servers in bootstrap mode so this wasn't a good setup. * Adds a half close to the TCP stream and fixes panic on error. * Adds client and endpoint tests for snapshots. * Moves the pool back into the snapshot RPC client. * Adds a TLS test and fixes half-closes for TLS connections. * Tweaks some comments. * Adds a low-level snapshot test. This is independent of Consul so we can pull this out into a library later if we want to. * Cleans up snapshot and archive and completes archive tests. * Sends a clear error for snapshot operations in dev mode. Snapshots require the Raft snapshots to be readable, which isn't supported in dev mode. Send a clear error instead of a deep-down Raft one. * Adds docs for the snapshot endpoint. * Adds a stale mode and index feedback for snapshot saves. This gives folks a way to extract data even if the cluster has no leader. * Changes the internal format of a snapshot from zip to tgz. * Pulls in Raft fix to cancel inflight before a restore. * Pulls in new Raft restore interface. * Adds metadata to snapshot saves and a verify function. * Adds basic save and restore snapshot CLI commands. * Gets rid of tarball extensions and adds restore message. * Fixes an incidental bad link in the KV docs. * Adds documentation for the snapshot CLI commands. * Scuttle any request body when a snapshot is saved. * Fixes archive unit test error message check. * Allows for nil output writers in snapshot RPC handlers. * Renames hash list Decode to DecodeAndVerify. * Closes the client connection for snapshot ops. * Lowers timeout for restore ops. * Updates Raft vendor to get new Restore signature and integrates with Consul. * Bounces the leader's internal state when we do a restore.	2016-10-25 19:20:24 -07:00
Sean Chittenden	5ae7835988	Rename server_details package to agent	2016-03-29 17:39:19 -07:00
Sean Chittenden	4984b6111d	Gratuitous rename 1/2 Reduce cognitive load and perform an overdue rename. No functional change. Rename the `server_manager` package to `servers`. Rename the `ServerManager` package to `Manager`. In `client`, rename `serverMgr` to `servers`.	2016-03-29 16:12:00 -07:00
Sean Chittenden	4747cf3cab	Start server rebalance task after init'ing Serf Now that there is no longer an event loop driven directly by Serf, start the ServerManager task after Serf has been setup. When testing and adjusting timers and timeouts to unreasonably low values, it's possible to tickle a race condition where Serf's NumNodes() would fail because Serf had not been initialized.	2016-03-26 22:04:41 -07:00
Sean Chittenden	2d9982eb27	Proactively ping server before rotation Before shuffling the server list, proactively ping the next server in the list to establish the connection and verify the remote endpoint is healthy.	2016-03-26 19:28:13 -07:00
Sean Chittenden	b3298ce4c3	Only log in FindServers In FindServer this is a useful warning hinting why its call failed. RPC returns error and leaves it to the higher level caller to do whatever it wants. As an operator, I'd have the detail necessary to know why the RPC call(s) failed.	2016-03-25 13:58:50 -07:00
Sean Chittenden	e041c3905d	Fix stale comment Pointed out by: @slackpad	2016-03-25 12:00:40 -07:00
Sean Chittenden	45fc7c362e	Add a comment for Client serverMgr	2016-03-25 11:59:27 -07:00
Sean Chittenden	d5f72e8c07	Rename GetNumServers to NumServers() Matches the style of the rest of the repo	2016-03-23 22:10:50 -07:00
Sean Chittenden	9de9cf90f1	Rename NewServerManger to just New Follow go style recommendations now that this has been refactored out of the consul package and doesn't need the qualifier in the name.	2016-03-23 22:10:50 -07:00
Sean Chittenden	7faea986a0	Rename FindHealthyServer() to FindServer() There is no guarantee the server coming back is healthy. It's apt to be healthy by virtue of its place in the server list, but it's not guaranteed.	2016-03-23 22:10:50 -07:00
Sean Chittenden	231768faea	Simplify error handling Rely on Serf for liveliness. In the event of a failure, simply cycle the server to the end of the list. If the server is unhealthy, Serf will reap the dead server. Additional simplifications: *) Only rebalance servers based on timers, not when a new server is readded to the cluster. *) Back out the failure count in server_details.ServerDetails	2016-03-23 22:10:50 -07:00
Sean Chittenden	26e51376d9	Introduce asynchronous management of consul server lists Instead of blocking the RPC call path and performing a potentially expensive calculation (including a call to `c.LANMembers()`), introduce a channel to request a rebalance. Some events don't force a reshuffle, instead they extend the duration of the current rebalance window because the environment thrashed enough to redistribute a client's load.	2016-03-23 22:10:50 -07:00
Sean Chittenden	b33648ca5c	Move consul.serverConfig out of the consul package Relocated to its own package, server_manager. This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary. More work is needed to be done here	2016-03-23 16:16:22 -07:00
Sean Chittenden	f3a69c939d	Refactor consul.serverParts into server_details.ServerDetails This may be short-lived, but it also seems like this is going to lead us down a path where ServerDetails is going to evolve into a more powerful package that will encapsulate more behavior behind a coherent API.	2016-03-23 16:15:47 -07:00
Sean Chittenden	b3192ca410	Rename serverConfigMtx to serverConfigLock Pointed out by: @slackpad	2016-03-23 16:15:47 -07:00
Sean Chittenden	82458fa9e8	Handle the case where there are no healthy servers Pointed out by: @slackpad	2016-03-23 16:15:47 -07:00
Sean Chittenden	09d4c6439c	Refactor out the management of Consul servers Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go. This commit brings in a background task that proactively manages the server list and: *) reshuffles the list *) manages the timer out of the RPC() path *) uses atomics to detect a server has failed This is a WIP, more work in testing needs to be completed.	2016-03-23 16:15:47 -07:00
Sean Chittenden	7de85906c1	Rename `lastServer` to `preferredServer` Expanding the domain of lastServer beyond RPC() changes the meaning of this variable. Rename accordingly to match the intent coming in a subsequent commit: a background thread will be in charge of rotating preferredServer.	2016-03-23 16:14:59 -07:00
Sean Chittenden	2949980a64	Warn if serf events have queued up past 80% of the limit It is theoretically possible that the number of queued serf events can back up. If this happens, emit a warning message if there are more than 200 events in queue. Most notably, this can happen if `c.consulServerLock` is held for an "extended period of time". The probability of anyone ever seeing this log message is hopefully low to nonexistent, but if it happens, the warning message indicating a large number of serf events fired while a lock was held is likely to be helpful (vs serf mysteriously blocking when attempting to add an event to a channel).	2016-03-23 16:14:11 -07:00
Sean Chittenden	3ac1bcc799	Remove lastRPCTime This mechanism isn't going to provide much value in the future. Preemptively reduce the complexity of future work.	2016-03-23 16:13:49 -07:00
Sean Chittenden	72b7856045	Rename c.consuls to c.consulServers Prep for breaking out maintenance of consuls into a new goroutine.	2016-03-23 16:10:27 -07:00
James Phillips	d660311fbb	Revert "Merge pull request #1667 from hashicorp/b-redistribute-clients" This reverts commit 8f30dea4209491ebbe4ef9ab94dd8052d17bdbe9, reversing changes made to eb27a02956e7e052c0bec6f96a0c0f7f6675f6a6.	2016-02-24 15:38:03 -08:00
Sean Chittenden	fc82b351b8	Use the server's address in debug logging, not the c.lastServer, which may be nil	2016-02-02 15:51:28 -08:00
Sean Chittenden	58225e0ee3	Remove unnecessary check, test was moved further up in scope	2016-02-02 11:13:58 -08:00
Sean Chittenden	ef8bbca48f	Continually rebalance client connections Introduce a low-level background connection expiration mechanism wherein connections will be recycled periodically based on the size and health of the cluster. For the vast majority of consul users, this will mean an average connection age of 150s. For 10K node clusters it will take ~3min for clusters to rebalance their connections. In the pathological case for a 100K cluster where 99K clients are in the minority talking to 1x server it will take ~26min to rebalance all connections. It's possible for clients recovering from a partition to become fixated on a single server until the server or agent is restarted. This is of particular interest to long-running environments with stable agents, where `allow_stale` is true, and partitions occur periodically.	2016-01-30 17:13:50 -08:00
Sean Chittenden	8a37e76cb0	Use rand.Int31n() vs unconditionally using modulus	2016-01-30 15:47:58 -08:00
Sean Chittenden	b216d4c11f	Rename clientRPCCache to clientRPCConnMaxIdle, change value Increase the max idle time for agents talking to servers from 30s to 127s in order to allow for the reuse of connections that are being initiated by cron. 127s was chosen as the first prime above 120s (arbitrarily chose to use a prime) with the intent of reusing connections who are used by once-a-minute cron(8) jobs and who use a 60s jitter window (e.g. in vixie cron job execution can drift by up to 59s per job, or 119s for a once-a-minute cron job).	2016-01-30 15:27:46 -08:00
Sean Chittenden	e83a5b7a70	Reuse the results from gettimeofday(2)... Inside of a single RPC call, reuse time.Now().	2016-01-30 14:39:17 -08:00
James Phillips	132e1d813b	Fixes configs now that Serf always caches coordinates.	2015-10-23 15:23:01 -07:00
James Phillips	f71c79c53f	Does some small cleanups based on PR feedback. * Holds coordinate updates in map and gets rid of the update channel. * Cleans up config variables a bit.	2015-10-23 15:23:01 -07:00
James Phillips	b6c31bdf2f	Flips the sense of the coordinate enable option.	2015-10-23 15:23:01 -07:00
James Phillips	edb9a119e2	Does a clean up pass on the Consul side.	2015-10-23 15:23:01 -07:00
Derek Chiang	eb599a1745	Address comments	2015-10-23 15:23:01 -07:00
Derek Chiang	b2cff43bb5	Complete logic for sending coordinates	2015-10-23 15:23:01 -07:00
Dale Wijnand	c5168e1263	Fix a bunch of typos.	2015-09-15 13:22:08 +01:00
Ryan Uber	e6923a4832	consul: always fire events from server nodes	2015-06-18 18:13:29 -07:00
Sam Boyer	bdc5983463	Condense switch fallthroughs into expr lists	2015-05-26 21:30:14 -04:00
Armon Dadgar	9642384429	consul: support the new TLS wrapper	2015-05-11 15:15:36 -07:00
Armon Dadgar	3bf337a6ac	consul: thread the target DC through the RPC path	2015-05-11 13:09:19 -07:00
Armon Dadgar	a1de4b17c2	consul: use tlsutil.Wrapper instead of tls.Config directly	2015-05-11 13:09:19 -07:00
Ryan Uber	275d99e1dc	consul: allow returning custom error for merge delegate	2015-02-22 18:24:10 -08:00
Armon Dadgar	a66a765ca9	consul: Adding merge delegate to prevent mixing clusters	2015-01-06 15:48:46 -08:00
Ryan Uber	4cd89a9113	Rebase against upstream	2014-11-19 16:45:49 -08:00
Ryan Uber	2661bbfa27	consul: more tests, remove unused KeyManager() method	2014-11-19 16:37:40 -08:00

92 Commits