Commit Graph

79 Commits

Author SHA1 Message Date
Sean Chittenden d5f72e8c07 Rename GetNumServers to NumServers()
Matches the style of the rest of the repo
2016-03-23 22:10:50 -07:00
Sean Chittenden 9de9cf90f1 Rename NewServerManger to just New
Follow go style recommendations now that this has been refactored out of the consul package and doesn't need the qualifier in the name.
2016-03-23 22:10:50 -07:00
Sean Chittenden 7faea986a0 Rename FindHealthyServer() to FindServer()
There is no guarantee the server coming back is healthy.  It's apt to be healthy by virtue of its place in the server list, but it's not guaranteed.
2016-03-23 22:10:50 -07:00
Sean Chittenden 231768faea Simplify error handling
Rely on Serf for liveliness.  In the event of a failure, simply cycle the server to the end of the list.  If the server is unhealthy, Serf will reap the dead server.

Additional simplifications:

*) Only rebalance servers based on timers, not when a new server is readded to the cluster.
*) Back out the failure count in server_details.ServerDetails
2016-03-23 22:10:50 -07:00
Sean Chittenden 26e51376d9 Introduce asynchronous management of consul server lists
Instead of blocking the RPC call path and performing a potentially expensive calculation (including a call to `c.LANMembers()`), introduce a channel to request a rebalance.  Some events don't force a reshuffle, instead the extend the duration of the current rebalance window because the environment thrashed enough to redistribute a client's load.
2016-03-23 22:10:50 -07:00
Sean Chittenden b33648ca5c Move consul.serverConfig out of the consul package
Relocated to its own package, server_manager.  This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary.  More work is needed to be done here
2016-03-23 16:16:22 -07:00
Sean Chittenden f3a69c939d Refactor consul.serverParts into server_details.ServerDetails
This may be short-lived, but it also seems like this is going to lead us down a path where ServerDetails is going to evolve into a more powerful package that will encapsulate more behavior behind a coherent API.
2016-03-23 16:15:47 -07:00
Sean Chittenden b3192ca410 Rename serverConfigMtx to serverConfigLock
Pointed out by: @slackpad
2016-03-23 16:15:47 -07:00
Sean Chittenden 82458fa9e8 Handle the case where there are no healthy servers
Pointed out by: @slackpad
2016-03-23 16:15:47 -07:00
Sean Chittenden 09d4c6439c Refactor out the management of Consul servers
Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go.

This commit brings in a background task that proactively manages the server list and:

*) reshuffles the list
*) manages the timer out of the RPC() path
*) uses atomics to detect a server has failed

This is a WIP, more work in testing needs to be completed.
2016-03-23 16:15:47 -07:00
Sean Chittenden 7de85906c1 Rename `lastServer` to `preferredServer`
Expanding the domain of lastServer beyond RPC() changes the meaning of this variable.  Rename accordingly to match the intent coming in a subsequent commit: a background thread will be in charge of rotating preferredServer.
2016-03-23 16:14:59 -07:00
Sean Chittenden 2949980a64 Warn if serf events have queued up past 80% of the limit
It is theoretically possible that the number of queued serf events can back up.  If this happens, emit a warning message if there are more than 200 events in queue.

Most notably, this can happen if `c.consulServerLock` is held for an "extended period of time".  The probability of anyone ever seeing this log message is hopefully low to nonexistent, but if it happens, the warning message indicating a large number of serf events fired while a lock was held is likely to be helpful (vs serf mysteriously blocking when attempting to add an event to a channel).
2016-03-23 16:14:11 -07:00
Sean Chittenden 3ac1bcc799 Remove lastRPCTime
This mechanism isn't going to provide much value in the future.  Preemptively reduce the complexity of future work.
2016-03-23 16:13:49 -07:00
Sean Chittenden 72b7856045 Rename c.consuls to c.consulServers
Prep for breaking out maintenance of consuls into a new goroutine.
2016-03-23 16:10:27 -07:00
James Phillips d660311fbb Revert "Merge pull request #1667 from hashicorp/b-redistribute-clients"
This reverts commit 8f30dea4209491ebbe4ef9ab94dd8052d17bdbe9, reversing
changes made to eb27a02956e7e052c0bec6f96a0c0f7f6675f6a6.
2016-02-24 15:38:03 -08:00
Sean Chittenden fc82b351b8 Use the server's address in debug logging, not the c.lastServer, which may be nil 2016-02-02 15:51:28 -08:00
Sean Chittenden 58225e0ee3 Remove unnecessary check, test was moved further up in scope 2016-02-02 11:13:58 -08:00
Sean Chittenden ef8bbca48f Continually rebalance client connections
Introduce a low-level background connection expiration mechanism wherein connections will be recycled periodically based on the size and health of the cluster.

For the vast majority of consul users, this will mean an average connection age of 150s.  For 10K node clusters it will take ~3min for clusters to rebalance their connections.  In the pathological case for a 100K cluster where 99K clients are in the minority talking to 1x server it will take ~26min to rebalance all connections.

It's possibe for clients recovering from a parititon to become fixated on a single server until the server or agent is restarted.  This is of particular interest to long-running environments with stable agents, where `allow_stale` is true, and partitions occur periodically.
2016-01-30 17:13:50 -08:00
Sean Chittenden 8a37e76cb0 Use rand.Int31n() vs unconditionally using modulus 2016-01-30 15:47:58 -08:00
Sean Chittenden b216d4c11f Rename clientRPCCache to clientRPCConnMaxIdle, change value
Increase the max idle time for agents talking to servers from 30s to 127s in order to allow for the reuse of connections that are being initiated by cron.

127s was chosen as the first prime above 120s (arbitrarily chose to use a prime) with the intent of reusing connections who are used by once-a-minute cron(8) jobs *and* who use a 60s jitter window (e.g. in vixie cron job execution can drift by up to 59s per job, or 119s for a once-a-minute cron job).
2016-01-30 15:27:46 -08:00
Sean Chittenden e83a5b7a70 Reuse the results from gettimeofday(2)...
Inside of a single RPC call, reuse time.Now().
2016-01-30 14:39:17 -08:00
James Phillips 132e1d813b Fixes configs now that Serf always caches coordinates. 2015-10-23 15:23:01 -07:00
James Phillips f71c79c53f Does some small cleanups based on PR feedback.
* Holds coordinate updates in map and gets rid of the update channel.
* Cleans up config variables a bit.
2015-10-23 15:23:01 -07:00
James Phillips b6c31bdf2f Flips the sense of the coordinate enable option. 2015-10-23 15:23:01 -07:00
James Phillips edb9a119e2 Does a clean up pass on the Consul side. 2015-10-23 15:23:01 -07:00
Derek Chiang eb599a1745 Address comments 2015-10-23 15:23:01 -07:00
Derek Chiang b2cff43bb5 Complete logic for sending coordinates 2015-10-23 15:23:01 -07:00
Dale Wijnand c5168e1263 Fix a bunch of typos. 2015-09-15 13:22:08 +01:00
Ryan Uber e6923a4832 consul: always fire events from server nodes 2015-06-18 18:13:29 -07:00
Sam Boyer bdc5983463 Condense switch fallthroughs into expr lists 2015-05-26 21:30:14 -04:00
Armon Dadgar 9642384429 consul: support the new TLS wrapper 2015-05-11 15:15:36 -07:00
Armon Dadgar 3bf337a6ac consul: thread the target DC through the RPC path 2015-05-11 13:09:19 -07:00
Armon Dadgar a1de4b17c2 consul: use tlsutil.Wrapper instead of tls.Config directly 2015-05-11 13:09:19 -07:00
Ryan Uber 275d99e1dc consul: allow returning custom error for merge delegate 2015-02-22 18:24:10 -08:00
Armon Dadgar a66a765ca9 consul: Adding merge delegate to prevent mixing clusters 2015-01-06 15:48:46 -08:00
Ryan Uber 4cd89a9113 Rebase against upstream 2014-11-19 16:45:49 -08:00
Ryan Uber 2661bbfa27 consul: more tests, remove unused KeyManager() method 2014-11-19 16:37:40 -08:00
Ryan Uber 295f876923 command/agent: fix up gossip encryption indicator 2014-11-19 16:35:37 -08:00
Ryan Uber 43a60f1424 command: basic rpc works for keys command 2014-11-19 16:30:21 -08:00
Ryan Uber 96376212ff consul: use rpc layer only for key management functions, add rpc commands 2014-11-19 16:30:21 -08:00
Atin Malaviya 2bd0e8c745 consul.Config() helper to generate the tlsutil.Config{} struct, 30 second keepalive, use keepalive for HTTP and HTTPS 2014-11-18 17:56:48 -05:00
Atin Malaviya b4424a1a50 Moved TLS Config stuff to tlsutil package 2014-11-18 11:03:36 -05:00
Armon Dadgar 3a1d686444 consul: Adding user event handler for callbacks 2014-08-26 19:04:07 -07:00
Armon Dadgar b1cf52db01 consul: expose UserEvent from Serf 2014-08-26 18:50:03 -07:00
Armon Dadgar ebae394863 consul: ACL setting passthrough 2014-08-18 15:46:20 -07:00
Nelson Elhage 0a2476b20e Restore the 0.2 TLS verification behavior.
Namely, don't check the DNS names in TLS certificates when connecting to
other servers.

As of golang 1.3, crypto/tls no longer natively supports doing partial
verification (verifying the cert issuer but not the hostname), so we
have to disable verification entirely and then do the issuer
verification ourselves. Fortunately, crypto/x509 makes this relatively
straightforward.

If the "server_name" configuration option is passed, we preserve the
existing behavior of checking that server name everywhere.

No option is provided to retain the current behavior of checking the
remote certificate against the local node name, since that behavior
seems clearly buggy and unintentional, and I have difficulty imagining
it is actually being used anywhere. It would be relatively
straightforward to restore if desired, however.
2014-06-28 13:32:42 -07:00
Armon Dadgar b5bd20634a consul: Gossip the build using Serf 2014-06-06 15:36:40 -07:00
Armon Dadgar 890d4d771f consul: Ensure clients also implement LocalMember 2014-05-29 11:21:56 -07:00
Armon Dadgar 319ab05b8c consul: Provide logger to yamux 2014-05-28 16:32:25 -07:00
Armon Dadgar bf25792e2f consul: Fix client server reaping 2014-05-28 16:32:24 -07:00