Commit Graph

3740 Commits

Author SHA1 Message Date
Sean Chittenden c980d492c6 Emulate a TryLock using atomic.CompareAndSwap
Prevent possible queueing behind serverConfigLock in the event that a server fails on a busy host.
2016-03-23 22:10:50 -07:00
Sean Chittenden 102dcafe76 Make use of interfaces
Use an interface instead of serf.Serf as arg to NewServerManager.  Bonus points for improved testability.

Pointed out by: @slackpad
2016-03-23 22:10:50 -07:00
Sean Chittenden 231768faea Simplify error handling
Rely on Serf for liveliness.  In the event of a failure, simply cycle the server to the end of the list.  If the server is unhealthy, Serf will reap the dead server.

Additional simplifications:

*) Only rebalance servers based on timers, not when a new server is readded to the cluster.
*) Back out the failure count in server_details.ServerDetails
2016-03-23 22:10:50 -07:00
Sean Chittenden 0c519aa90d Unbreak client tests by reverting to original test
Debugging code crept into the actual test and hung out for much longer than it should have.
2016-03-23 22:10:50 -07:00
Sean Chittenden 26e51376d9 Introduce asynchronous management of consul server lists
Instead of blocking the RPC call path and performing a potentially expensive calculation (including a call to `c.LANMembers()`), introduce a channel to request a rebalance.  Some events don't force a reshuffle, instead the extend the duration of the current rebalance window because the environment thrashed enough to redistribute a client's load.
2016-03-23 22:10:50 -07:00
Sean Chittenden 6ed37d1d8d Comment nits 2016-03-23 22:10:50 -07:00
Sean Chittenden 32c24b5447 Update Serf to include `serf.NumNodes()` 2016-03-23 22:10:50 -07:00
Sean Chittenden c8ab3ae4cb Use saveServerConfig vs atomic.Value.Store(config) 2016-03-23 22:10:50 -07:00
Sean Chittenden 12377e80e6 Commit a handful of refactoring && copy/paste-o fixes 2016-03-23 22:10:50 -07:00
Sean Chittenden c1c17f158b Mutate copies of serverCfg.servers, not original
Removing any ambiguity re: ownership of the mutated server lists is a win for maintenance and debugging.
2016-03-23 22:10:50 -07:00
Sean Chittenden 753766cc5d rebalanceTimer may be nil during initialization
When first starting the server manager, it's possible that the rebalanceTimer in serverConfig will be nil, test accordingly.
2016-03-23 22:10:50 -07:00
Sean Chittenden d0e2792d5c Properly retain a pointer to the rebalanceTimer 2016-03-23 22:10:50 -07:00
Sean Chittenden 62785de865 Cosmetic and various other wordsmithing cleanups 2016-03-23 22:10:50 -07:00
Sean Chittenden 31de4290cf Document the various functions and their locking 2016-03-23 22:10:50 -07:00
Sean Chittenden ffcd939feb Use config convenience method to get config
'cause ELETTHECOMPILERSDOTHEWORK.  I don't need that cluttering up the subconscious with more complexity.
2016-03-23 22:10:50 -07:00
Sean Chittenden ed7fee7a3c Move consul.serverConfig out of the consul package
Relocated to its own package, server_manager.  This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary.  More work is needed to be done here
2016-03-23 22:10:50 -07:00
Sean Chittenden ab80393198 Rename serverConfigMtx to serverConfigLock
Pointed out by: @slackpad
2016-03-23 22:10:50 -07:00
Sean Chittenden 1866d94285 Refactor out the management of Consul servers
Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go.

This commit brings in a background task that proactively manages the server list and:

*) reshuffles the list
*) manages the timer out of the RPC() path
*) uses atomics to detect a server has failed

This is a WIP, more work in testing needs to be completed.
2016-03-23 22:10:50 -07:00
Sean Chittenden 73497f7915 Move consul.serverConfig out of the consul package
Relocated to its own package, server_manager.  This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary.  More work is needed to be done here
2016-03-23 22:10:50 -07:00
Sean Chittenden 2a52d3eb80 Rename serverConfigMtx to serverConfigLock
Pointed out by: @slackpad
2016-03-23 22:10:32 -07:00
Sean Chittenden 49425c5371 Refactor out the management of Consul servers
Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go.

This commit brings in a background task that proactively manages the server list and:

*) reshuffles the list
*) manages the timer out of the RPC() path
*) uses atomics to detect a server has failed

This is a WIP, more work in testing needs to be completed.
2016-03-23 22:09:46 -07:00
Sean Chittenden ebdccf0f35 Move consul.serverConfig out of the consul package
Relocated to its own package, server_manager.  This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary.  More work is needed to be done here
2016-03-23 22:05:29 -07:00
Sean Chittenden b7213d9daa Rename serverConfigMtx to serverConfigLock
Pointed out by: @slackpad
2016-03-23 22:05:05 -07:00
Sean Chittenden e29b8de0a6 Refactor out the management of Consul servers
Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go.

This commit brings in a background task that proactively manages the server list and:

*) reshuffles the list
*) manages the timer out of the RPC() path
*) uses atomics to detect a server has failed

This is a WIP, more work in testing needs to be completed.
2016-03-23 22:03:20 -07:00
Sean Chittenden 3730eaf6df Commit miss re: consuls variable rename 2016-03-23 16:24:29 -07:00
Sean Chittenden b33648ca5c Move consul.serverConfig out of the consul package
Relocated to its own package, server_manager.  This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary.  More work is needed to be done here
2016-03-23 16:16:22 -07:00
Sean Chittenden f3a69c939d Refactor consul.serverParts into server_details.ServerDetails
This may be short-lived, but it also seems like this is going to lead us down a path where ServerDetails is going to evolve into a more powerful package that will encapsulate more behavior behind a coherent API.
2016-03-23 16:15:47 -07:00
Sean Chittenden b3192ca410 Rename serverConfigMtx to serverConfigLock
Pointed out by: @slackpad
2016-03-23 16:15:47 -07:00
Sean Chittenden 82458fa9e8 Handle the case where there are no healthy servers
Pointed out by: @slackpad
2016-03-23 16:15:47 -07:00
Sean Chittenden 09d4c6439c Refactor out the management of Consul servers
Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go.

This commit brings in a background task that proactively manages the server list and:

*) reshuffles the list
*) manages the timer out of the RPC() path
*) uses atomics to detect a server has failed

This is a WIP, more work in testing needs to be completed.
2016-03-23 16:15:47 -07:00
Sean Chittenden 6bda2c007c Add a flag to denote that a server is disabled
A server is not normally disabled, but in the event of an RPC error, we want to mark a server as down to allow for fast failover to a different server.  This value must be an int in order to support atomic operations.

Additionally, this is the preliminary work required to bring up a server in a disabled state.  RPC health checks in the future could mark the server as alive, thereby creating an organic "slow start" feature for Consul.
2016-03-23 16:14:59 -07:00
Sean Chittenden 7de85906c1 Rename `lastServer` to `preferredServer`
Expanding the domain of lastServer beyond RPC() changes the meaning of this variable.  Rename accordingly to match the intent coming in a subsequent commit: a background thread will be in charge of rotating preferredServer.
2016-03-23 16:14:59 -07:00
Sean Chittenden 12c2fefee3 Introduce GOTEST_FLAGS to conditionally add -v to go test
Trivial change that makes it possible for developers to set an environment variable and change the output of `go test` to be detailed (i.e. `GOTEST_FLAGS=-v`).
2016-03-23 16:14:11 -07:00
Sean Chittenden 2949980a64 Warn if serf events have queued up past 80% of the limit
It is theoretically possible that the number of queued serf events can back up.  If this happens, emit a warning message if there are more than 200 events in queue.

Most notably, this can happen if `c.consulServerLock` is held for an "extended period of time".  The probability of anyone ever seeing this log message is hopefully low to nonexistent, but if it happens, the warning message indicating a large number of serf events fired while a lock was held is likely to be helpful (vs serf mysteriously blocking when attempting to add an event to a channel).
2016-03-23 16:14:11 -07:00
Sean Chittenden 2a0c12460d Commit miss re: consuls variable rename 2016-03-23 16:13:49 -07:00
Sean Chittenden 3ac1bcc799 Remove lastRPCTime
This mechanism isn't going to provide much value in the future.  Preemptively reduce the complexity of future work.
2016-03-23 16:13:49 -07:00
Sean Chittenden 72b7856045 Rename c.consuls to c.consulServers
Prep for breaking out maintenance of consuls into a new goroutine.
2016-03-23 16:10:27 -07:00
Sean Chittenden 74c8da079e Fix whitespace alignment in a comment 2016-03-23 16:00:39 -07:00
Sean Chittenden d1ef4ec7e2 Use `rand.Int31n()` to get power of two optimization
In cases where i+1 is a power of two, skip one modulo operation.
2016-03-23 16:00:39 -07:00
James Phillips 0f23210628 Fixes JSON in wildcard query example. 2016-03-23 14:33:20 -07:00
James Phillips 51c3d05c90 Merge pull request #1865 from hashicorp/f-upgrade-boltdb
Updates BoltDB to v1.2.0 release.
2016-03-22 08:53:04 -07:00
James Phillips ac14a1ca97 Updates BoltDB to v1.2.0 release. 2016-03-22 08:36:46 -07:00
James Phillips 2c61c1d333 Merge pull request #1861 from hashicorp/b-flaky-test
Widens coordinate update sleeps in unit tests.
2016-03-21 18:24:05 -07:00
James Phillips 4629871f57 Widens coordinate update sleeps in unit tests. 2016-03-21 18:23:11 -07:00
James Phillips d24f2fdc74 Merge pull request #1854 from talonx/master
Added help text for -dev option #1804
2016-03-21 18:03:34 -07:00
James Phillips 36800754fc Merge pull request #1860 from hashicorp/b-flaky-sort
Gets rid of flaky sort check.
2016-03-21 17:31:17 -07:00
James Phillips 92e947dcc3 Gets rid of flaky sort check.
If we get a coordinate then this test will fail, so we only check the
first item in the list, which is deterministic.
2016-03-21 17:30:05 -07:00
James Phillips 511df6da9d Merge pull request #1859 from hashicorp/b-flaky-coord-tests
Increases timeouts for coordinate tests.
2016-03-21 17:10:01 -07:00
James Phillips 265a8d4053 Increases timeouts for coordinate tests.
We take the interval and add the random stagger to it, so 2X is cutting it
too close and the unit tests are often flaky.
2016-03-21 16:44:35 -07:00
James Phillips 7ad0d9789f Merge pull request #1839 from foxel/patch-1
Clarification for advertise_addrs.rpc
2016-03-21 16:14:17 -07:00