Reduce cognative load and perform an overdue rename. No functional change.
Rename the `server_manager` package to `servers`. Rename the `ServerManager` package to `Manager`. In `client`, rename `serverMgr` to `servers`.
Move the rebalance timer from ServerManager.Start's stack to struct ServerManager. This makes it possible to shuffle during tests without actually waiting >120s.
Comment out noisly loggers for the time being.
Improve the final logging statement to be useful and hint what the next active server for the client is going to be.
Now that there is no longer an event loop driven directly by Serf, start the ServerManager task after Serf has been setup. When testing and adjusting timers and timeouts to unreasonably low values, it's possible to tickle a race condition where Serf's NumNodes() would fail because Serf had not been initialized.
Now that serf node join events are decoupled from rebalancing activities completely, remove the complixity of draining the channel and ensuring only one go routine was rebalancing the server list.
Now that we're no longer initializing a notification channel, we can remove the config load/save from `Start()`
In FindServer this is a useful warning hinting why its call failed. RPC returns error and leaves it to the higher level caller to do whatever it wants. As an operator, I'd have the detail necessary to know why the RPC call(s) failed.
In an earlier version there was a channel to notify when a new server was added, however this has long since been removed. Just default to the sane value of 2min before the first rebalance calc takes place.
Pointed out by: slackpad
Fantastic observation by slackpad. This was left over from when there was a boolean for health in the server struct (vs current strategy where we use server position in the list and rely on serf to cleanup the stale members).
Pointed out by: slackpad
Change the signature so it returns a value so that this can be tested externally with mock data. See the sample table in TestServerManagerInternal_refreshServerRebalanceTimer() for the rate at which it will back off. This function is mostly used to not cripple large clusters in the event of a partition.
Rely on Serf for liveliness. In the event of a failure, simply cycle the server to the end of the list. If the server is unhealthy, Serf will reap the dead server.
Additional simplifications:
*) Only rebalance servers based on timers, not when a new server is readded to the cluster.
*) Back out the failure count in server_details.ServerDetails
Instead of blocking the RPC call path and performing a potentially expensive calculation (including a call to `c.LANMembers()`), introduce a channel to request a rebalance. Some events don't force a reshuffle, instead the extend the duration of the current rebalance window because the environment thrashed enough to redistribute a client's load.
Relocated to its own package, server_manager. This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary. More work is needed to be done here
Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go.
This commit brings in a background task that proactively manages the server list and:
*) reshuffles the list
*) manages the timer out of the RPC() path
*) uses atomics to detect a server has failed
This is a WIP, more work in testing needs to be completed.
Relocated to its own package, server_manager. This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary. More work is needed to be done here
Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go.
This commit brings in a background task that proactively manages the server list and:
*) reshuffles the list
*) manages the timer out of the RPC() path
*) uses atomics to detect a server has failed
This is a WIP, more work in testing needs to be completed.
Relocated to its own package, server_manager. This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary. More work is needed to be done here
Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go.
This commit brings in a background task that proactively manages the server list and:
*) reshuffles the list
*) manages the timer out of the RPC() path
*) uses atomics to detect a server has failed
This is a WIP, more work in testing needs to be completed.
Relocated to its own package, server_manager. This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary. More work is needed to be done here
This may be short-lived, but it also seems like this is going to lead us down a path where ServerDetails is going to evolve into a more powerful package that will encapsulate more behavior behind a coherent API.
Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go.
This commit brings in a background task that proactively manages the server list and:
*) reshuffles the list
*) manages the timer out of the RPC() path
*) uses atomics to detect a server has failed
This is a WIP, more work in testing needs to be completed.