Seth Vargo
1c55429a38
Add an API method for determining the best status
...
Given a list of HealthChecks, this determines the "best" status for the
collective group. This is useful for nodes and services, which may have
multiple checks associated with them.
2016-11-29 18:41:46 -05:00
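A minimal sketch of how one status could be derived for a group of checks. The `HealthCheck` type, the `aggregateStatus` name, and the precedence order (critical over warning over passing) are assumptions made for illustration; they are not taken from the API method this commit adds.

```go
package main

import "fmt"

// HealthCheck is a hypothetical, trimmed-down check record.
type HealthCheck struct {
	Name   string
	Status string // "passing", "warning", or "critical"
}

// severity ranks statuses. The ordering is an assumption for this sketch;
// statuses missing from the map rank the same as "passing".
var severity = map[string]int{"passing": 0, "warning": 1, "critical": 2}

// aggregateStatus returns a single status for the whole group: the most
// severe status seen, or "passing" when the list is empty.
func aggregateStatus(checks []HealthCheck) string {
	result := "passing"
	for _, c := range checks {
		if severity[c.Status] > severity[result] {
			result = c.Status
		}
	}
	return result
}

func main() {
	checks := []HealthCheck{
		{Name: "disk", Status: "passing"},
		{Name: "mem", Status: "warning"},
	}
	fmt.Println(aggregateStatus(checks))
}
```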
Kyle Havlovitz
2d37a07476
Add keyring http endpoints
2016-11-22 20:10:43 -05:00
Kyle Havlovitz
9adc3854d1
Retry with backoff on session invalidation failure ( #2475 )
2016-11-04 21:53:22 -07:00
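A sketch of the general retry-with-backoff pattern the subject line refers to. The helper name, its parameters, and the doubling schedule are assumptions for illustration, not the code from this commit.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a stand-in for a failed session invalidation.
var errTransient = errors.New("session invalidation failed")

// retryWithBackoff calls op until it succeeds or attempts run out. The wait
// doubles after each failure and is capped at maxWait. attempts must be >= 1.
func retryWithBackoff(attempts int, base, maxWait time.Duration, op func() error) error {
	wait := base
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		time.Sleep(wait)
		wait *= 2
		if wait > maxWait {
			wait = maxWait
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retryWithBackoff(5, time.Millisecond, 8*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	})
	fmt.Println(calls, err)
}
```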
James Phillips
be4056789f
Moves the snapshot package up one level. ( #2472 )
2016-11-03 21:36:25 -07:00
Kyle Havlovitz
169cae2203
Disallow -bootstrap-expect flag in dev mode ( #2464 )
2016-11-03 01:54:43 -04:00
Kyle Havlovitz
440611f9f7
Add snapshot inspect subcommand ( #2451 )
2016-10-31 19:37:27 -04:00
Kyle Havlovitz
c6f461aa25
Enable snapshots in dev mode ( #2453 )
2016-10-31 14:39:47 -04:00
James Phillips
bc29610124
Adds support for snapshots and restores. ( #2396 )
...
* Updates Raft library to get new snapshot/restore API.
* Basic backup and restore working, but need some cleanup.
* Breaks out a snapshot module and adds a SHA256 integrity check.
* Adds snapshot ACL and fills in some missing comments.
* Require a consistent read for snapshots.
* Make sure snapshot works if ACLs aren't enabled.
* Adds a bit of package documentation.
* Returns an empty response from restore to avoid EOF errors.
* Adds API client support for snapshots.
* Makes internal file names match on-disk file snapshots.
* Adds DC and token coverage for snapshot API test.
* Adds missing documentation.
* Adds a unit test for the snapshot client endpoint.
* Moves the connection pool out of the client for easier testing.
* Fixes an incidental issue in the prepared query unit test.
I realized I had two servers in bootstrap mode so this wasn't a good setup.
* Adds a half close to the TCP stream and fixes panic on error.
* Adds client and endpoint tests for snapshots.
* Moves the pool back into the snapshot RPC client.
* Adds a TLS test and fixes half-closes for TLS connections.
* Tweaks some comments.
* Adds a low-level snapshot test.
This is independent of Consul so we can pull this out into a library
later if we want to.
* Cleans up snapshot and archive and completes archive tests.
* Sends a clear error for snapshot operations in dev mode.
Snapshots require the Raft snapshots to be readable, which isn't supported
in dev mode. Send a clear error instead of a deep-down Raft one.
* Adds docs for the snapshot endpoint.
* Adds a stale mode and index feedback for snapshot saves.
This gives folks a way to extract data even if the cluster has no
leader.
* Changes the internal format of a snapshot from zip to tgz.
* Pulls in Raft fix to cancel inflight before a restore.
* Pulls in new Raft restore interface.
* Adds metadata to snapshot saves and a verify function.
* Adds basic save and restore snapshot CLI commands.
* Gets rid of tarball extensions and adds restore message.
* Fixes an incidental bad link in the KV docs.
* Adds documentation for the snapshot CLI commands.
* Scuttle any request body when a snapshot is saved.
* Fixes archive unit test error message check.
* Allows for nil output writers in snapshot RPC handlers.
* Renames hash list Decode to DecodeAndVerify.
* Closes the client connection for snapshot ops.
* Lowers timeout for restore ops.
* Updates Raft vendor to get new Restore signature and integrates with Consul.
* Bounces the leader's internal state when we do a restore.
2016-10-25 19:20:24 -07:00
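The SHA256 integrity check mentioned in the list above can be illustrated roughly as follows. This is a sketch under assumptions: the `checksum` and `verify` helpers and the payload are hypothetical and do not reproduce the snapshot package's archive format or its DecodeAndVerify function.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// checksum returns the hex-encoded SHA-256 digest of data.
func checksum(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// verify returns an error when data does not match the recorded digest.
func verify(data []byte, recorded string) error {
	if got := checksum(data); got != recorded {
		return fmt.Errorf("checksum mismatch: got %s, want %s", got, recorded)
	}
	return nil
}

func main() {
	payload := []byte("snapshot contents")
	recorded := checksum(payload) // would be stored next to the data on save

	fmt.Println(verify(payload, recorded))                   // intact data
	fmt.Println(verify([]byte("tampered"), recorded) != nil) // modified data
}
```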
Kyle Havlovitz
f0aa65754b
Wait for agent joins to finish in TestClient_RPC
2016-10-25 17:48:11 -07:00
Kyle Havlovitz
2379b822ad
Add wait logic to TestClient_RPC_Pool
2016-10-25 17:48:11 -07:00
James Phillips
b423dee303
Fixes port numbers in peers.info.
2016-10-05 18:09:15 -07:00
James Phillips
8e76af4311
Merge pull request #2319 from hashicorp/f-bootstrap-abort
...
Adds check that aborts bootstrap mode if there's an existing cluster.
2016-09-01 09:49:03 -07:00
James Phillips
94c0e961eb
Fixes error message in test.
2016-09-01 09:48:08 -07:00
James Phillips
ce93b82e1e
Makes port selection atomic in unit tests.
2016-09-01 01:01:28 -07:00
James Phillips
d04a706a7c
Tweaks comment to be more correct.
2016-08-31 23:54:53 -07:00
James Phillips
4dd9b4b08a
Adds check that aborts bootstrap mode if there's an existing cluster.
2016-08-31 21:25:56 -07:00
James Phillips
750e1751ac
Copies the member data instead of referencing by pointer.
2016-08-30 16:54:21 -07:00
James Phillips
6be1e07fec
Makes the Raft configuration API easier to consume.
2016-08-30 11:30:56 -07:00
James Phillips
5df4b6bef2
Adds a log warning when operator peer changes occur.
2016-08-30 10:23:32 -07:00
James Phillips
1b7a16b7d3
Adds new consul operator endpoint, CLI, and ACL and some basic Raft commands.
2016-08-30 00:02:50 -07:00
James Phillips
29e52307cb
Makes empty checkServiceNode return a nil.
...
The change in #2308 had an inadvertent interface change, so we fix that with
a special case in this fix.
2016-08-29 19:12:07 -07:00
James Phillips
327fe725d9
Preallocates result struct, which was a profiling hot spot.
2016-08-26 16:34:28 -07:00
James Phillips
c5b6ac3655
Removes leader_lease_timeout from stats.
2016-08-25 15:39:19 -07:00
James Phillips
2f4c237cff
Adds a max raft multiplier and tweaks documentation.
2016-08-25 15:36:05 -07:00
James Phillips
5df36fbd82
Stops scaling the commit timeout.
2016-08-25 15:05:40 -07:00
James Phillips
f65ef936cb
Increases RPC hold timeout for new default timing.
...
Rather than scale this we just bump it up a bit. It'll be on the edge in
the lower-performance default mode, and will have plenty of margin in the
high-performance mode. This seems like a reasonable compromise to keep the
logic here simple vs. scaling, and seems in line with the expectations of
the different modes of operation.
2016-08-24 23:35:28 -07:00
James Phillips
b339b0d2fc
Adds performance tuning capability for Raft, detuned defaults, and supplemental docs.
2016-08-24 21:58:37 -07:00
James Phillips
0bdbdf1ba8
Merge pull request #2226 from abhinavdahiya/rm-health-unknown
...
Fixes #1775 ; Removes 'unknown' state
2016-08-17 17:51:04 -07:00
James Phillips
1f539d9914
Makes the filled-in parts of ServiceNode more explicit.
2016-08-12 18:25:36 -07:00
David van Geest
360e196c93
Translate Address to tagged WAN address in HTTP API when appropriate.
2016-08-12 18:25:36 -07:00
James Phillips
d11a7a197c
Removes upper end of muxado handler.
2016-08-09 18:16:41 -07:00
James Phillips
97a25e8564
Closes the conn on bad protocol version.
2016-08-09 18:13:53 -07:00
James Phillips
359587f70e
Removes support for muxado and protocol version 1.
2016-08-09 18:10:04 -07:00
James Phillips
99ab3390c2
Updates hashicorp/hcl and hashicorp/hil.
...
This required a small mod to core Consul code to cope with an interface
change.
2016-08-09 17:24:13 -07:00
James Phillips
ff6d42389c
Merge pull request #2222 from hashicorp/f-raft-v2
...
Integrates Consul with "stage one" of HashiCorp Raft library v2.
2016-08-09 16:04:48 -07:00
James Phillips
cce38f9a4b
Moves the peers.info content down into a constant.
2016-08-09 11:56:39 -07:00
James Phillips
7aaa4bc913
Adds peers back into bootstrap log, makes initial case consistent.
2016-08-09 11:52:41 -07:00
James Phillips
7f58b05dfe
Tweaks select style.
2016-08-09 11:33:42 -07:00
James Phillips
544169999c
Adds I/O-sensitive metrics to ACL replication operations.
2016-08-09 11:32:12 -07:00
James Phillips
820509760d
Switches to a smooth rate limit vs. a bursty one.
2016-08-09 11:29:12 -07:00
James Phillips
129e327bc9
Clarifies replication index shown in the log message.
2016-08-09 11:10:32 -07:00
James Phillips
4203612bd7
Returns from the shutdown wait right away.
2016-08-09 11:09:48 -07:00
James Phillips
e03fbef6b3
Moves ACL ID sorting interface onto the iterator.
2016-08-09 11:08:26 -07:00
James Phillips
0fa059ec49
Switches all ACL caches to 2Q.
2016-08-09 11:00:22 -07:00
James Phillips
1e75fa0362
Moves ACL ID generation down into the endpoint.
...
We don't want ACL replication to have this behavior so it was a
little dangerous to have in the shared helper function.
2016-08-09 00:11:00 -07:00
James Phillips
06a510a808
Removes unsafe "recover to empty" code.
...
This isn't safe because it would implicitly commit all outstanding log
entries. The new Raft library already has logic to not start a vote if
the current node isn't in the configuration, so this shouldn't be needed.
2016-08-08 19:19:19 -07:00
James Phillips
dd3169b395
Tweaks recovery based on interface changes.
2016-08-08 19:19:18 -07:00
James Phillips
19004e7095
Moves to a safer design where we don't ingest the initial peers.json file.
2016-08-08 19:19:18 -07:00
James Phillips
44c468995f
Touches up Raft integration after latest changes.
2016-08-08 19:19:18 -07:00
James Phillips
fc25145e85
Formats log messages to be consistent.
2016-08-08 19:19:18 -07:00
James Phillips
6b157eada0
Adds more comments about the raftSafeFn.
2016-08-08 19:19:18 -07:00
James Phillips
fcd8bb157a
Clarifies a comment about no-op peer operations.
2016-08-08 19:19:18 -07:00
James Phillips
2bf633f206
Adds back "safing" the configuration when a server leaves.
2016-08-08 19:19:18 -07:00
James Phillips
6c8e8271e2
Integrates Consul with new version of Raft library.
2016-08-08 19:19:17 -07:00
James Phillips
4a931ae12e
Adds an ACL replication status endpoint.
2016-08-04 23:30:16 -07:00
James Phillips
c94f1e1b83
Increases the ACL cache size to 10k.
2016-08-04 18:03:07 -07:00
James Phillips
3906517f70
Adds a full integrated test for ACL replication.
2016-08-04 17:59:08 -07:00
James Phillips
f639f49cc0
Adds remaining core replication tests.
2016-08-04 16:33:40 -07:00
James Phillips
defb39f8d4
Removes a TODO comment.
...
Decided we don't need to log anything about the token here. If the
token is not valid then the client will get an error about that, so
anything that can happen here is related to talking to the server in
the ACL datacenter, so not specific to the token.
2016-08-04 07:46:59 -07:00
James Phillips
93a7fd0561
Adds tests for the ACL reconcile algorithm.
2016-08-03 21:24:09 -07:00
James Phillips
796933b45b
Activates fallback to replicated ACLs.
2016-08-03 21:24:09 -07:00
James Phillips
9cece515c0
Adds basic ACL replication plumbing.
2016-08-03 21:24:04 -07:00
Abhinav Dahiya
9dc52449e3
Fixes #1775 ; Removes 'unknown' state
...
Signed-off-by: Abhinav Dahiya <abhinavdtu2012@gmail.com>
2016-07-30 19:33:14 +05:30
James Phillips
a1266e4164
Adds some supplemental tests for RPC "no leader" retries.
...
This adds some extra tests for #2175 .
2016-07-11 17:32:26 -06:00
Armon Dadgar
2d8cf9ef4a
consul: change tests to not expect ErrNoLeader
2016-07-10 13:24:18 -04:00
Armon Dadgar
5d0a977bdf
consul: Refactor forward to hold RPC when no leader is known
2016-07-10 13:24:06 -04:00
Armon Dadgar
191876f87e
consul: Add RPCHoldTimeout as tunable hold period
2016-07-10 13:23:43 -04:00
Ryan Uber
d8fd470f4f
Merge pull request #1837 from cleung2010/obfuscate-acl-token
...
Obfuscate token for lookupACL error
2016-07-05 13:56:49 -07:00
Calvin Leung Huang
38134f1b8c
Fix substring length on obfuscated token
2016-07-05 15:53:30 -04:00
Ryan Uber
577523fc73
consul: sort source node first if at position <= 10 in PQ's
2016-07-01 14:28:58 -07:00
Ryan Uber
e9960e6c85
Merge pull request #2137 from hashicorp/f-pq-near
...
Support "near" parameter in prepared query service block
2016-07-01 12:28:48 -07:00
Ryan Uber
ccbe86d7a8
consul: mention magic _agent token in struct comments
2016-07-01 11:50:30 -07:00
Ryan Uber
ebacaa2d67
consul: send agent source data as separate query source
2016-06-30 16:51:18 -07:00
Ryan Uber
782a081925
consul: use source parameter for near prepared queries
2016-06-30 12:11:20 -07:00
Ryan Uber
270270a33a
consul: send origin node + dc when executing prepared queries
2016-06-21 15:34:26 -07:00
Ryan Uber
925915c6ac
consul: test baked-in distance sort
2016-06-21 12:54:18 -07:00
Ryan Uber
114e57fff1
consul: use the Near field instead of PreferLocal
2016-06-21 12:39:40 -07:00
James Phillips
8358df599d
Merge pull request #2127 from hashicorp/b-remote-consuls-locking
...
Ensure locking of `Server`'s `remoteConsuls`.
2016-06-21 10:00:04 -07:00
James Phillips
f9e2900692
Merge pull request #2131 from hashicorp/b-misc-microoptimizations
...
Misc micro optimizations
2016-06-21 09:59:01 -07:00
Sean Chittenden
ebdb72ce0a
Ensure locking of `Server`'s `remoteConsuls`.
2016-06-20 22:59:49 -07:00
Sean Chittenden
72f7a4061c
Misc comment improvements
2016-06-20 15:29:38 -07:00
Sean Chittenden
9bf6e61655
Initialize a non-empty number of Consul Datacenters. No functional change.
2016-06-20 15:26:59 -07:00
Sean Chittenden
b78c95d37e
Prefer rand.Int31n() over rand.Int31().
2016-06-20 15:26:27 -07:00
Sean Chittenden
e81bf2a505
Fix deadlock in Consul RTT.
...
- consul/rtt.go:388: s.getDatacentersByDistance(). Acquires RLock()
- consul/rtt.go:341: sortDatacentersByDistance() RLock still held.
- consul/rtt.go:282: getDatacenterDistance() RLock still held.
- consul/rtt.go:268: getNodesForDatacenter(). Attempts to reacquire RLock(), hangs indefinitely.
2016-06-20 14:59:54 -07:00
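The call chain above re-acquires a read lock that is already held. Go's `sync.RWMutex` does not support recursive read locking: once a writer is waiting in `Lock()`, new `RLock()` calls block, so the inner `RLock()` can hang forever. A common way to avoid this, sketched below, is to take the lock once at the entry point and call helpers that assume it is held. The type, field, and method names are hypothetical and are not the ones in consul/rtt.go.

```go
package main

import (
	"fmt"
	"sync"
)

// server is a hypothetical stand-in with one lock-protected map.
type server struct {
	lock    sync.RWMutex
	remotes map[string][]string // datacenter -> node names
}

// nodesForDatacenterLocked assumes the caller already holds s.lock.
func (s *server) nodesForDatacenterLocked(dc string) []string {
	return s.remotes[dc]
}

// nodesForDatacenter is the entry point for callers that hold no lock.
func (s *server) nodesForDatacenter(dc string) []string {
	s.lock.RLock()
	defer s.lock.RUnlock()
	return s.nodesForDatacenterLocked(dc)
}

// datacenterSizes takes the read lock once and only calls *Locked helpers,
// so the lock is never acquired a second time on the same call path.
func (s *server) datacenterSizes() map[string]int {
	s.lock.RLock()
	defer s.lock.RUnlock()
	sizes := make(map[string]int, len(s.remotes))
	for dc := range s.remotes {
		sizes[dc] = len(s.nodesForDatacenterLocked(dc))
	}
	return sizes
}

func main() {
	s := &server{remotes: map[string][]string{"dc1": {"a", "b"}, "dc2": {"c"}}}
	fmt.Println(s.datacenterSizes()["dc1"], len(s.nodesForDatacenter("dc2")))
}
```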
Ryan Uber
89fe991ab7
consul: test raw PreferLocal functionality
2016-06-20 14:53:13 -07:00
Ryan Uber
1fef85cd2e
consul: support PreferLocal in PQ's
2016-06-20 14:24:40 -07:00
Sean Chittenden
7482a9207d
Chase casting types.CheckID to a string into the state_store.
...
It turns out the indexer can only use strings as arguments when
creating a query. Cast `types.CheckID` to a `string` before calling
into `memdb`.
Ideally the indexer would be smart enough to do this at compile-time,
but I need to look into how to do this without reflection and the
runtime package. For the time being statically cast `types.CheckID`
to a `string` at the call sites.
2016-06-07 16:59:02 -04:00
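A sketch of why the cast is needed at the call sites. The `lookup` function below is a hypothetical stand-in for an indexer that type-asserts its argument to `string`; it is not the `memdb` API. A defined type such as `CheckID` fails that assertion even though its underlying type is `string`.

```go
package main

import "fmt"

// CheckID is a defined string type, as described in the commit message.
type CheckID string

// lookup stands in for an indexer that only accepts a plain string argument.
func lookup(args ...interface{}) (string, error) {
	if len(args) != 1 {
		return "", fmt.Errorf("must provide exactly one argument")
	}
	key, ok := args[0].(string)
	if !ok {
		return "", fmt.Errorf("argument must be a string: %#v", args[0])
	}
	return "check:" + key, nil
}

func main() {
	id := CheckID("web")

	// Passing the defined type directly fails the string assertion.
	_, err := lookup(id)
	fmt.Println(err != nil)

	// Converting at the call site satisfies the indexer.
	key, err := lookup(string(id))
	fmt.Println(key, err)
}
```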
Sean Chittenden
ff45f8c8ff
Revert "Move `structs.CheckID` to a new top-level package, `types`."
...
This reverts commit 2bbd52e3b44ff1b60939a8400264d534662d6d51.
2016-06-07 16:59:02 -04:00
Sean Chittenden
a4554b945c
Move `structs.CheckID` to a new top-level package, `types`.
...
Per discussion w/ @slackpad, move this type to its own top-level package
2016-06-07 16:59:02 -04:00
Sean Chittenden
cd68cd3868
Move `structs.CheckID` to a new top-level package, `types`.
...
Per discussion w/ @slackpad, move this type to its own top-level package
2016-06-07 16:59:02 -04:00
Sean Chittenden
0857e93d0b
Float a type balloon. Some strings are square pegs in round holes.
...
This experiment was brought about because of variable naming
confusion where name and checkIDs were interchanged. Gave CheckID
a Qualified Type Name and chased downstream changes.
2016-06-07 16:59:02 -04:00
James Phillips
ffcba3df58
Merge pull request #2028 from hashicorp/f-atomic-kv
...
Adds support for atomic transactions spanning multiple KV entries.
2016-05-15 13:46:05 -07:00
Sean Chittenden
3756fb23a6
Remove unused peers variable from setupRaft().
2016-05-15 06:40:46 -07:00
James Phillips
a11f32a1da
Adds a get-tree verb to KV transaction operations.
2016-05-13 16:57:39 -07:00
James Phillips
0f94a7a326
Switches GETs to a filtering model for ACLs.
2016-05-13 15:58:55 -07:00
James Phillips
5fd99b13ef
Removes null results for deletes, and preps for more than one result from an operation.
2016-05-13 01:47:55 -07:00
James Phillips
2649a6336e
Adds a read-only optimized path for transactions.
2016-05-13 00:34:05 -07:00
James Phillips
0c34ed078c
Adds a comment for the txnKVS() function.
2016-05-12 16:11:26 -07:00
James Phillips
88b1c7d054
Makes get fail a transaction if the key doesn't exist.
2016-05-11 14:18:31 -07:00
James Phillips
3d35acaa90
De-nests the KV output structure (removes DirEnt member).
2016-05-11 13:48:03 -07:00
James Phillips
04a13ec3d7
Switches to "KV" instead of "KV" for the KV operations.
2016-05-11 10:58:27 -07:00
James Phillips
dc662f7e35
Refactors TxnRequest/TxnResponse into a form that will allow non-KV ops.
...
This isn't needed/used yet, but it's a good hook to get in there so we
can add more atomic operations in the future. The Go API hides this detail
so that feels like a KV-specific API. The implications on the REST API are
pretty minimal.
2016-05-11 01:39:10 -07:00
James Phillips
d980cbcd9d
Moves txn code into a new endpoint, not specific to KV.
2016-05-10 21:58:02 -07:00
James Phillips
907d8bab34
Fixes some go vet findings in a unit test.
2016-05-10 20:01:52 -07:00
Sean Chittenden
94e2766423
Remove stray type definition
...
Noticed while working on Nomad Client's server selection code.
2016-05-10 18:56:28 -07:00
James Phillips
4eb89481df
Adds internal endpoint read ACL support and full unit tests.
2016-05-10 11:23:47 -07:00
James Phillips
6a96e052c4
Adds an empty get test case.
2016-05-09 22:18:26 -07:00
James Phillips
471160d8f0
Performs basic plumbing of KVS transactions through all the layers.
2016-05-09 22:15:49 -07:00
James Phillips
dca00c96f7
Adds state store support for atomic KVS ops.
2016-05-05 15:46:59 -07:00
James Phillips
a1a59bee73
Splits existing KVS operations into *Txn helpers for later reuse.
2016-05-04 14:20:11 -07:00
James Phillips
9185450fd5
Moves KVS-related state store code out into its own set of files.
2016-05-02 16:21:04 -07:00
Sean Chittenden
c16b1ca178
Add the list of Raft peers to Consul's Stats
...
```
% consul info
[snip]
raft:
[snip]
raft_peers = 127.0.0.1:8300
[snip]
```
Poached from: Nomad Project
2016-04-28 15:08:48 -07:00
James Phillips
79153c3014
Merge pull request #1884 from mtchavez/1541-data-dir-perms
...
command: Data directory permission error message
2016-04-12 22:06:49 -07:00
James Phillips
6e177a9b44
Merge pull request #1895 from shoenig/fixtypo
...
doc: fix trivial typo s/NewFSMPath/NewFSM/
2016-04-12 21:53:24 -07:00
James Phillips
3f340716fd
Adds a clone method to HealthCheck and uses that in local.go.
2016-04-11 00:05:39 -07:00
Chavez
c9602c561c
Add description to rpc test client pool member failure message
2016-04-01 19:17:38 -07:00
Seth Hoenig
7f67c123b7
doc: fix trivial typo s/NewFSMPath/NewFSM/
2016-03-29 20:52:17 -05:00
Sean Chittenden
5ae7835988
Rename server_details package to agent
2016-03-29 17:39:19 -07:00
Sean Chittenden
7f06c71650
Add a quick package doc for the servers package
2016-03-29 16:22:53 -07:00
Sean Chittenden
897282f77d
Rename serverConfig to serverList
...
serverList is a vastly more accurate name. Chase accordingly. No functional change other than types and APIs.
2016-03-29 16:17:16 -07:00
Sean Chittenden
4984b6111d
Gratuitous rename 1/2
...
Reduce cognitive load and perform an overdue rename. No functional change.
Rename the `server_manager` package to `servers`. Rename the `ServerManager` struct to `Manager`. In `client`, rename `serverMgr` to `servers`.
2016-03-29 16:12:00 -07:00
Sean Chittenden
4734e0113f
Remove two unused constants
2016-03-29 11:11:41 -07:00
Sean Chittenden
cb9833b134
Remove useless comment residual from decomposing functions
2016-03-29 10:53:00 -07:00
Sean Chittenden
1f049a3c38
EDYSLEXICMOMENT
2016-03-29 10:50:10 -07:00
Sean Chittenden
177f64134e
Refactor out reconcileServerList anon function
...
Add testing to reconcileServerList and test various server sizes.
Test that a percentage of nodes fail their Ping (50% in testing atm)
2016-03-29 02:45:38 -07:00
Sean Chittenden
6609ee5d51
Teach fauxConnPool to fail a pct of the time
...
50% failure rate seems legit as a starting point w/ 100 servers.
2016-03-28 14:53:29 -07:00
Sean Chittenden
7d26f7bfa7
Call NotifyFailedServers to rotate the server list
2016-03-28 14:12:41 -07:00
Sean Chittenden
6a987062b9
Add log line re: server manager backing off and sleeping
...
This is useful in situations where the RPC rotate duration is greater than 1µs. WTB exponential backoff of logging so we don't spam forever.
2016-03-28 14:04:04 -07:00
Sean Chittenden
689b79aef3
Remove old debugging lines of questionable future value
2016-03-28 14:02:53 -07:00
Sean Chittenden
0b0a07a280
Shuffle in place
...
Don't create a copy and save the copy, not necessary any more.
2016-03-28 14:02:27 -07:00
Sean Chittenden
e230b3a3b7
Nuke unnecessary comment
...
See above function comments for details
2016-03-28 13:57:36 -07:00
Sean Chittenden
34a29a2107
Move FIXME comment to the right call site
2016-03-28 13:49:55 -07:00
Sean Chittenden
b38d3d71c8
Rename the ConnPoolPinger interface to Pinger
2016-03-28 13:46:01 -07:00
Sean Chittenden
d6b4345375
Return error from PingConsulServer
...
In order to report why a Ping failed, change the signature of PingConsulServer to include an error message.
2016-03-28 13:38:58 -07:00
Sean Chittenden
6c9fb06511
Change the definition of the ServerDetails struct key
...
Use only the serf Name for now. Leaving the plumbing for now.
2016-03-28 12:53:19 -07:00
Sean Chittenden
2bcff6bac4
Correct the comment to match reality
2016-03-28 12:32:30 -07:00
Sean Chittenden
fc1edea1ef
Rename serverCfg to sc for consistency
2016-03-28 12:06:26 -07:00
Sean Chittenden
988b05700d
Add a quick length check
...
Verify that AddServer behaved as expected
2016-03-28 11:38:12 -07:00
Sean Chittenden
7181e42ba8
Switch the order of ServerDetails.String()
...
It's more natural to have the network first. I think I flipped the order accidentally.
2016-03-28 11:37:25 -07:00
Sean Chittenden
dca8fd2643
Move rebalance log statement from INFO to DEBUG
2016-03-27 01:32:04 -07:00
Sean Chittenden
180edd8e7b
Chase the API bump re: refreshServerRebalanceTimer
...
If it works in prod, why shouldn't it work in the tests?
2016-03-27 00:04:52 -07:00
Sean Chittenden
9b5dd7a785
Move initialization of the rebalanceTimer to New()
2016-03-27 00:03:48 -07:00
Sean Chittenden
86d1bad541
Add a test for ConnPool.PingConsulServer
...
Spin up 5x servers, join and ping each server
2016-03-26 23:52:06 -07:00
Sean Chittenden
f903005080
Expose ServerManager.ResetRebalanceTimer
...
Move the rebalance timer from ServerManager.Start's stack to struct ServerManager. This makes it possible to shuffle during tests without actually waiting >120s.
2016-03-26 23:41:01 -07:00
Sean Chittenden
2ba281bc5a
Logging improvements
...
Comment out noisy loggers for the time being.
Improve the final logging statement to be useful and hint what the next active server for the client is going to be.
2016-03-26 22:41:08 -07:00
Sean Chittenden
fab3981b1d
Standardize the log message based on the package
...
This log statement used to belong in the consul package but has since moved to the server manager package.
2016-03-26 22:29:00 -07:00
Sean Chittenden
c6d9c42d9f
Reduce the error level from Fatal when unit testing
2016-03-26 22:07:09 -07:00
Sean Chittenden
4747cf3cab
Start server rebalance task after init'ing Serf
...
Now that there is no longer an event loop driven directly by Serf, start the ServerManager task after Serf has been setup. When testing and adjusting timers and timeouts to unreasonably low values, it's possible to tickle a race condition where Serf's NumNodes() would fail because Serf had not been initialized.
2016-03-26 22:04:41 -07:00
Sean Chittenden
2ddf82d9d8
Catch up to a few renames
2016-03-26 19:32:11 -07:00
Sean Chittenden
640ced7c11
Use empty string for addr in ServerDetails.String()
2016-03-26 19:30:04 -07:00
Sean Chittenden
e0f29c17cd
Guard against a nil ServerDetails.Addr
...
It's not clear how or why this would ever be nil, but some of the unit tests produce a nil addr. Be defensive.
2016-03-26 19:29:31 -07:00
Sean Chittenden
2d9982eb27
Proactively ping server before rotation
...
Before shuffling the server list, proactively ping the next server in the list to establish the connection and verify the remote endpoint is healthy.
2016-03-26 19:28:13 -07:00
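A rough sketch of pinging the next server before committing to it. The `Pinger` interface shape, the `fakePinger` test double, and the `selectReachable` helper are assumptions for illustration; the shuffle step is left out, and this is not the server manager's implementation.

```go
package main

import "fmt"

// Pinger is a hypothetical interface for checking that a server answers.
type Pinger interface {
	Ping(addr string) error
}

// fakePinger fails for every address marked true; it exists only for the demo.
type fakePinger map[string]bool

func (f fakePinger) Ping(addr string) error {
	if f[addr] {
		return fmt.Errorf("ping %s failed", addr)
	}
	return nil
}

// rotate returns a copy of servers with the first entry moved to the end.
func rotate(servers []string) []string {
	if len(servers) == 0 {
		return nil
	}
	out := make([]string, 0, len(servers))
	out = append(out, servers[1:]...)
	return append(out, servers[0])
}

// selectReachable pings the head of the list and rotates past servers that
// fail, trying each server at most once. It reports false when none answer.
func selectReachable(servers []string, p Pinger) ([]string, bool) {
	candidates := append([]string(nil), servers...)
	for range servers {
		if p.Ping(candidates[0]) == nil {
			return candidates, true
		}
		candidates = rotate(candidates)
	}
	return candidates, false
}

func main() {
	list, ok := selectReachable([]string{"a", "b", "c"}, fakePinger{"a": true})
	fmt.Println(list, ok)
}
```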
Sean Chittenden
b3a8e2f115
Factor out the shuffle server
2016-03-26 19:19:04 -07:00
Sean Chittenden
766ddae165
Revise comments re: cycleServer
...
Improve the comments to discuss what happens presently. Add a note to consider possibly calling to TestConsulServer proactively.
2016-03-26 18:53:13 -07:00
Sean Chittenden
ac1d42e9d8
Comment why the interface is needed: cyclic import
2016-03-26 18:38:35 -07:00
Sean Chittenden
a9b3dba05f
Add a struct key type for server_details
2016-03-26 17:58:12 -07:00
Sean Chittenden
496f05b561
Add additional checks
2016-03-25 14:40:46 -07:00
Sean Chittenden
c18158aac3
Delete the right tag
...
"role" != "consul"
2016-03-25 14:31:48 -07:00
Sean Chittenden
b44554f882
Don't pass in sm, server manager is already in scope
...
Go closures are implicitly capturing lambdas.
2016-03-25 14:10:09 -07:00
Sean Chittenden
2713899a5b
Trim residual complexity from server join notifications
...
Now that serf node join events are decoupled from rebalancing activities completely, remove the complexity of draining the channel and ensuring only one goroutine was rebalancing the server list.
Now that we're no longer initializing a notification channel, we can remove the config load/save from `Start()`
2016-03-25 14:06:35 -07:00
Sean Chittenden
b3298ce4c3
Only log in FindServers
...
In FindServer this is a useful warning hinting why its call failed. RPC returns error and leaves it to the higher level caller to do whatever it wants. As an operator, I'd have the detail necessary to know why the RPC call(s) failed.
2016-03-25 13:58:50 -07:00
Sean Chittenden
f024272ab2
Initialize the rebalance to clientRPCMinReuseDuration
...
In an earlier version there was a channel to notify when a new server was added, however this has long since been removed. Just default to the sane value of 2min before the first rebalance calc takes place.
Pointed out by: slackpad
2016-03-25 13:46:18 -07:00
Sean Chittenden
89311a5859
Use range vs for
...
Returning a new array vs mutating an array in place so we can use range now.
2016-03-25 13:08:08 -07:00
Sean Chittenden
643997623e
Comment updates
2016-03-25 13:06:59 -07:00
Sean Chittenden
072f34cf02
Only rotate server list with more than one server
...
Fantastic observation by slackpad. This was left over from when there was a boolean for health in the server struct (vs current strategy where we use server position in the list and rely on serf to cleanup the stale members).
Pointed out by: slackpad
2016-03-25 12:54:36 -07:00
Sean Chittenden
aadd274a13
Relocate saveServerConfig next to getServerConfig
...
Requested by: slackpad
2016-03-25 12:41:22 -07:00
Sean Chittenden
cf271e7f65
Clarify that ConsulClusterInfo is an interface over serf
...
An interface was used to break a cyclic import dependency.
2016-03-25 12:38:40 -07:00
Sean Chittenden
973d924ab4
Reword comment after moving code into new packages
2016-03-25 12:34:46 -07:00
Sean Chittenden
78ec9f241d
Change initialRebalanceTimeout to a time.Duration
...
Pointed out by: @slackpad
2016-03-25 12:34:12 -07:00
Sean Chittenden
328728c88a
Negative check: test an invalid condition
2016-03-25 12:22:33 -07:00
Sean Chittenden
22e546ff32
Test to make sure bootstrap is missing
2016-03-25 12:20:12 -07:00
Sean Chittenden
5f035da4f1
Be more Go idiomatic w/ variable names: s/valid/ok/g
...
Cargo culting is bad, m'kay?
Pointy Hat: sean-
2016-03-25 12:14:24 -07:00
Sean Chittenden
e041c3905d
Fix stale comment
...
Pointed out by: @slackpad
2016-03-25 12:00:40 -07:00
Sean Chittenden
45fc7c362e
Add a comment for Client serverMgr
2016-03-25 11:59:27 -07:00
Sean Chittenden
5873b7e28e
Correct a bogus goimport rewrite for tests
2016-03-23 22:35:49 -07:00
Sean Chittenden
dcc64d91c6
Test ServerManager.refreshServerRebalanceTimer
...
Change the signature so it returns a value so that this can be tested externally with mock data. See the sample table in TestServerManagerInternal_refreshServerRebalanceTimer() for the rate at which it will back off. This function is mostly used to not cripple large clusters in the event of a partition.
2016-03-23 22:10:50 -07:00
Sean Chittenden
8e3b3d766d
Add a handful more unit tests to the public interface
2016-03-23 22:10:50 -07:00
Sean Chittenden
d5f72e8c07
Rename GetNumServers to NumServers()
...
Matches the style of the rest of the repo
2016-03-23 22:10:50 -07:00
Sean Chittenden
9de9cf90f1
Rename NewServerManager to just New
...
Follow go style recommendations now that this has been refactored out of the consul package and doesn't need the qualifier in the name.
2016-03-23 22:10:50 -07:00
Sean Chittenden
7faea986a0
Rename FindHealthyServer() to FindServer()
...
There is no guarantee the server coming back is healthy. It's apt to be healthy by virtue of its place in the server list, but it's not guaranteed.
2016-03-23 22:10:50 -07:00
Sean Chittenden
18885e3214
cycleServer is a pure function, save the result
2016-03-23 22:10:50 -07:00
Sean Chittenden
4ec9ed4de2
Missed unit test cruft
2016-03-23 22:10:50 -07:00
Sean Chittenden
b906e40811
Update comments to reflect reality
2016-03-23 22:10:50 -07:00
Sean Chittenden
1a09a5b2cf
Remove additional cruft from ServerManager's channels
...
No longer needed code.
2016-03-23 22:10:50 -07:00
Sean Chittenden
c980d492c6
Emulate a TryLock using atomic.CompareAndSwap
...
Prevent possible queueing behind serverConfigLock in the event that a server fails on a busy host.
2016-03-23 22:10:50 -07:00
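A minimal sketch of emulating a try-lock with a compare-and-swap, so a caller that loses the race returns immediately instead of queueing. The `tryLock` type and its method names are hypothetical and are not the code from this commit.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// tryLock is a non-blocking lock: 0 means free, 1 means held.
type tryLock struct {
	held int32
}

// TryLock acquires the lock only if it is currently free and reports
// whether it succeeded. It never blocks.
func (l *tryLock) TryLock() bool {
	return atomic.CompareAndSwapInt32(&l.held, 0, 1)
}

// Unlock releases the lock. Only the holder should call it.
func (l *tryLock) Unlock() {
	atomic.StoreInt32(&l.held, 0)
}

func main() {
	var l tryLock
	fmt.Println(l.TryLock()) // first caller wins
	fmt.Println(l.TryLock()) // second caller is turned away without waiting
	l.Unlock()
	fmt.Println(l.TryLock()) // free again after Unlock
}
```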
Sean Chittenden
102dcafe76
Make use of interfaces
...
Use an interface instead of serf.Serf as arg to NewServerManager. Bonus points for improved testability.
Pointed out by: @slackpad
2016-03-23 22:10:50 -07:00
Sean Chittenden
231768faea
Simplify error handling
...
Rely on Serf for liveliness. In the event of a failure, simply cycle the server to the end of the list. If the server is unhealthy, Serf will reap the dead server.
Additional simplifications:
*) Only rebalance servers based on timers, not when a new server is readded to the cluster.
*) Back out the failure count in server_details.ServerDetails
2016-03-23 22:10:50 -07:00
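The "cycle the server to the end of the list" step can be sketched as a pure function that returns a rotated copy and leaves its input untouched. The function name and the use of plain strings for servers are assumptions for illustration.

```go
package main

import "fmt"

// cycleServers returns a new slice with the first server moved to the end.
// Lists with fewer than two entries are returned as-is, since there is
// nothing to rotate. The input slice is never modified.
func cycleServers(servers []string) []string {
	if len(servers) < 2 {
		return servers
	}
	rotated := make([]string, 0, len(servers))
	rotated = append(rotated, servers[1:]...)
	rotated = append(rotated, servers[0])
	return rotated
}

func main() {
	servers := []string{"s1", "s2", "s3"}
	// The result must be saved; the original list is left alone.
	servers = cycleServers(servers)
	fmt.Println(servers)
}
```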
Sean Chittenden
0c519aa90d
Unbreak client tests by reverting to original test
...
Debugging code crept into the actual test and hung out for much longer than it should have.
2016-03-23 22:10:50 -07:00
Sean Chittenden
26e51376d9
Introduce asynchronous management of consul server lists
...
Instead of blocking the RPC call path and performing a potentially expensive calculation (including a call to `c.LANMembers()`), introduce a channel to request a rebalance. Some events don't force a reshuffle; instead they extend the duration of the current rebalance window because the environment thrashed enough to redistribute a client's load.
2016-03-23 22:10:50 -07:00
Sean Chittenden
6ed37d1d8d
Comment nits
2016-03-23 22:10:50 -07:00
Sean Chittenden
c8ab3ae4cb
Use saveServerConfig vs atomic.Value.Store(config)
2016-03-23 22:10:50 -07:00
Sean Chittenden
12377e80e6
Commit a handful of refactoring && copy/paste-o fixes
2016-03-23 22:10:50 -07:00
Sean Chittenden
c1c17f158b
Mutate copies of serverCfg.servers, not original
...
Removing any ambiguity re: ownership of the mutated server lists is a win for maintenance and debugging.
2016-03-23 22:10:50 -07:00
Sean Chittenden
753766cc5d
rebalanceTimer may be nil during initialization
...
When first starting the server manager, it's possible that the rebalanceTimer in serverConfig will be nil, test accordingly.
2016-03-23 22:10:50 -07:00
Sean Chittenden
d0e2792d5c
Properly retain a pointer to the rebalanceTimer
2016-03-23 22:10:50 -07:00
Sean Chittenden
62785de865
Cosmetic and various other wordsmithing cleanups
2016-03-23 22:10:50 -07:00
Sean Chittenden
31de4290cf
Document the various functions and their locking
2016-03-23 22:10:50 -07:00
Sean Chittenden
ffcd939feb
Use config convenience method to get config
...
'cause ELETTHECOMPILERSDOTHEWORK. I don't need that cluttering up the subconscious with more complexity.
2016-03-23 22:10:50 -07:00
Sean Chittenden
ed7fee7a3c
Move consul.serverConfig out of the consul package
...
Relocated to its own package, server_manager. This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary. More work is needed here.
2016-03-23 22:10:50 -07:00
Sean Chittenden
ab80393198
Rename serverConfigMtx to serverConfigLock
...
Pointed out by: @slackpad
2016-03-23 22:10:50 -07:00