Commit Graph

1205 Commits

Author SHA1 Message Date
James Phillips ce93b82e1e
Makes port selection atomic in unit tests. 2016-09-01 01:01:28 -07:00
James Phillips d04a706a7c
Tweaks comment to be more correct. 2016-08-31 23:54:53 -07:00
James Phillips 4dd9b4b08a
Adds check that aborts bootstrap mode if there's an existing cluster. 2016-08-31 21:25:56 -07:00
James Phillips 750e1751ac
Copies the member data instead of referencing by pointer. 2016-08-30 16:54:21 -07:00
James Phillips 6be1e07fec
Makes the Raft configuration API easier to consume. 2016-08-30 11:30:56 -07:00
James Phillips 5df4b6bef2
Adds a log warning when operator peer changes occur. 2016-08-30 10:23:32 -07:00
James Phillips 1b7a16b7d3
Adds new consul operator endpoint, CLI, and ACL and some basic Raft commands. 2016-08-30 00:02:50 -07:00
James Phillips 29e52307cb
Makes empty checkServiceNode return a nil.
The change in #2308 had an inadvertent interface change, so we fix that with
a special case in this fix.
2016-08-29 19:12:07 -07:00
James Phillips 327fe725d9
Preallocates result struct, which was a profiling hot spot. 2016-08-26 16:34:28 -07:00
James Phillips c5b6ac3655
Removes leader_lease_timeout from stats. 2016-08-25 15:39:19 -07:00
James Phillips 2f4c237cff
Adds a max raft multiplier and tweaks documentation. 2016-08-25 15:36:05 -07:00
James Phillips 5df36fbd82
Stops scaling the commit timeout. 2016-08-25 15:05:40 -07:00
James Phillips f65ef936cb
Increases RPC hold timeout for new default timing.
Rather than scale this we just bump it up a bit. It'll be on the edge in
the lower-performance default mode, and will have plenty of margin in the
high-performance mode. This seems like a reasonable compromise to keep the
logic here simple vs. scaling, and seems in line with the expectations of
the different modes of operation.
2016-08-24 23:35:28 -07:00
James Phillips b339b0d2fc
Adds performance tuning capability for Raft, detuned defaults, and supplemental docs. 2016-08-24 21:58:37 -07:00
James Phillips 0bdbdf1ba8 Merge pull request #2226 from abhinavdahiya/rm-health-unknown
Fixes #1775; Removes 'unknown' state
2016-08-17 17:51:04 -07:00
James Phillips 1f539d9914
Makes the filled-in parts of ServiceNode more explicit. 2016-08-12 18:25:36 -07:00
David van Geest 360e196c93
Translate Address to tagged WAN address in HTTP API when appropriate. 2016-08-12 18:25:36 -07:00
James Phillips d11a7a197c
Removes upper end of muxado handler. 2016-08-09 18:16:41 -07:00
James Phillips 97a25e8564
Closes the conn on bad protocol version. 2016-08-09 18:13:53 -07:00
James Phillips 359587f70e
Removes support for muxado and protocol version 1. 2016-08-09 18:10:04 -07:00
James Phillips 99ab3390c2
Updates hashicorp/hcl and hashicorp/hil.
This required a small mod to core Consul code to cope with an interface
change.
2016-08-09 17:24:13 -07:00
James Phillips ff6d42389c Merge pull request #2222 from hashicorp/f-raft-v2
Integrates Consul with "stage one" of HashiCorp Raft library v2.
2016-08-09 16:04:48 -07:00
James Phillips cce38f9a4b
Moves the peers.info content down into a constant. 2016-08-09 11:56:39 -07:00
James Phillips 7aaa4bc913
Adds peers back into bootstrap log, makes initial case consistent. 2016-08-09 11:52:41 -07:00
James Phillips 7f58b05dfe
Tweaks select style. 2016-08-09 11:33:42 -07:00
James Phillips 544169999c
Adds I/O-sensitive metrics to ACL replication operations. 2016-08-09 11:32:12 -07:00
James Phillips 820509760d
Switches to a smooth rate limit vs. a bursty one. 2016-08-09 11:29:12 -07:00
James Phillips 129e327bc9
Clarifies replication index shown in the log message. 2016-08-09 11:10:32 -07:00
James Phillips 4203612bd7
Returns from the shutdown wait right away. 2016-08-09 11:09:48 -07:00
James Phillips e03fbef6b3
Moves ACL ID sorting interface onto the iterator. 2016-08-09 11:08:26 -07:00
James Phillips 0fa059ec49
Switches all ACL caches to 2Q. 2016-08-09 11:00:22 -07:00
James Phillips 1e75fa0362
Moves ACL ID generation down into the endpoint.
We don't want ACL replication to have this behavior so it was a
little dangerous to have in the shared helper function.
2016-08-09 00:11:00 -07:00
James Phillips 06a510a808
Removes unsafe "recover to empty" code.
This isn't safe because it would implicitly commit all outstanding log
entries. The new Raft library already has logic to not start a vote if
the current node isn't in the configuration, so this shouldn't be needed.
2016-08-08 19:19:19 -07:00
James Phillips dd3169b395
Tweaks recovery based on interface changes. 2016-08-08 19:19:18 -07:00
James Phillips 19004e7095
Moves to a safer design where we don't ingest the initial peers.json file. 2016-08-08 19:19:18 -07:00
James Phillips 44c468995f
Touches up Raft integration after latest changes. 2016-08-08 19:19:18 -07:00
James Phillips fc25145e85
Formats log messages to be consistent. 2016-08-08 19:19:18 -07:00
James Phillips 6b157eada0
Adds more comments about the raftSafeFn. 2016-08-08 19:19:18 -07:00
James Phillips fcd8bb157a
Clarifies a comment about no-op peer operations. 2016-08-08 19:19:18 -07:00
James Phillips 2bf633f206
Adds back "safing" the configuration when a server leaves. 2016-08-08 19:19:18 -07:00
James Phillips 6c8e8271e2
Integrates Consul with new version of Raft library. 2016-08-08 19:19:17 -07:00
James Phillips 4a931ae12e
Adds an ACL replication status endpoint. 2016-08-04 23:30:16 -07:00
James Phillips c94f1e1b83
Increases the ACL cache size to 10k. 2016-08-04 18:03:07 -07:00
James Phillips 3906517f70
Adds a full integrated test for ACL replication. 2016-08-04 17:59:08 -07:00
James Phillips f639f49cc0
Adds remaining core replication tests. 2016-08-04 16:33:40 -07:00
James Phillips defb39f8d4
Removes a TODO comment.
Decided we don't need to log anything about the token here. If the
token is not valid then the client will get an error about that, so
anything that can happen here is related to talking to the server in
the ACL datacenter, so not specific to the token.
2016-08-04 07:46:59 -07:00
James Phillips 93a7fd0561
Adds tests for the ACL reconcile algorithm. 2016-08-03 21:24:09 -07:00
James Phillips 796933b45b
Activates fallback to replicated ACLs. 2016-08-03 21:24:09 -07:00
James Phillips 9cece515c0
Adds basic ACL replication plumbing. 2016-08-03 21:24:04 -07:00
Abhinav Dahiya 9dc52449e3 Fixes #1775; Removes 'unknown' state
Signed-off-by: Abhinav Dahiya <abhinavdtu2012@gmail.com>
2016-07-30 19:33:14 +05:30
James Phillips a1266e4164 Adds some supplemental tests for RPC "no leader" retries.
This adds some extra tests for #2175.
2016-07-11 17:32:26 -06:00
Armon Dadgar 2d8cf9ef4a consul: change tests to not expect ErrNoLeader 2016-07-10 13:24:18 -04:00
Armon Dadgar 5d0a977bdf consul: Refactor forward to hold RPC when no leader is known 2016-07-10 13:24:06 -04:00
Armon Dadgar 191876f87e consul: Add RPCHoldTimeout as tunable hold period 2016-07-10 13:23:43 -04:00
Ryan Uber d8fd470f4f Merge pull request #1837 from cleung2010/obfuscate-acl-token
Obfuscate token for lookupACL error
2016-07-05 13:56:49 -07:00
Calvin Leung Huang 38134f1b8c Fix substring length on obfuscated token 2016-07-05 15:53:30 -04:00
Ryan Uber 577523fc73 consul: sort source node first if at position <= 10 in PQ's 2016-07-01 14:28:58 -07:00
Ryan Uber e9960e6c85 Merge pull request #2137 from hashicorp/f-pq-near
Support "near" parameter in prepared query service block
2016-07-01 12:28:48 -07:00
Ryan Uber ccbe86d7a8 consul: mention magic _agent token in struct comments 2016-07-01 11:50:30 -07:00
Ryan Uber ebacaa2d67 consul: send agent source data as separate query source 2016-06-30 16:51:18 -07:00
Ryan Uber 782a081925 consul: use source parameter for near prepared queries 2016-06-30 12:11:20 -07:00
Ryan Uber 270270a33a consul: send origin node + dc when executing prepared queries 2016-06-21 15:34:26 -07:00
Ryan Uber 925915c6ac consul: test baked-in distance sort 2016-06-21 12:54:18 -07:00
Ryan Uber 114e57fff1 consul: use the Near field instead of PreferLocal 2016-06-21 12:39:40 -07:00
James Phillips 8358df599d Merge pull request #2127 from hashicorp/b-remote-consuls-locking
Ensure locking of `Server`'s `remoteConsuls`.
2016-06-21 10:00:04 -07:00
James Phillips f9e2900692 Merge pull request #2131 from hashicorp/b-misc-microoptimizations
Misc micro optimizations
2016-06-21 09:59:01 -07:00
Sean Chittenden ebdb72ce0a
Ensure locking of `Server`'s `remoteConsuls`. 2016-06-20 22:59:49 -07:00
Sean Chittenden 72f7a4061c
Misc comment improvements 2016-06-20 15:29:38 -07:00
Sean Chittenden 9bf6e61655
Initialize a non-empty number of Consul Datacenters. No functional change. 2016-06-20 15:26:59 -07:00
Sean Chittenden b78c95d37e
Prefer rand.Int31n() over rand.Int31(). 2016-06-20 15:26:27 -07:00
Sean Chittenden e81bf2a505
Fix deadlock in Consul RTT.
- consul/rtt.go:388: s.getDatacentersByDistance().  Acquires RLock()
- consul/rtt.go:341: sortDatacentersByDistance() RLock still held.
- consul/rtt.go:282: getDatacenterDistance() RLock still held.
- consul/rtt.go:268: getNodesForDatacenter(). Attempts to reacquire RLock(), hangs indefinitely.
2016-06-20 14:59:54 -07:00
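
The call chain in the commit above re-acquires a read lock that the same goroutine already holds. With Go's `sync.RWMutex`, a nested `RLock()` can block forever if another goroutine calls `Lock()` in between. The commit message does not say how the fix was made, so the following is only a minimal sketch of one common pattern: a public method takes the lock once, and internal `*Locked` helpers assume it is already held. All names here (`registry`, `nodesLocked`, `Nodes`, `TotalNodes`) are hypothetical and not taken from `consul/rtt.go`.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for a structure guarded by an RWMutex.
type registry struct {
	mu    sync.RWMutex
	nodes map[string][]string // datacenter -> node names
}

// nodesLocked assumes the caller already holds r.mu (read or write).
// It never touches the lock itself, so it is safe to call from methods
// that are already inside a critical section.
func (r *registry) nodesLocked(dc string) []string {
	return append([]string(nil), r.nodes[dc]...)
}

// Nodes is the public entry point: it takes the read lock exactly once.
func (r *registry) Nodes(dc string) []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.nodesLocked(dc)
}

// TotalNodes holds the read lock across several lookups. It calls the
// *Locked helper rather than Nodes, so RLock is never re-entered.
func (r *registry) TotalNodes() int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	total := 0
	for dc := range r.nodes {
		total += len(r.nodesLocked(dc))
	}
	return total
}

func main() {
	r := &registry{nodes: map[string][]string{
		"dc1": {"a", "b"},
		"dc2": {"c"},
	}}
	fmt.Println(r.TotalNodes()) // prints 3
}
```

The naming convention makes the locking contract visible at each call site, which is what tends to get lost in a four-level call chain like the one listed in the commit.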
Ryan Uber 89fe991ab7 consul: test raw PreferLocal functionality 2016-06-20 14:53:13 -07:00
Ryan Uber 1fef85cd2e consul: support PreferLocal in PQ's 2016-06-20 14:24:40 -07:00
Sean Chittenden 7482a9207d
Chase casting types.CheckID to a string into the state_store.
It turns out the indexer can only use strings as arguments when
creating a query.  Cast `types.CheckID` to a `string` before calling
into `memdb`.

Ideally the indexer would be smart enough to do this at compile-time,
but I need to look into how to do this without reflection and the
runtime package.  For the time being statically cast `types.CheckID`
to a `string` at the call sites.
2016-06-07 16:59:02 -04:00
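
The commit above notes that the indexer only accepts strings, so a named type such as `types.CheckID` has to be converted at the call site. A minimal sketch of why the conversion is needed, assuming a hypothetical `indexKey` function that checks its argument with a type assertion (this is an illustration, not the `memdb` API, and the `CheckID` definition here is an assumption):

```go
package main

import "fmt"

// CheckID mirrors the idea of a named string type; this definition is an
// illustrative assumption, not copied from Consul.
type CheckID string

// indexKey is a hypothetical indexer that only accepts plain string
// arguments, as the commit message describes.
func indexKey(arg interface{}) (string, error) {
	s, ok := arg.(string) // fails for CheckID: its dynamic type is not string
	if !ok {
		return "", fmt.Errorf("argument must be a string: %#v", arg)
	}
	return s + "\x00", nil
}

func main() {
	id := CheckID("web-check")

	// Passing the named type directly is rejected at run time.
	_, err := indexKey(id)
	fmt.Println(err != nil) // prints true

	// An explicit conversion at the call site satisfies the assertion.
	key, err := indexKey(string(id))
	fmt.Println(key == "web-check\x00", err) // prints true <nil>
}
```

The conversion costs nothing at run time, but it has to be repeated at every call site, which is the trade-off the commit message accepts for now.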
Sean Chittenden ff45f8c8ff
Revert "Move `structs.CheckID` to a new top-level package, `types`."
This reverts commit 2bbd52e3b44ff1b60939a8400264d534662d6d51.
2016-06-07 16:59:02 -04:00
Sean Chittenden a4554b945c
Move `structs.CheckID` to a new top-level package, `types`.
Per discussion w/ @slackpad, move this type to its own top-level package
2016-06-07 16:59:02 -04:00
Sean Chittenden cd68cd3868
Move `structs.CheckID` to a new top-level package, `types`.
Per discussion w/ @slackpad, move this type to its own top-level package
2016-06-07 16:59:02 -04:00
Sean Chittenden 0857e93d0b
Float a type balloon. Some strings are square pegs in round holes.
This experiment was brought about because of variable naming
confusion where name and checkIDs were interchanged.  Gave CheckID
a Qualified Type Name and chased downstream changes.
2016-06-07 16:59:02 -04:00
James Phillips ffcba3df58 Merge pull request #2028 from hashicorp/f-atomic-kv
Adds support for atomic transactions spanning multiple KV entries.
2016-05-15 13:46:05 -07:00
Sean Chittenden 3756fb23a6
Remove unused peers variable from setupRaft(). 2016-05-15 06:40:46 -07:00
James Phillips a11f32a1da Adds a get-tree verb to KV transaction operations. 2016-05-13 16:57:39 -07:00
James Phillips 0f94a7a326 Switches GETs to a filtering model for ACLs. 2016-05-13 15:58:55 -07:00
James Phillips 5fd99b13ef Removes null results for deletes, and preps for more than one result from an operation. 2016-05-13 01:47:55 -07:00
James Phillips 2649a6336e Adds a read-only optimized path for transactions. 2016-05-13 00:34:05 -07:00
James Phillips 0c34ed078c Adds a comment for the txnKVS() function. 2016-05-12 16:11:26 -07:00
James Phillips 88b1c7d054 Makes get fail a transaction if the key doesn't exist. 2016-05-11 14:18:31 -07:00
James Phillips 3d35acaa90 De-nests the KV output structure (removes DirEnt member). 2016-05-11 13:48:03 -07:00
James Phillips 04a13ec3d7 Switches to "KV" instead of "KV" for the KV operations. 2016-05-11 10:58:27 -07:00
James Phillips dc662f7e35 Refactors TxnRequest/TxnResponse into a form that will allow non-KV ops.
This isn't needed/used yet, but it's a good hook to get in there so we
can add more atomic operations in the future. The Go API hides this detail
so that feels like a KV-specific API. The implications on the REST API are
pretty minimal.
2016-05-11 01:39:10 -07:00
James Phillips d980cbcd9d Moves txn code into a new endpoint, not specific to KV. 2016-05-10 21:58:02 -07:00
James Phillips 907d8bab34 Fixes some go vet findings in a unit test. 2016-05-10 20:01:52 -07:00
Sean Chittenden 94e2766423
Remove stray type definition
Noticed while working on Nomad Client's server selection code.
2016-05-10 18:56:28 -07:00
James Phillips 4eb89481df Adds internal endpoint read ACL support and full unit tests. 2016-05-10 11:23:47 -07:00
James Phillips 6a96e052c4 Adds an empty get test case. 2016-05-09 22:18:26 -07:00
James Phillips 471160d8f0 Performs basic plumbing of KVS transactions through all the layers. 2016-05-09 22:15:49 -07:00
James Phillips dca00c96f7 Adds state store support for atomic KVS ops. 2016-05-05 15:46:59 -07:00
James Phillips a1a59bee73 Splits existing KVS operations into *Txn helpers for later reuse. 2016-05-04 14:20:11 -07:00
James Phillips 9185450fd5 Moves KVS-related state store code out into its own set of files. 2016-05-02 16:21:04 -07:00
Sean Chittenden c16b1ca178 Add the list of Raft peers to Consul's Stats
```
% consul info
[snip]
raft:
[snip]
	raft_peers = 127.0.0.1:8300
[snip]
```

Poached from: Nomad Project
2016-04-28 15:08:48 -07:00
James Phillips 79153c3014 Merge pull request #1884 from mtchavez/1541-data-dir-perms
command: Data directory permission error message
2016-04-12 22:06:49 -07:00
James Phillips 6e177a9b44 Merge pull request #1895 from shoenig/fixtypo
doc: fix trivial typo s/NewFSMPath/NewFSM/
2016-04-12 21:53:24 -07:00
James Phillips 3f340716fd Adds a clone method to HealthCheck and uses that in local.go. 2016-04-11 00:05:39 -07:00
Chavez c9602c561c Add description to rpc test client pool member failure message 2016-04-01 19:17:38 -07:00
Seth Hoenig 7f67c123b7 doc: fix trivial typo s/NewFSMPath/NewFSM/ 2016-03-29 20:52:17 -05:00
Sean Chittenden 5ae7835988 Rename server_details package to agent 2016-03-29 17:39:19 -07:00
Sean Chittenden 7f06c71650 Add a quick package doc for the servers package 2016-03-29 16:22:53 -07:00
Sean Chittenden 897282f77d Rename serverConfig to serverList
serverList is a vastly more accurate name.  Chase accordingly.  No functional change other than types and APIs.
2016-03-29 16:17:16 -07:00
Sean Chittenden 4984b6111d Gratuitous rename 1/2
Reduce cognitive load and perform an overdue rename.  No functional change.

Rename the `server_manager` package to `servers`.  Rename the `ServerManager` package to `Manager`.  In `client`, rename `serverMgr` to `servers`.
2016-03-29 16:12:00 -07:00
Sean Chittenden 4734e0113f Remove two unused constants 2016-03-29 11:11:41 -07:00
Sean Chittenden cb9833b134 Remove useless comment residual from decomposing functions 2016-03-29 10:53:00 -07:00
Sean Chittenden 1f049a3c38 EDYSLEXICMOMENT 2016-03-29 10:50:10 -07:00
Sean Chittenden 177f64134e Refactor out reconcileServerList anon function
Add testing to reconcileServerList and test various server sizes.

Test that a percentage of nodes fail their Ping (50% in testing atm)
2016-03-29 02:45:38 -07:00
Sean Chittenden 6609ee5d51 Teach fauxConnPool to fail a pct of the time
50% failure rate seems legit as a starting point w/ 100 servers.
2016-03-28 14:53:29 -07:00
Sean Chittenden 7d26f7bfa7 Call NotifyFailedServers to rotate the server list 2016-03-28 14:12:41 -07:00
Sean Chittenden 6a987062b9 Add log line re: server manager backing off and sleeping
This is useful in situations where the RPC rotate duration is greater than 1µs.  WTB exponential backoff of logging so we don't spam forever.
2016-03-28 14:04:04 -07:00
Sean Chittenden 689b79aef3 Remove old debugging lines of questionable future value 2016-03-28 14:02:53 -07:00
Sean Chittenden 0b0a07a280 Shuffle in place
Don't create a copy and save the copy, not necessary any more.
2016-03-28 14:02:27 -07:00
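
Shuffling in place, as the commit above describes, means swapping elements within the existing slice rather than building and storing a shuffled copy. A minimal sketch using a Fisher-Yates shuffle; the function name `shuffleInPlace` and the `seed` parameter are assumptions made to keep the example reproducible, not details from the Consul code:

```go
package main

import (
	"fmt"
	"math/rand"
)

// shuffleInPlace permutes servers with a Fisher-Yates shuffle, swapping
// elements within the caller's slice instead of building a shuffled copy.
func shuffleInPlace(servers []string, seed int64) {
	rng := rand.New(rand.NewSource(seed))
	for i := len(servers) - 1; i > 0; i-- {
		j := rng.Intn(i + 1) // pick an index from the not-yet-fixed range [0, i]
		servers[i], servers[j] = servers[j], servers[i]
	}
}

func main() {
	servers := []string{"s1", "s2", "s3", "s4"}
	shuffleInPlace(servers, 42)
	fmt.Println(len(servers)) // prints 4: same elements, new order
}
```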
Sean Chittenden e230b3a3b7 Nuke unnecessary comment
See above function comments for details
2016-03-28 13:57:36 -07:00
Sean Chittenden 34a29a2107 Move FIXME comment to the right call site 2016-03-28 13:49:55 -07:00
Sean Chittenden b38d3d71c8 Rename the ConnPoolPinger interface to Pinger 2016-03-28 13:46:01 -07:00
Sean Chittenden d6b4345375 Return error from PingConsulServer
In order to report why a Ping failed, change the signature of PingConsulServers to include an error message.
2016-03-28 13:38:58 -07:00
Sean Chittenden 6c9fb06511 Change the definition of the ServerDetails struct key
Use only the serf Name for now.  Leaving the plumbing for now.
2016-03-28 12:53:19 -07:00
Sean Chittenden 2bcff6bac4 Correct the comment to match reality 2016-03-28 12:32:30 -07:00
Sean Chittenden fc1edea1ef Rename serverCfg to sc for consistency 2016-03-28 12:06:26 -07:00
Sean Chittenden 988b05700d Add a quick length check
Verify that AddServer behaved as expected
2016-03-28 11:38:12 -07:00
Sean Chittenden 7181e42ba8 Switch the order of ServerDetails.String()
It's more natural to have the network first.  I think I flipped the order accidentally.
2016-03-28 11:37:25 -07:00
Sean Chittenden dca8fd2643 Move rebalance log statement from INFO to DEBUG 2016-03-27 01:32:04 -07:00
Sean Chittenden 180edd8e7b Chase the API bump re: refreshServerRebalanceTimer
If it works in prod, why shouldn't it work in the tests?
2016-03-27 00:04:52 -07:00
Sean Chittenden 9b5dd7a785 Move initialization of the rebalanceTimer to New() 2016-03-27 00:03:48 -07:00
Sean Chittenden 86d1bad541 Add a test for ConnPool.PingConsulServer
Spin up 5x servers, join and ping each server
2016-03-26 23:52:06 -07:00
Sean Chittenden f903005080 Expose ServerManager.ResetRebalanceTimer
Move the rebalance timer from ServerManager.Start's stack to struct ServerManager.  This makes it possible to shuffle during tests without actually waiting >120s.
2016-03-26 23:41:01 -07:00
Sean Chittenden 2ba281bc5a Logging improvements
Comment out noisy loggers for the time being.

Improve the final logging statement to be useful and hint what the next active server for the client is going to be.
2016-03-26 22:41:08 -07:00
Sean Chittenden fab3981b1d Standardize the log message based on the package
This log statement used to belong in the consul package but has since moved to the server manager package.
2016-03-26 22:29:00 -07:00
Sean Chittenden c6d9c42d9f Reduce the error level from Fatal when unit testing 2016-03-26 22:07:09 -07:00
Sean Chittenden 4747cf3cab Start server rebalance task after init'ing Serf
Now that there is no longer an event loop driven directly by Serf, start the ServerManager task after Serf has been setup.  When testing and adjusting timers and timeouts to unreasonably low values, it's possible to tickle a race condition where Serf's NumNodes() would fail because Serf had not been initialized.
2016-03-26 22:04:41 -07:00
Sean Chittenden 2ddf82d9d8 Catch up to a few renames 2016-03-26 19:32:11 -07:00
Sean Chittenden 640ced7c11 Use empty string for addr in ServerDetails.String() 2016-03-26 19:30:04 -07:00
Sean Chittenden e0f29c17cd Guard against a nil ServerDetails.Addr
It's not clear how or why this would ever be nil, but some of the unit tests produce a nil addr.  Be defensive.
2016-03-26 19:29:31 -07:00
Sean Chittenden 2d9982eb27 Proactively ping server before rotation
Before shuffling the server list, proactively ping the next server in the list to establish the connection and verify the remote endpoint is healthy.
2016-03-26 19:28:13 -07:00
Sean Chittenden b3a8e2f115 Factor out the shuffle server 2016-03-26 19:19:04 -07:00
Sean Chittenden 766ddae165 Revise comments re: cycleServer
Improve the comments to discuss what happens presently.  Add a note to consider possibly calling to TestConsulServer proactively.
2016-03-26 18:53:13 -07:00
Sean Chittenden ac1d42e9d8 Comment why the interface is needed: cyclic import 2016-03-26 18:38:35 -07:00
Sean Chittenden a9b3dba05f Add a struct key type for server_details 2016-03-26 17:58:12 -07:00
Sean Chittenden 496f05b561 Add additional checks 2016-03-25 14:40:46 -07:00
Sean Chittenden c18158aac3 Delete the right tag
"role" != "consul"
2016-03-25 14:31:48 -07:00
Sean Chittenden b44554f882 Don't pass in sm, server manager is already in scope
Go closures are implicitly capturing lambdas.
2016-03-25 14:10:09 -07:00
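
The point in the commit above is that a Go function literal can use variables from its enclosing scope without having them passed in as arguments. A minimal sketch, assuming a hypothetical `manager` type standing in for the server manager (`sm`) and a hypothetical `joinHandler` helper:

```go
package main

import "fmt"

// manager is a hypothetical stand-in for the server manager ("sm").
type manager struct{ servers []string }

func (m *manager) add(name string) { m.servers = append(m.servers, name) }

// joinHandler returns a callback whose only parameter is the server name.
// The callback does not take sm as an argument; it captures sm from the
// enclosing scope.
func joinHandler(sm *manager) func(name string) {
	return func(name string) {
		sm.add(name) // sm is captured, not passed in
	}
}

func main() {
	sm := &manager{}
	onJoin := joinHandler(sm)
	onJoin("server-1")
	onJoin("server-2")
	fmt.Println(len(sm.servers)) // prints 2
}
```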
Sean Chittenden 2713899a5b Trim residual complexity from server join notifications
Now that serf node join events are decoupled from rebalancing activities completely, remove the complexity of draining the channel and ensuring only one goroutine was rebalancing the server list.

Now that we're no longer initializing a notification channel, we can remove the config load/save from `Start()`
2016-03-25 14:06:35 -07:00
Sean Chittenden b3298ce4c3 Only log in FindServers
In FindServer this is a useful warning hinting why its call failed. RPC returns error and leaves it to the higher level caller to do whatever it wants. As an operator, I'd have the detail necessary to know why the RPC call(s) failed.
2016-03-25 13:58:50 -07:00
Sean Chittenden f024272ab2 Initialize the rebalance to clientRPCMinReuseDuration
In an earlier version there was a channel to notify when a new server was added, however this has long since been removed.  Just default to the sane value of 2min before the first rebalance calc takes place.

Pointed out by: slackpad
2016-03-25 13:46:18 -07:00
Sean Chittenden 89311a5859 Use range vs for
Returning a new array vs mutating an array in place so we can use range now.
2016-03-25 13:08:08 -07:00