open-consul

Author	SHA1	Message	Date
Kyle Havlovitz	9adc3854d1	Retry with backoff on session invalidation failure (#2475)	2016-11-04 21:53:22 -07:00
James Phillips	be4056789f	Moves the snapshot package up one level. (#2472)	2016-11-03 21:36:25 -07:00
Kyle Havlovitz	169cae2203	Disallow -bootstrap-expect flag in dev mode (#2464)	2016-11-03 01:54:43 -04:00
Kyle Havlovitz	440611f9f7	Add snapshot inspect subcommand (#2451)	2016-10-31 19:37:27 -04:00
Kyle Havlovitz	c6f461aa25	Enable snapshots in dev mode (#2453)	2016-10-31 14:39:47 -04:00
James Phillips	bc29610124	Adds support for snapshots and restores. (#2396) * Updates Raft library to get new snapshot/restore API. * Basic backup and restore working, but need some cleanup. * Breaks out a snapshot module and adds a SHA256 integrity check. * Adds snapshot ACL and fills in some missing comments. * Require a consistent read for snapshots. * Make sure snapshot works if ACLs aren't enabled. * Adds a bit of package documentation. * Returns an empty response from restore to avoid EOF errors. * Adds API client support for snapshots. * Makes internal file names match on-disk file snapshots. * Adds DC and token coverage for snapshot API test. * Adds missing documentation. * Adds a unit test for the snapshot client endpoint. * Moves the connection pool out of the client for easier testing. * Fixes an incidental issue in the prepared query unit test. I realized I had two servers in bootstrap mode so this wasn't a good setup. * Adds a half close to the TCP stream and fixes panic on error. * Adds client and endpoint tests for snapshots. * Moves the pool back into the snapshot RPC client. * Adds a TLS test and fixes half-closes for TLS connections. * Tweaks some comments. * Adds a low-level snapshot test. This is independent of Consul so we can pull this out into a library later if we want to. * Cleans up snapshot and archive and completes archive tests. * Sends a clear error for snapshot operations in dev mode. Snapshots require the Raft snapshots to be readable, which isn't supported in dev mode. Send a clear error instead of a deep-down Raft one. * Adds docs for the snapshot endpoint. * Adds a stale mode and index feedback for snapshot saves. This gives folks a way to extract data even if the cluster has no leader. * Changes the internal format of a snapshot from zip to tgz. * Pulls in Raft fix to cancel inflight before a restore. * Pulls in new Raft restore interface. * Adds metadata to snapshot saves and a verify function. * Adds basic save and restore snapshot CLI commands. * Gets rid of tarball extensions and adds restore message. * Fixes an incidental bad link in the KV docs. * Adds documentation for the snapshot CLI commands. * Scuttle any request body when a snapshot is saved. * Fixes archive unit test error message check. * Allows for nil output writers in snapshot RPC handlers. * Renames hash list Decode to DecodeAndVerify. * Closes the client connection for snapshot ops. * Lowers timeout for restore ops. * Updates Raft vendor to get new Restore signature and integrates with Consul. * Bounces the leader's internal state when we do a restore.	2016-10-25 19:20:24 -07:00
Kyle Havlovitz	f0aa65754b	Wait for agent joins to finish in TestClient_RPC	2016-10-25 17:48:11 -07:00
Kyle Havlovitz	2379b822ad	Add wait logic to TestClient_RPC_Pool	2016-10-25 17:48:11 -07:00
James Phillips	b423dee303	Fixes port numbers in peers.info.	2016-10-05 18:09:15 -07:00
James Phillips	8e76af4311	Merge pull request #2319 from hashicorp/f-bootstrap-abort Adds check that aborts bootstrap mode if there's an existing cluster.	2016-09-01 09:49:03 -07:00
James Phillips	94c0e961eb	Fixes error message in test.	2016-09-01 09:48:08 -07:00
James Phillips	ce93b82e1e	Makes port selection atomic in unit tests.	2016-09-01 01:01:28 -07:00
James Phillips	d04a706a7c	Tweaks comment to be more correct.	2016-08-31 23:54:53 -07:00
James Phillips	4dd9b4b08a	Adds check that aborts bootstrap mode if there's an existing cluster.	2016-08-31 21:25:56 -07:00
James Phillips	750e1751ac	Copies the member data instead of referencing by pointer.	2016-08-30 16:54:21 -07:00
James Phillips	6be1e07fec	Makes the Raft configuration API easier to consume.	2016-08-30 11:30:56 -07:00
James Phillips	5df4b6bef2	Adds a log warning when operator peer changes occur.	2016-08-30 10:23:32 -07:00
James Phillips	1b7a16b7d3	Adds new consul operator endpoint, CLI, and ACL and some basic Raft commands.	2016-08-30 00:02:50 -07:00
James Phillips	29e52307cb	Makes empty checkServiceNode return a nil. The change in #2308 had an inadvertent interface change, so we fix that with a special case in this fix.	2016-08-29 19:12:07 -07:00
James Phillips	327fe725d9	Preallocates result struct, which was a profiling hot spot.	2016-08-26 16:34:28 -07:00
James Phillips	c5b6ac3655	Removes leader_lease_timeout from stats.	2016-08-25 15:39:19 -07:00
James Phillips	2f4c237cff	Adds a max raft multiplier and tweaks documentation.	2016-08-25 15:36:05 -07:00
James Phillips	5df36fbd82	Stops scaling the commit timeout.	2016-08-25 15:05:40 -07:00
James Phillips	f65ef936cb	Increases RPC hold timeout for new default timing. Rather than scale this we just bump it up a bit. It'll be on the edge in the lower-performance default mode, and will have plenty of margin in the high-performance mode. This seems like a reasonable compromise to keep the logic here simple vs. scaling, and seems inline with the expectations of the different modes of operation.	2016-08-24 23:35:28 -07:00
James Phillips	b339b0d2fc	Adds performance tuning capability for Raft, detuned defaults, and supplemental docs.	2016-08-24 21:58:37 -07:00
James Phillips	0bdbdf1ba8	Merge pull request #2226 from abhinavdahiya/rm-health-unknown Fixes #1775; Removes 'unknown' state	2016-08-17 17:51:04 -07:00
James Phillips	1f539d9914	Makes the filled-in parts of ServiceNode more explicit.	2016-08-12 18:25:36 -07:00
David van Geest	360e196c93	Translate Address to tagged WAN address in HTTP API when appropriate.	2016-08-12 18:25:36 -07:00
James Phillips	d11a7a197c	Removes upper end of muxado handler.	2016-08-09 18:16:41 -07:00
James Phillips	97a25e8564	Closes the conn on bad protocol version.	2016-08-09 18:13:53 -07:00
James Phillips	359587f70e	Removes support for muxado and protocol version 1.	2016-08-09 18:10:04 -07:00
James Phillips	99ab3390c2	Updates hashicorp/hcl and hashicorp/hil. This required a small mod to core Consul code to cope with an interface change.	2016-08-09 17:24:13 -07:00
James Phillips	ff6d42389c	Merge pull request #2222 from hashicorp/f-raft-v2 Integrates Consul with "stage one" of HashiCorp Raft library v2.	2016-08-09 16:04:48 -07:00
James Phillips	cce38f9a4b	Moves the peers.info content down into a constant.	2016-08-09 11:56:39 -07:00
James Phillips	7aaa4bc913	Adds peers back into bootstrap log, makes initial case consistent.	2016-08-09 11:52:41 -07:00
James Phillips	7f58b05dfe	Tweaks select style.	2016-08-09 11:33:42 -07:00
James Phillips	544169999c	Adds I/O-sensitive metrics to ACL replication operations.	2016-08-09 11:32:12 -07:00
James Phillips	820509760d	Switches to a smooth rate limit vs. a bursty one.	2016-08-09 11:29:12 -07:00
James Phillips	129e327bc9	Clarifies replication index shown in the log message.	2016-08-09 11:10:32 -07:00
James Phillips	4203612bd7	Returns from the shutdown wait right away.	2016-08-09 11:09:48 -07:00
James Phillips	e03fbef6b3	Moves ACL ID sorting interface onto the iterator.	2016-08-09 11:08:26 -07:00
James Phillips	0fa059ec49	Switches all ACL caches to 2Q.	2016-08-09 11:00:22 -07:00
James Phillips	1e75fa0362	Moves ACL ID generation down into the endpoint. We don't want ACL replication to have this behavior so it was a little dangerous to have in the shared helper function.	2016-08-09 00:11:00 -07:00
James Phillips	06a510a808	Removes unsafe "recover to empty" code. This isn't safe because it would implicitly commit all outstanding log entries. The new Raft library already has logic to not start a vote if the current node isn't in the configuration, so this shouldn't be needed.	2016-08-08 19:19:19 -07:00
James Phillips	dd3169b395	Tweaks recovery based on interface changes.	2016-08-08 19:19:18 -07:00
James Phillips	19004e7095	Moves to a safer design where we don't ingest the initial peers.json file.	2016-08-08 19:19:18 -07:00
James Phillips	44c468995f	Touches up Raft integration after latest changes.	2016-08-08 19:19:18 -07:00
James Phillips	fc25145e85	Formats log messages to be consistent.	2016-08-08 19:19:18 -07:00
James Phillips	6b157eada0	Adds more comments about the raftSafeFn.	2016-08-08 19:19:18 -07:00
James Phillips	fcd8bb157a	Clarifies a comment about no-op peer operations.	2016-08-08 19:19:18 -07:00
James Phillips	2bf633f206	Adds back "safing" the configuration when a server leaves.	2016-08-08 19:19:18 -07:00
James Phillips	6c8e8271e2	Integrates Consul with new version of Raft library.	2016-08-08 19:19:17 -07:00
James Phillips	4a931ae12e	Adds an ACL replication status endpoint.	2016-08-04 23:30:16 -07:00
James Phillips	c94f1e1b83	Increases the ACL cache size to 10k.	2016-08-04 18:03:07 -07:00
James Phillips	3906517f70	Adds a full integrated test for ACL replication.	2016-08-04 17:59:08 -07:00
James Phillips	f639f49cc0	Adds remaining core replication tests.	2016-08-04 16:33:40 -07:00
James Phillips	defb39f8d4	Removes a TODO comment. Decided we don't need to log anything about the token here. If the token is not valid then the client will get an error about that, so anything that can happen here is related to talking to the server in the ACL datacenter, so not specific to the token.	2016-08-04 07:46:59 -07:00
James Phillips	93a7fd0561	Adds tests for the ACL reconcile algorithm.	2016-08-03 21:24:09 -07:00
James Phillips	796933b45b	Activates fallback to replicated ACLs.	2016-08-03 21:24:09 -07:00
James Phillips	9cece515c0	Adds basic ACL replication plumbing.	2016-08-03 21:24:04 -07:00
Abhinav Dahiya	9dc52449e3	Fixes #1775; Removes 'unknown' state Signed-off-by: Abhinav Dahiya <abhinavdtu2012@gmail.com>	2016-07-30 19:33:14 +05:30
James Phillips	a1266e4164	Adds some supplemental tests for RPC "no leader" retries. This adds some extra tests for #2175.	2016-07-11 17:32:26 -06:00
Armon Dadgar	2d8cf9ef4a	consul: change tests to not expect ErrNoLeader	2016-07-10 13:24:18 -04:00
Armon Dadgar	5d0a977bdf	consul: Refactor forward to hold RPC when no leader is known	2016-07-10 13:24:06 -04:00
Armon Dadgar	191876f87e	consul: Add RPCHoldTimeout as tunable hold period	2016-07-10 13:23:43 -04:00
Ryan Uber	d8fd470f4f	Merge pull request #1837 from cleung2010/obfuscate-acl-token Obfuscate token for lookupACL error	2016-07-05 13:56:49 -07:00
Calvin Leung Huang	38134f1b8c	Fix substring length on obfuscated token	2016-07-05 15:53:30 -04:00
Ryan Uber	577523fc73	consul: sort source node first if at position <= 10 in PQ's	2016-07-01 14:28:58 -07:00
Ryan Uber	e9960e6c85	Merge pull request #2137 from hashicorp/f-pq-near Support "near" parameter in prepared query service block	2016-07-01 12:28:48 -07:00
Ryan Uber	ccbe86d7a8	consul: mention magic _agent token in struct comments	2016-07-01 11:50:30 -07:00
Ryan Uber	ebacaa2d67	consul: send agent source data as separate query source	2016-06-30 16:51:18 -07:00
Ryan Uber	782a081925	consul: use source parameter for near prepared queries	2016-06-30 12:11:20 -07:00
Ryan Uber	270270a33a	consul: send origin node + dc when executing prepared queries	2016-06-21 15:34:26 -07:00
Ryan Uber	925915c6ac	consul: test baked-in distance sort	2016-06-21 12:54:18 -07:00
Ryan Uber	114e57fff1	consul: use the Near field instead of PreferLocal	2016-06-21 12:39:40 -07:00
James Phillips	8358df599d	Merge pull request #2127 from hashicorp/b-remote-consuls-locking Ensure locking of `Server`'s `remoteConsuls`.	2016-06-21 10:00:04 -07:00
James Phillips	f9e2900692	Merge pull request #2131 from hashicorp/b-misc-microoptimizations Misc micro optimizations	2016-06-21 09:59:01 -07:00
Sean Chittenden	ebdb72ce0a	Ensure locking of `Server`'s `remoteConsuls`.	2016-06-20 22:59:49 -07:00
Sean Chittenden	72f7a4061c	Misc comment improvements	2016-06-20 15:29:38 -07:00
Sean Chittenden	9bf6e61655	Initialize a non-empty number of Consul Datacenters. No functional change.	2016-06-20 15:26:59 -07:00
Sean Chittenden	b78c95d37e	Prefer rand.Int31n() over rand.Int31().	2016-06-20 15:26:27 -07:00
Sean Chittenden	e81bf2a505	Fix deadlock in Consul RTT. - consul/rtt.go:388: s.getDatacentersByDistance(). Acquires RLock() - consul/rtt.go:341: sortDatacentersByDistance() RLock still held. - consul/rtt.go:282: getDatacenterDistance() RLock still held. - consul/rtt.go:268: getNodesForDatacenter(). Attempts to reacquire RLock(), hangs indefinitely.	2016-06-20 14:59:54 -07:00
Ryan Uber	89fe991ab7	consul: test raw PreferLocal functionality	2016-06-20 14:53:13 -07:00
Ryan Uber	1fef85cd2e	consul: support PreferLocal in PQ's	2016-06-20 14:24:40 -07:00
Sean Chittenden	7482a9207d	Chase casting types.CheckID to a string into the state_store. It turns out the indexer can only use strings as arguments when creating a query. Cast `types.CheckID` to a `string` before calling into `memdb`. Ideally the indexer would be smart enough to do this at compile-time, but I need to look into how to do this without reflection and the runtime package. For the time being statically cast `types.CheckID` to a `string` at the call sites.	2016-06-07 16:59:02 -04:00
Sean Chittenden	ff45f8c8ff	Revert "Move `structs.CheckID` to a new top-level package, `types`." This reverts commit 2bbd52e3b44ff1b60939a8400264d534662d6d51.	2016-06-07 16:59:02 -04:00
Sean Chittenden	a4554b945c	Move `structs.CheckID` to a new top-level package, `types`. Per discussion w/ @slackpad, move this type to its own top-level package	2016-06-07 16:59:02 -04:00
Sean Chittenden	cd68cd3868	Move `structs.CheckID` to a new top-level package, `types`. Per discussion w/ @slackpad, move this type to its own top-level package	2016-06-07 16:59:02 -04:00
Sean Chittenden	0857e93d0b	Float a type balloon. Some strings are square pegs in round holes. This experiment was brought about because of variable naming confusion where name and checkIDs were interchanged. Gave CheckID a Qualified Type Name and chased downstream changes.	2016-06-07 16:59:02 -04:00
James Phillips	ffcba3df58	Merge pull request #2028 from hashicorp/f-atomic-kv Adds support for atomic transactions spanning multiple KV entries.	2016-05-15 13:46:05 -07:00
Sean Chittenden	3756fb23a6	Remove unused peers variable from setupRaft().	2016-05-15 06:40:46 -07:00
James Phillips	a11f32a1da	Adds a get-tree verb to KV transaction operations.	2016-05-13 16:57:39 -07:00
James Phillips	0f94a7a326	Switches GETs to a filtering model for ACLs.	2016-05-13 15:58:55 -07:00
James Phillips	5fd99b13ef	Removes null results for deletes, and preps for more than one result from an operation.	2016-05-13 01:47:55 -07:00
James Phillips	2649a6336e	Adds a read-only optimized path for transactions.	2016-05-13 00:34:05 -07:00
James Phillips	0c34ed078c	Adds a comment for the txnKVS() function.	2016-05-12 16:11:26 -07:00
James Phillips	88b1c7d054	Makes get fail a transaction if the key doesn't exist.	2016-05-11 14:18:31 -07:00
James Phillips	3d35acaa90	De-nests the KV output structure (removes DirEnt member).	2016-05-11 13:48:03 -07:00
James Phillips	04a13ec3d7	Switches to "KV" instead of "KVS" for the KV operations.	2016-05-11 10:58:27 -07:00
James Phillips	dc662f7e35	Refactors TxnRequest/TxnResponse into a form that will allow non-KV ops. This isn't needed/used yet, but it's a good hook to get in there so we can add more atomic operations in the future. The Go API hides this detail so that feels like a KV-specific API. The implications on the REST API are pretty minimal.	2016-05-11 01:39:10 -07:00
James Phillips	d980cbcd9d	Moves txn code into a new endpoint, not specific to KV.	2016-05-10 21:58:02 -07:00
James Phillips	907d8bab34	Fixes some go vet findings in a unit test.	2016-05-10 20:01:52 -07:00
Sean Chittenden	94e2766423	Remove stray type definition Noticed while working on Nomad Client's server selection code.	2016-05-10 18:56:28 -07:00
James Phillips	4eb89481df	Adds internal endpoint read ACL support and full unit tests.	2016-05-10 11:23:47 -07:00
James Phillips	6a96e052c4	Adds an empty get test case.	2016-05-09 22:18:26 -07:00
James Phillips	471160d8f0	Performs basic plumbing of KVS transactions through all the layers.	2016-05-09 22:15:49 -07:00
James Phillips	dca00c96f7	Adds state store support for atomic KVS ops.	2016-05-05 15:46:59 -07:00
James Phillips	a1a59bee73	Splits existing KVS operations into *Txn helpers for later reuse.	2016-05-04 14:20:11 -07:00
James Phillips	9185450fd5	Moves KVS-related state store code out into its own set of files.	2016-05-02 16:21:04 -07:00
Sean Chittenden	c16b1ca178	Add the list of Raft peers to Consul's Stats ``` % consul info [snip] raft: [snip] raft_peers = 127.0.0.1:8300 [snip] ``` Poached from: Nomad Project	2016-04-28 15:08:48 -07:00
James Phillips	79153c3014	Merge pull request #1884 from mtchavez/1541-data-dir-perms command: Data directory permission error message	2016-04-12 22:06:49 -07:00
James Phillips	6e177a9b44	Merge pull request #1895 from shoenig/fixtypo doc: fix trivial typo s/NewFSMPath/NewFSM/	2016-04-12 21:53:24 -07:00
James Phillips	3f340716fd	Adds a clone method to HealthCheck and uses that in local.go.	2016-04-11 00:05:39 -07:00
Chavez	c9602c561c	Add description to rpc test client pool member failure message	2016-04-01 19:17:38 -07:00
Seth Hoenig	7f67c123b7	doc: fix trivial typo s/NewFSMPath/NewFSM/	2016-03-29 20:52:17 -05:00
Sean Chittenden	5ae7835988	Rename server_details package to agent	2016-03-29 17:39:19 -07:00
Sean Chittenden	7f06c71650	Add a quick package doc for the servers package	2016-03-29 16:22:53 -07:00
Sean Chittenden	897282f77d	Rename serverConfig to serverList serverList is a vastly more accurate name. Chase accordingly. No functional change other than types and APIs.	2016-03-29 16:17:16 -07:00
Sean Chittenden	4984b6111d	Gratuitous rename 1/2 Reduce cognitive load and perform an overdue rename. No functional change. Rename the `server_manager` package to `servers`. Rename the `ServerManager` package to `Manager`. In `client`, rename `serverMgr` to `servers`.	2016-03-29 16:12:00 -07:00
Sean Chittenden	4734e0113f	Remove two unused constants	2016-03-29 11:11:41 -07:00
Sean Chittenden	cb9833b134	Remove useless comment residual from decomposing functions	2016-03-29 10:53:00 -07:00
Sean Chittenden	1f049a3c38	EDYSLEXICMOMENT	2016-03-29 10:50:10 -07:00
Sean Chittenden	177f64134e	Refactor out reconcileServerList anon function Add testing to reconcileServerList and test various server sizes. Test that a percentage of nodes fail their Ping (50% in testing atm)	2016-03-29 02:45:38 -07:00
Sean Chittenden	6609ee5d51	Teach fauxConnPool to fail a pct of the time 50% failure rate seems legit as a starting point w/ 100 servers.	2016-03-28 14:53:29 -07:00
Sean Chittenden	7d26f7bfa7	Call NotifyFailedServers to rotate the server list	2016-03-28 14:12:41 -07:00
Sean Chittenden	6a987062b9	Add log line re: server manager backing off and sleeping This is useful in situations where the RPC rotate duration is greater than 1µs. WTB exponential backoff of logging so we don't spam forever.	2016-03-28 14:04:04 -07:00
Sean Chittenden	689b79aef3	Remove old debugging lines of questionable future value	2016-03-28 14:02:53 -07:00
Sean Chittenden	0b0a07a280	Shuffle in place Don't create a copy and save the copy, not necessary any more.	2016-03-28 14:02:27 -07:00
Sean Chittenden	e230b3a3b7	Nuke unnecessary comment See above function comments for details	2016-03-28 13:57:36 -07:00
Sean Chittenden	34a29a2107	Move FIXME comment to the right call site	2016-03-28 13:49:55 -07:00
Sean Chittenden	b38d3d71c8	Rename the ConnPoolPinger interface to Pinger	2016-03-28 13:46:01 -07:00
Sean Chittenden	d6b4345375	Return error from PingConsulServer In order to report why a Ping failed, change the signature of PingConsulServers to include an error message.	2016-03-28 13:38:58 -07:00
Sean Chittenden	6c9fb06511	Change the definition of the ServerDetails struct key Use only the serf Name for now. Leaving the plumbing for now.	2016-03-28 12:53:19 -07:00
Sean Chittenden	2bcff6bac4	Correct the comment to match reality	2016-03-28 12:32:30 -07:00
Sean Chittenden	fc1edea1ef	Rename serverCfg to sc for consistency	2016-03-28 12:06:26 -07:00
Sean Chittenden	988b05700d	Add a quick length check Verify that AddServer behaved as expected	2016-03-28 11:38:12 -07:00
Sean Chittenden	7181e42ba8	Switch the order of ServerDetails.String() It's more natural to have the network first. I think I flipped the order accidentally.	2016-03-28 11:37:25 -07:00
Sean Chittenden	dca8fd2643	Move rebalance log statement from INFO to DEBUG	2016-03-27 01:32:04 -07:00
Sean Chittenden	180edd8e7b	Chase the API bump re: refreshServerRebalanceTimer If it works in prod, why shouldn't it work in the tests?	2016-03-27 00:04:52 -07:00
Sean Chittenden	9b5dd7a785	Move initialization of the rebalanceTimer to New()	2016-03-27 00:03:48 -07:00
Sean Chittenden	86d1bad541	Add a test for ConnPool.PingConsulServer Spin up 5x servers, join and ping each server	2016-03-26 23:52:06 -07:00
Sean Chittenden	f903005080	Expose ServerManager.ResetRebalanceTimer Move the rebalance timer from ServerManager.Start's stack to struct ServerManager. This makes it possible to shuffle during tests without actually waiting >120s.	2016-03-26 23:41:01 -07:00
Sean Chittenden	2ba281bc5a	Logging improvements Comment out noisy loggers for the time being. Improve the final logging statement to be useful and hint what the next active server for the client is going to be.	2016-03-26 22:41:08 -07:00
Sean Chittenden	fab3981b1d	Standardize the log message based on the package This log statement used to belong in the consul package but has since moved to the server manager package.	2016-03-26 22:29:00 -07:00
Sean Chittenden	c6d9c42d9f	Reduce the error level from Fatal when unit testing	2016-03-26 22:07:09 -07:00
Sean Chittenden	4747cf3cab	Start server rebalance task after init'ing Serf Now that there is no longer an event loop driven directly by Serf, start the ServerManager task after Serf has been setup. When testing and adjusting timers and timeouts to unreasonably low values, it's possible to tickle a race condition where Serf's NumNodes() would fail because Serf had not been initialized.	2016-03-26 22:04:41 -07:00
Sean Chittenden	2ddf82d9d8	Catch up to a few renames	2016-03-26 19:32:11 -07:00
Sean Chittenden	640ced7c11	Use empty string for addr in ServerDetails.String()	2016-03-26 19:30:04 -07:00
Sean Chittenden	e0f29c17cd	Guard against a nil ServerDetails.Addr It's not clear how or why this would ever be nil, but some of the unit tests produce a nil addr. Be defensive.	2016-03-26 19:29:31 -07:00
Sean Chittenden	2d9982eb27	Proactively ping server before rotation Before shuffling the server list, proactively ping the next server in the list to establish the connection and verify the remote endpoint is healthy.	2016-03-26 19:28:13 -07:00
Sean Chittenden	b3a8e2f115	Factor out the shuffle server	2016-03-26 19:19:04 -07:00
Sean Chittenden	766ddae165	Revise comments re: cycleServer Improve the comments to discuss what happens presently. Add a note to consider possibly calling to TestConsulServer proactively.	2016-03-26 18:53:13 -07:00
Sean Chittenden	ac1d42e9d8	Comment why the interface is needed: cyclic import	2016-03-26 18:38:35 -07:00
Sean Chittenden	a9b3dba05f	Add a struct key type for server_details	2016-03-26 17:58:12 -07:00
Sean Chittenden	496f05b561	Add additional checks	2016-03-25 14:40:46 -07:00
Sean Chittenden	c18158aac3	Delete the right tag "role" != "consul"	2016-03-25 14:31:48 -07:00
Sean Chittenden	b44554f882	Don't pass in sm, server manager is already in scope Go closures are implicitly capturing lambdas.	2016-03-25 14:10:09 -07:00
Sean Chittenden	2713899a5b	Trim residual complexity from server join notifications Now that serf node join events are decoupled from rebalancing activities completely, remove the complexity of draining the channel and ensuring only one go routine was rebalancing the server list. Now that we're no longer initializing a notification channel, we can remove the config load/save from `Start()`	2016-03-25 14:06:35 -07:00
Sean Chittenden	b3298ce4c3	Only log in FindServers In FindServer this is a useful warning hinting why its call failed. RPC returns error and leaves it to the higher level caller to do whatever it wants. As an operator, I'd have the detail necessary to know why the RPC call(s) failed.	2016-03-25 13:58:50 -07:00
Sean Chittenden	f024272ab2	Initialize the rebalance to clientRPCMinReuseDuration In an earlier version there was a channel to notify when a new server was added, however this has long since been removed. Just default to the sane value of 2min before the first rebalance calc takes place. Pointed out by: slackpad	2016-03-25 13:46:18 -07:00
Sean Chittenden	89311a5859	Use range vs for Returning a new array vs mutating an array in place so we can use range now.	2016-03-25 13:08:08 -07:00
Sean Chittenden	643997623e	Comment updates	2016-03-25 13:06:59 -07:00
Sean Chittenden	072f34cf02	Only rotate server list with more than one server Fantastic observation by slackpad. This was left over from when there was a boolean for health in the server struct (vs current strategy where we use server position in the list and rely on serf to cleanup the stale members). Pointed out by: slackpad	2016-03-25 12:54:36 -07:00
Sean Chittenden	aadd274a13	Relocate saveServerConfig next to getServerConfig Requested by: slackpad	2016-03-25 12:41:22 -07:00
Sean Chittenden	cf271e7f65	Clarify that ConsulClusterInfo is an interface over serf An interface was used to break a cyclic import dependency.	2016-03-25 12:38:40 -07:00
Sean Chittenden	973d924ab4	Reword comment after moving code into new packages	2016-03-25 12:34:46 -07:00
Sean Chittenden	78ec9f241d	Change initialReblaanaceTimeout to a time.Duration Pointed out by: @slackpad	2016-03-25 12:34:12 -07:00
Sean Chittenden	328728c88a	Negative check: test an invalid condition	2016-03-25 12:22:33 -07:00
Sean Chittenden	22e546ff32	Test to make sure bootstrap is missing	2016-03-25 12:20:12 -07:00
Sean Chittenden	5f035da4f1	Be more Go idiomatic w/ variable names: s/valid/ok/g Cargo culting is bad, m'kay? Pointy Hat: sean-	2016-03-25 12:14:24 -07:00
Sean Chittenden	e041c3905d	Fix stale comment Pointed out by: @slackpad	2016-03-25 12:00:40 -07:00
Sean Chittenden	45fc7c362e	Add a comment for Client serverMgr	2016-03-25 11:59:27 -07:00
Sean Chittenden	5873b7e28e	Correct a bogus goimport rewrite for tests	2016-03-23 22:35:49 -07:00
Sean Chittenden	dcc64d91c6	Test ServerManager.refreshServerRebalanceTimer Change the signature so it returns a value so that this can be tested externally with mock data. See the sample table in TestServerManagerInternal_refreshServerRebalanceTimer() for the rate at which it will back off. This function is mostly used to not cripple large clusters in the event of a partition.	2016-03-23 22:10:50 -07:00
Sean Chittenden	8e3b3d766d	Add a handful more unit tests to the public interface	2016-03-23 22:10:50 -07:00
Sean Chittenden	d5f72e8c07	Rename GetNumServers to NumServers() Matches the style of the rest of the repo	2016-03-23 22:10:50 -07:00
Sean Chittenden	9de9cf90f1	Rename NewServerManager to just New Follow go style recommendations now that this has been refactored out of the consul package and doesn't need the qualifier in the name.	2016-03-23 22:10:50 -07:00
Sean Chittenden	7faea986a0	Rename FindHealthyServer() to FindServer() There is no guarantee the server coming back is healthy. It's apt to be healthy by virtue of its place in the server list, but it's not guaranteed.	2016-03-23 22:10:50 -07:00
Sean Chittenden	18885e3214	cycleServer is a pure function, save the result	2016-03-23 22:10:50 -07:00
Sean Chittenden	4ec9ed4de2	Missed unit test cruft	2016-03-23 22:10:50 -07:00
Sean Chittenden	b906e40811	Update comments to reflect reality	2016-03-23 22:10:50 -07:00
Sean Chittenden	1a09a5b2cf	Remove additional cruft from ServerManager's channels No longer needed code.	2016-03-23 22:10:50 -07:00
Sean Chittenden	c980d492c6	Emulate a TryLock using atomic.CompareAndSwap Prevent possible queueing behind serverConfigLock in the event that a server fails on a busy host.	2016-03-23 22:10:50 -07:00
Sean Chittenden	102dcafe76	Make use of interfaces Use an interface instead of serf.Serf as arg to NewServerManager. Bonus points for improved testability. Pointed out by: @slackpad	2016-03-23 22:10:50 -07:00
Sean Chittenden	231768faea	Simplify error handling Rely on Serf for liveliness. In the event of a failure, simply cycle the server to the end of the list. If the server is unhealthy, Serf will reap the dead server. Additional simplifications: *) Only rebalance servers based on timers, not when a new server is readded to the cluster. *) Back out the failure count in server_details.ServerDetails	2016-03-23 22:10:50 -07:00
Sean Chittenden	0c519aa90d	Unbreak client tests by reverting to original test Debugging code crept into the actual test and hung out for much longer than it should have.	2016-03-23 22:10:50 -07:00
Sean Chittenden	26e51376d9	Introduce asynchronous management of consul server lists Instead of blocking the RPC call path and performing a potentially expensive calculation (including a call to `c.LANMembers()`), introduce a channel to request a rebalance. Some events don't force a reshuffle, instead they extend the duration of the current rebalance window because the environment thrashed enough to redistribute a client's load.	2016-03-23 22:10:50 -07:00
Sean Chittenden	6ed37d1d8d	Comment nits	2016-03-23 22:10:50 -07:00
Sean Chittenden	c8ab3ae4cb	Use saveServerConfig vs atomic.Value.Store(config)	2016-03-23 22:10:50 -07:00
Sean Chittenden	12377e80e6	Commit a handful of refactoring && copy/paste-o fixes	2016-03-23 22:10:50 -07:00
Sean Chittenden	c1c17f158b	Mutate copies of serverCfg.servers, not original Removing any ambiguity re: ownership of the mutated server lists is a win for maintenance and debugging.	2016-03-23 22:10:50 -07:00
Sean Chittenden	753766cc5d	rebalanceTimer may be nil during initialization When first starting the server manager, it's possible that the rebalanceTimer in serverConfig will be nil, test accordingly.	2016-03-23 22:10:50 -07:00
Sean Chittenden	d0e2792d5c	Properly retain a pointer to the rebalanceTimer	2016-03-23 22:10:50 -07:00
Sean Chittenden	62785de865	Cosmetic and various other wordsmithing cleanups	2016-03-23 22:10:50 -07:00
Sean Chittenden	31de4290cf	Document the various functions and their locking	2016-03-23 22:10:50 -07:00
Sean Chittenden	ffcd939feb	Use config convenience method to get config 'cause ELETTHECOMPILERSDOTHEWORK. I don't need that cluttering up the subconscious with more complexity.	2016-03-23 22:10:50 -07:00
Sean Chittenden	ed7fee7a3c	Move consul.serverConfig out of the consul package Relocated to its own package, server_manager. This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary. More work is needed to be done here	2016-03-23 22:10:50 -07:00
Sean Chittenden	ab80393198	Rename serverConfigMtx to serverConfigLock Pointed out by: @slackpad	2016-03-23 22:10:50 -07:00
Sean Chittenden	1866d94285	Refactor out the management of Consul servers Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go. This commit brings in a background task that proactively manages the server list and: *) reshuffles the list *) manages the timer out of the RPC() path *) uses atomics to detect a server has failed This is a WIP, more work in testing needs to be completed.	2016-03-23 22:10:50 -07:00
Sean Chittenden	73497f7915	Move consul.serverConfig out of the consul package Relocated to its own package, server_manager. This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary. More work is needed to be done here	2016-03-23 22:10:50 -07:00
Sean Chittenden	2a52d3eb80	Rename serverConfigMtx to serverConfigLock Pointed out by: @slackpad	2016-03-23 22:10:32 -07:00
Sean Chittenden	49425c5371	Refactor out the management of Consul servers Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go. This commit brings in a background task that proactively manages the server list and: *) reshuffles the list *) manages the timer out of the RPC() path *) uses atomics to detect a server has failed This is a WIP, more work in testing needs to be completed.	2016-03-23 22:09:46 -07:00
Sean Chittenden	ebdccf0f35	Move consul.serverConfig out of the consul package Relocated to its own package, server_manager. This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary. More work is needed to be done here	2016-03-23 22:05:29 -07:00
Sean Chittenden	b7213d9daa	Rename serverConfigMtx to serverConfigLock Pointed out by: @slackpad	2016-03-23 22:05:05 -07:00
Sean Chittenden	e29b8de0a6	Refactor out the management of Consul servers Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go. This commit brings in a background task that proactively manages the server list and: *) reshuffles the list *) manages the timer out of the RPC() path *) uses atomics to detect a server has failed This is a WIP, more work in testing needs to be completed.	2016-03-23 22:03:20 -07:00
Sean Chittenden	3730eaf6df	Commit miss re: consuls variable rename	2016-03-23 16:24:29 -07:00
Sean Chittenden	b33648ca5c	Move consul.serverConfig out of the consul package Relocated to its own package, server_manager. This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary. More work is needed to be done here	2016-03-23 16:16:22 -07:00
Sean Chittenden	f3a69c939d	Refactor consul.serverParts into server_details.ServerDetails This may be short-lived, but it also seems like this is going to lead us down a path where ServerDetails is going to evolve into a more powerful package that will encapsulate more behavior behind a coherent API.	2016-03-23 16:15:47 -07:00
Sean Chittenden	b3192ca410	Rename serverConfigMtx to serverConfigLock Pointed out by: @slackpad	2016-03-23 16:15:47 -07:00
Sean Chittenden	82458fa9e8	Handle the case where there are no healthy servers Pointed out by: @slackpad	2016-03-23 16:15:47 -07:00
Sean Chittenden	09d4c6439c	Refactor out the management of Consul servers Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go. This commit brings in a background task that proactively manages the server list and: *) reshuffles the list *) manages the timer out of the RPC() path *) uses atomics to detect a server has failed This is a WIP, more work in testing needs to be completed.	2016-03-23 16:15:47 -07:00
Sean Chittenden	6bda2c007c	Add a flag to denote that a server is disabled A server is not normally disabled, but in the event of an RPC error, we want to mark a server as down to allow for fast failover to a different server. This value must be an int in order to support atomic operations. Additionally, this is the preliminary work required to bring up a server in a disabled state. RPC health checks in the future could mark the server as alive, thereby creating an organic "slow start" feature for Consul.	2016-03-23 16:14:59 -07:00
Sean Chittenden	7de85906c1	Rename `lastServer` to `preferredServer` Expanding the domain of lastServer beyond RPC() changes the meaning of this variable. Rename accordingly to match the intent coming in a subsequent commit: a background thread will be in charge of rotating preferredServer.	2016-03-23 16:14:59 -07:00
Sean Chittenden	2949980a64	Warn if serf events have queued up past 80% of the limit It is theoretically possible that the number of queued serf events can back up. If this happens, emit a warning message if there are more than 200 events in queue. Most notably, this can happen if `c.consulServerLock` is held for an "extended period of time". The probability of anyone ever seeing this log message is hopefully low to nonexistent, but if it happens, the warning message indicating a large number of serf events fired while a lock was held is likely to be helpful (vs serf mysteriously blocking when attempting to add an event to a channel).	2016-03-23 16:14:11 -07:00
Sean Chittenden	2a0c12460d	Commit miss re: consuls variable rename	2016-03-23 16:13:49 -07:00
Sean Chittenden	3ac1bcc799	Remove lastRPCTime This mechanism isn't going to provide much value in the future. Preemptively reduce the complexity of future work.	2016-03-23 16:13:49 -07:00
Sean Chittenden	72b7856045	Rename c.consuls to c.consulServers Prep for breaking out maintenance of consuls into a new goroutine.	2016-03-23 16:10:27 -07:00
Sean Chittenden	d1ef4ec7e2	Use `rand.Int31n()` to get power of two optimization In cases where i+1 is a power of two, skip one modulo operation.	2016-03-23 16:00:39 -07:00
James Phillips	92e947dcc3	Gets rid of flaky sort check. If we get a coordinate then this test will fail, so we only check the first item in the list, which is deterministic.	2016-03-21 17:30:05 -07:00
James Phillips	265a8d4053	Increases timeouts for coordinate tests. We take the interval and add the random stagger to it, so 2X is cutting it too close and the unit tests are often flaky.	2016-03-21 16:44:35 -07:00
James Phillips	13b8ce0adc	Merge pull request #1851 from hashicorp/f-ipv6-bind Allow [::] as a bind address (binds to first public IPv6 address)	2016-03-19 16:16:19 -07:00
James Phillips	18e12aa886	Adds more specific checks for ipv6 addresses.	2016-03-19 16:14:45 -07:00
James Phillips	e4ca18089f	Removes leader from members and changes name since it's an address.	2016-03-18 17:07:11 -07:00
Sergey Romanov	11b73bb1a5	#735 add information about leader to consul members	2016-03-18 17:05:40 -07:00
Wim	508bc796a8	Allow [::] as a bind address (binds to first public IPv6 address)	2016-03-18 23:59:44 +01:00
Calvin Leung Huang	7215d9bdef	Obfuscate token for lookupACL error	2016-03-15 17:16:25 -04:00
James Phillips	a9d640c024	Hardens the match interpolator against negative arguments.	2016-03-07 13:32:32 -08:00
James Phillips	63c826c2c0	Adds a comment about the embedded struct.	2016-03-07 10:45:39 -08:00
James Phillips	275c84a0cc	Renames "debug" endpoint and structures to "explain".	2016-03-07 10:45:39 -08:00
James Phillips	8493640b09	Adds a prepared query debug endpoint.	2016-03-07 10:45:39 -08:00
James Phillips	918b1ace47	Applies prefix ACL to a catch-all template as a special case.	2016-03-07 10:45:39 -08:00
James Phillips	3c512fc089	Adds a test for the custom prepared query template indexer.	2016-03-07 10:45:39 -08:00
James Phillips	39d3094d50	Adds core query template tests to the state store.	2016-03-07 10:45:39 -08:00
James Phillips	06087633f0	Adds in basic query template lookups and vendors newly-updated memdb as well as improved iradix tree.	2016-03-07 10:45:39 -08:00
James Phillips	142e69befe	Adds tests for the low-level template functions.	2016-03-07 10:45:39 -08:00
James Phillips	b578fbbfc4	Adds tests for the string visitor.	2016-03-07 10:45:39 -08:00
James Phillips	2a9a5f823e	Factors rendering down into the resolve function.	2016-03-07 10:45:39 -08:00
James Phillips	8e25451232	Splits walk functions out from the rest of the template code.	2016-03-07 10:45:39 -08:00
James Phillips	fa60d575bf	Integrates templates into state store and endpoint (sans tests).	2016-03-07 10:45:39 -08:00
James Phillips	62405110dc	Wraps the prepared query to also store the compiled template.	2016-03-07 10:45:39 -08:00
James Phillips	98281be7df	Adds basic query template compiler and renderer.	2016-03-07 10:45:39 -08:00
Mike Cowgill	25613895e3	one line schema change to not allow missing for sessions Table node index, Fixes #1774	2016-03-02 21:19:53 -08:00
James Phillips	f0150ff5ce	Adds missing token redact in the GET path.	2016-02-26 15:59:00 -08:00
James Phillips	48f2089d7f	Merge pull request #1757 from hashicorp/f-revert-1667 Reverts server connection rebalancing changes from #1667	2016-02-24 18:07:13 -08:00
James Phillips	c75256ac8b	Adds a check for users re-submitting the redacted token.	2016-02-24 17:35:26 -08:00
James Phillips	2f7eac8b86	Renames "prepared_query" ACL policy to "query".	2016-02-24 17:02:06 -08:00
James Phillips	3b91618d7d	Changes to more idiomatic "ok" pattern for prefix getter.	2016-02-24 16:26:43 -08:00
James Phillips	1c7ee582f9	Renames a unit test.	2016-02-24 16:17:20 -08:00
James Phillips	d660311fbb	Revert "Merge pull request #1667 from hashicorp/b-redistribute-clients" This reverts commit 8f30dea4209491ebbe4ef9ab94dd8052d17bdbe9, reversing changes made to eb27a02956e7e052c0bec6f96a0c0f7f6675f6a6.	2016-02-24 15:38:03 -08:00
James Phillips	54f0b7bbb6	Completes switch of prepared_query ACLs to govern query names.	2016-02-24 01:26:16 -08:00

1316 commits