Commit graph

1119 commits

Author SHA1 Message Date
Kyle Havlovitz 26a26a489d Add retry with backoff to initial bootstrap checks (#2561) 2016-12-01 17:05:02 -05:00
Seth Vargo 1c55429a38
Add an API method for determining the best status
Given a list of HealthChecks, this determines the "best" status for the
collective group. This is useful for nodes and services, which may have
multiple checks associated with them.
2016-11-29 18:41:46 -05:00
Kyle Havlovitz 2d37a07476 Add keyring http endpoints 2016-11-22 20:10:43 -05:00
Kyle Havlovitz 9adc3854d1 Retry with backoff on session invalidation failure (#2475) 2016-11-04 21:53:22 -07:00
James Phillips be4056789f Moves the snapshot package up one level. (#2472) 2016-11-03 21:36:25 -07:00
Kyle Havlovitz 169cae2203 Disallow -bootstrap-expect flag in dev mode (#2464) 2016-11-03 01:54:43 -04:00
Kyle Havlovitz 440611f9f7 Add snapshot inspect subcommand (#2451) 2016-10-31 19:37:27 -04:00
Kyle Havlovitz c6f461aa25 Enable snapshots in dev mode (#2453) 2016-10-31 14:39:47 -04:00
James Phillips bc29610124 Adds support for snapshots and restores. (#2396)
* Updates Raft library to get new snapshot/restore API.

* Basic backup and restore working, but need some cleanup.

* Breaks out a snapshot module and adds a SHA256 integrity check.

* Adds snapshot ACL and fills in some missing comments.

* Require a consistent read for snapshots.

* Make sure snapshot works if ACLs aren't enabled.

* Adds a bit of package documentation.

* Returns an empty response from restore to avoid EOF errors.

* Adds API client support for snapshots.

* Makes internal file names match on-disk file snapshots.

* Adds DC and token coverage for snapshot API test.

* Adds missing documentation.

* Adds a unit test for the snapshot client endpoint.

* Moves the connection pool out of the client for easier testing.

* Fixes an incidental issue in the prepared query unit test.

I realized I had two servers in bootstrap mode so this wasn't a good setup.

* Adds a half close to the TCP stream and fixes panic on error.

* Adds client and endpoint tests for snapshots.

* Moves the pool back into the snapshot RPC client.

* Adds a TLS test and fixes half-closes for TLS connections.

* Tweaks some comments.

* Adds a low-level snapshot test.

This is independent of Consul so we can pull this out into a library
later if we want to.

* Cleans up snapshot and archive and completes archive tests.

* Sends a clear error for snapshot operations in dev mode.

Snapshots require the Raft snapshots to be readable, which isn't supported
in dev mode. Send a clear error instead of a deep-down Raft one.

* Adds docs for the snapshot endpoint.

* Adds a stale mode and index feedback for snapshot saves.

This gives folks a way to extract data even if the cluster has no
leader.

* Changes the internal format of a snapshot from zip to tgz.

* Pulls in Raft fix to cancel inflight before a restore.

* Pulls in new Raft restore interface.

* Adds metadata to snapshot saves and a verify function.

* Adds basic save and restore snapshot CLI commands.

* Gets rid of tarball extensions and adds restore message.

* Fixes an incidental bad link in the KV docs.

* Adds documentation for the snapshot CLI commands.

* Scuttle any request body when a snapshot is saved.

* Fixes archive unit test error message check.

* Allows for nil output writers in snapshot RPC handlers.

* Renames hash list Decode to DecodeAndVerify.

* Closes the client connection for snapshot ops.

* Lowers timeout for restore ops.

* Updates Raft vendor to get new Restore signature and integrates with Consul.

* Bounces the leader's internal state when we do a restore.
2016-10-25 19:20:24 -07:00
Kyle Havlovitz f0aa65754b Wait for agent joins to finish in TestClient_RPC 2016-10-25 17:48:11 -07:00
Kyle Havlovitz 2379b822ad Add wait logic to TestClient_RPC_Pool 2016-10-25 17:48:11 -07:00
James Phillips b423dee303
Fixes port numbers in peers.info. 2016-10-05 18:09:15 -07:00
James Phillips 8e76af4311 Merge pull request #2319 from hashicorp/f-bootstrap-abort
Adds check that aborts bootstrap mode if there's an existing cluster.
2016-09-01 09:49:03 -07:00
James Phillips 94c0e961eb
Fixes error message in test. 2016-09-01 09:48:08 -07:00
James Phillips ce93b82e1e
Makes port selection atomic in unit tests. 2016-09-01 01:01:28 -07:00
James Phillips d04a706a7c
Tweaks comment to be more correct. 2016-08-31 23:54:53 -07:00
James Phillips 4dd9b4b08a
Adds check that aborts bootstrap mode if there's an existing cluster. 2016-08-31 21:25:56 -07:00
James Phillips 750e1751ac
Copies the member data instead of referencing by pointer. 2016-08-30 16:54:21 -07:00
James Phillips 6be1e07fec
Makes the Raft configuration API easier to consume. 2016-08-30 11:30:56 -07:00
James Phillips 5df4b6bef2
Adds a log warning when operator peer changes occur. 2016-08-30 10:23:32 -07:00
James Phillips 1b7a16b7d3
Adds new consul operator endpoint, CLI, and ACL and some basic Raft commands. 2016-08-30 00:02:50 -07:00
James Phillips 29e52307cb
Makes empty checkServiceNode return a nil.
The change in #2308 had an inadvertent interface change, so we fix that with
a special case in this fix.
2016-08-29 19:12:07 -07:00
James Phillips 327fe725d9
Preallocates result struct, which was a profiling hot spot. 2016-08-26 16:34:28 -07:00
James Phillips c5b6ac3655
Removes leader_lease_timeout from stats. 2016-08-25 15:39:19 -07:00
James Phillips 2f4c237cff
Adds a max raft multiplier and tweaks documentation. 2016-08-25 15:36:05 -07:00
James Phillips 5df36fbd82
Stops scaling the commit timeout. 2016-08-25 15:05:40 -07:00
James Phillips f65ef936cb
Increases RPC hold timeout for new default timing.
Rather than scale this we just bump it up a bit. It'll be on the edge in
the lower-performance default mode, and will have plenty of margin in the
high-performance mode. This seems like a reasonable compromise to keep the
logic here simple vs. scaling, and seems inline with the expectations of
the different modes of operation.
2016-08-24 23:35:28 -07:00
James Phillips b339b0d2fc
Adds performance tuning capability for Raft, detuned defaults, and supplemental docs. 2016-08-24 21:58:37 -07:00
James Phillips 0bdbdf1ba8 Merge pull request #2226 from abhinavdahiya/rm-health-unknown
Fixes #1775; Removes 'unknown' state
2016-08-17 17:51:04 -07:00
James Phillips 1f539d9914
Makes the filled-in parts of ServiceNode more explicit. 2016-08-12 18:25:36 -07:00
David van Geest 360e196c93
Translate Address to tagged WAN address in HTTP API when appropriate. 2016-08-12 18:25:36 -07:00
James Phillips d11a7a197c
Removes upper end of muxado handler. 2016-08-09 18:16:41 -07:00
James Phillips 97a25e8564
Closes the conn on bad protocol version. 2016-08-09 18:13:53 -07:00
James Phillips 359587f70e
Removes support for muxado and protocol version 1. 2016-08-09 18:10:04 -07:00
James Phillips 99ab3390c2
Updates hashicorp/hcl and hashicorp/hil.
This required a small mod to core Consul code to cope with an interface
change.
2016-08-09 17:24:13 -07:00
James Phillips ff6d42389c Merge pull request #2222 from hashicorp/f-raft-v2
Integrates Consul with "stage one" of HashiCorp Raft library v2.
2016-08-09 16:04:48 -07:00
James Phillips cce38f9a4b
Moves the peers.info content down into a constant. 2016-08-09 11:56:39 -07:00
James Phillips 7aaa4bc913
Adds peers back into bootstrap log, makes initial case consistent. 2016-08-09 11:52:41 -07:00
James Phillips 7f58b05dfe
Tweaks select style. 2016-08-09 11:33:42 -07:00
James Phillips 544169999c
Adds I/O-sensitive metrics to ACL replication operations. 2016-08-09 11:32:12 -07:00
James Phillips 820509760d
Switches to a smooth rate limit vs. a bursty one. 2016-08-09 11:29:12 -07:00
James Phillips 129e327bc9
Clarifies replication index shown in the log message. 2016-08-09 11:10:32 -07:00
James Phillips 4203612bd7
Returns from the shutdown wait right away. 2016-08-09 11:09:48 -07:00
James Phillips e03fbef6b3
Moves ACL ID sorting interface onto the iterator. 2016-08-09 11:08:26 -07:00
James Phillips 0fa059ec49
Switches all ACL caches to 2Q. 2016-08-09 11:00:22 -07:00
James Phillips 1e75fa0362
Moves ACL ID generation down into the endpoint.
We don't want ACL replication to have this behavior so it was a
little dangerous to have in the shared helper function.
2016-08-09 00:11:00 -07:00
James Phillips 06a510a808
Removes unsafe "recover to empty" code.
This isn't safe because it would implicitly commit all outstanding log
entries. The new Raft library already has logic to not start a vote if
the current node isn't in the configuration, so this shoudn't be needed.
2016-08-08 19:19:19 -07:00
James Phillips dd3169b395
Tweaks recovery based on interface changes. 2016-08-08 19:19:18 -07:00
James Phillips 19004e7095
Moves to a safer design where we don't ingest the initial peers.json file. 2016-08-08 19:19:18 -07:00
James Phillips 44c468995f
Touches up Raft integration after latest changes. 2016-08-08 19:19:18 -07:00