open-consul

Author	SHA1	Message	Date
Josh Soref	1dd8c378b9	Spelling (#3958) * spelling: another * spelling: autopilot * spelling: beginning * spelling: circonus * spelling: default * spelling: definition * spelling: distance * spelling: encountered * spelling: enterprise * spelling: expands * spelling: exits * spelling: formatting * spelling: health * spelling: hierarchy * spelling: imposed * spelling: independence * spelling: inspect * spelling: last * spelling: latest * spelling: client * spelling: message * spelling: minimum * spelling: notify * spelling: nonexistent * spelling: operator * spelling: payload * spelling: preceded * spelling: prepared * spelling: programmatically * spelling: required * spelling: reconcile * spelling: responses * spelling: request * spelling: response * spelling: results * spelling: retrieve * spelling: service * spelling: significantly * spelling: specifies * spelling: supported * spelling: synchronization * spelling: synchronous * spelling: themselves * spelling: unexpected * spelling: validations * spelling: value	2018-03-19 16:56:00 +00:00
Pierre Souchay	3eb287f57d	Fixed typo in comments	2018-03-19 17:12:08 +01:00
Pierre Souchay	eb2a4eaea3	Refactoring to have clearer code without weird bool	2018-03-19 16:12:54 +01:00
Pierre Souchay	a5f6ac0df4	[BUGFIX] When a node level check is removed, ensure all services of node are notified Bugfix for https://github.com/hashicorp/consul/pull/3899 When a node level check is removed (example: maintenance), some watchers on services might have to recompute their state. If those nodes are performing blocking queries, they have to be notified. While their state was updated when node-level state did change or was added this was not the case when the check was removed. This fixes it.	2018-03-19 14:14:03 +01:00
Devin Canterberry	881d20c606	🐛 Formatting changes only; add missing trailing commas	2018-03-15 10:19:46 -07:00
Mitchell Hashimoto	fbac58280e	agent/consul/fsm: begin using testify/assert	2018-03-06 09:48:15 -08:00
Paul Banks	628dcc9793	Merge pull request #3899 from pierresouchay/fix_blocking_queries_index Services Indexes modified per service instead of using a global Index	2018-03-02 16:24:43 +00:00
Pierre Souchay	85b73f8163	Simplified error handling for maxIndexForService * added unit tests to ensure service index is properly garbage collected * added Upgrade from Version 1.0.6 to higher section in documentation	2018-03-01 14:09:36 +01:00
Preetha Appan	77d35f1829	Remove extra newline	2018-02-21 13:21:47 -06:00
Preetha Appan	573500dc51	Unit test that calls revokeLeadership twice to make sure it's idempotent	2018-02-21 12:48:53 -06:00
Preetha Appan	bd270b02ba	Make sure revokeLeadership is called if establishLeadership errors	2018-02-21 12:33:22 -06:00
Alex Dadgar	535842004c	Test autopilots start/stop idempotency	2018-02-21 10:19:30 -08:00
Alex Dadgar	4d99696f02	Improve autopilot shutdown to be idempotent	2018-02-20 15:51:59 -08:00
Pierre Souchay	e6d85cb36a	Fixed comments for function maxIndexForService	2018-02-20 23:57:28 +01:00
Pierre Souchay	b26ea3c230	[Revert] Only update services if tags are different This patch did give some better results, but break watches on the services of a node. It is possible to apply the same optimization for nodes than to services (one index per instance), but it would complicate further the patch. Let's do it in another PR.	2018-02-20 23:34:42 +01:00
Pierre Souchay	903e866835	Only update services if tags are different	2018-02-20 23:08:04 +01:00
Pierre Souchay	56d5c0bf22	Enable Raft index optimization per service name on health endpoint Had to fix unit test in order to check properly indexes.	2018-02-20 01:35:50 +01:00
Pierre Souchay	ec1b278595	Get only first service to test whether we have to cleanup index of a service	2018-02-19 22:44:49 +01:00
Pierre Souchay	523feb0be4	Fixed comment about raftIndex + use test.Helper()	2018-02-19 19:30:25 +01:00
Pierre Souchay	4c188c1d08	Services Indexes modified per service instead of using a global Index This patch improves the watches for services on large cluster: each service has now its own index, such watches on a specific service are not modified by changes in the global catalog. It should improve a lot the performance of tools such as consul-template or libraries performing watches on very large clusters with many services/watches.	2018-02-19 18:29:22 +01:00
Veselkov Konstantin	05666113a4	remove golint warnings	2018-01-28 22:40:13 +04:00
Kyle Havlovitz	0e76d62846	Reset clusterHealth when autopilot starts	2018-01-23 12:52:28 -08:00
Kyle Havlovitz	6d1dbe6cc4	Move autopilot health loop into leader operations	2018-01-23 11:17:41 -08:00
James Phillips	62e97a6602	Fixes a `go fmt` cleanup.	2017-12-20 13:43:38 -08:00
Kyle Havlovitz	74b0c58831	Fix vet error	2017-12-18 18:04:42 -08:00
Kyle Havlovitz	dfc165a47b	Move autopilot initializing to oss file	2017-12-18 18:02:44 -08:00
Kyle Havlovitz	044c38aa7b	Move autopilot setup to a separate file	2017-12-18 16:55:51 -08:00
Kyle Havlovitz	9e1ba6fb4e	Make some final tweaks to autopilot package	2017-12-18 12:26:47 -08:00
Kyle Havlovitz	6b58df5898	Merge pull request #3737 from hashicorp/autopilot-refactor Move autopilot to a standalone package	2017-12-15 14:09:40 -08:00
James Phillips	262cbbd9ca	Merge pull request #3728 from weiwei04/fix_globalRPC_goroutine_leak fix globalRPC goroutine leak	2017-12-14 17:54:19 -08:00
Kyle Havlovitz	798aca92c5	Expose IsPotentialVoter for advanced autopilot logic	2017-12-13 17:53:51 -08:00
Kyle Havlovitz	a4ac148077	Merge branch 'master' into autopilot-refactor	2017-12-13 11:54:32 -08:00
Kyle Havlovitz	6c985132de	A few last autopilot adjustments	2017-12-13 11:19:17 -08:00
Kyle Havlovitz	77d92bf15c	More autopilot reorganizing	2017-12-13 10:57:37 -08:00
James Phillips	984de6e2e0	Adds TODOs referencing #3744.	2017-12-13 10:52:06 -08:00
Kyle Havlovitz	f347c8a531	More refactoring to make autopilot consul-agnostic	2017-12-12 17:46:28 -08:00
Kyle Havlovitz	8546a1d3c6	Move autopilot to a standalone package	2017-12-11 16:45:33 -08:00
James Phillips	32b64575d1	Moves Serf helper into lib to fix import cycle in consul-enterprise.	2017-12-07 16:57:58 -08:00
James Phillips	c16cce80bb	Turns off intent queue warnings and enables dynamic queue sizing.	2017-12-07 16:27:06 -08:00
Wei Wei	04531ff0fb	fix globalRPC goroutine leak Signed-off-by: Wei Wei <weiwei.inf@gmail.com>	2017-12-05 11:53:30 +08:00
James Phillips	c4bc89a187	Creates a registration mechanism for snapshot and restore.	2017-11-29 18:36:53 -08:00
James Phillips	8571555703	Begins split out of snapshots from the main FSM class.	2017-11-29 18:36:53 -08:00
James Phillips	4eaee8e0ba	Creates a registration mechanism for FSM commands.	2017-11-29 18:36:53 -08:00
James Phillips	3e7ea1931c	Moves the FSM into its own package. This will help make it clearer what happens when we add some registration plumbing for the different operations and snapshots.	2017-11-29 18:36:53 -08:00
James Phillips	7f3783f4be	Resolves an FSM snapshot TODO. This adds checks for sink write calls before we continue the refactor, which will resolve the other TODO comment we deleted as part of this change.	2017-11-29 18:36:53 -08:00
James Phillips	5a24d37ac0	Creates a registration mechanism for schemas. This also splits out the registration into the table-specific source files.	2017-11-29 18:36:52 -08:00
James Phillips	36bb30e67a	Creates a registration mechanism for RPC endpoints.	2017-11-29 18:36:52 -08:00
James Phillips	ba56669ea8	Renames stubs to be more consistent.	2017-11-29 18:36:52 -08:00
James Phillips	56552095c9	Sheds monotonic time info so tombstone GC bins work properly.	2017-11-29 10:34:24 -08:00
James Phillips	8656b7a3e9	Gives back the lock before writing to the expire channel. The lock isn't needed after we clean up the expire bin, and as seen in #3700 we can get into a deadlock waiting to place the expire index into the channel while holding this lock. Fixes #3700	2017-11-19 16:24:16 -08:00
James Phillips	8210523b1b	Moves the LAN event handler after the router is created. Fixes #3680	2017-11-10 12:26:48 -08:00
James Phillips	bfbbfb62ca	Revert "Adds a small sleep to make sure we are in the next GC bucket."	2017-11-08 22:18:37 -08:00
James Phillips	d6328a5bf8	Adds a sleep to make sure we are in the next GC bucket, ups time. Fixes #3670	2017-11-08 22:02:40 -08:00
James Phillips	91824375be	Skips the tombstone GC test in Travis for now. Related to #3670	2017-11-08 20:14:20 -08:00
James Phillips	b94ba8aeb4	Removes bogus getPort() in favor of freeport.	2017-11-08 19:55:50 -08:00
James Phillips	444a345a3a	Tightens timing up and reorders GC test to be less flaky.	2017-11-08 15:09:29 -08:00
James Phillips	e00624425b	Doubles the GC timing.	2017-11-08 15:01:11 -08:00
James Phillips	8eb91777d9	Opens up test timing a little more.	2017-11-08 14:01:19 -08:00
James Phillips	d45c2a01f1	Shifts off a gran boundary to help make test less flaky.	2017-11-08 13:57:17 -08:00
James Phillips	757e353334	Opens up the tombstone GC test timing.	2017-11-08 13:43:39 -08:00
Kyle Havlovitz	068ca11eb8	Move check definition to a sub-struct	2017-11-01 14:54:46 -07:00
Kyle Havlovitz	bc3ba5f873	Merge branch 'master' into esm-changes	2017-11-01 11:37:48 -07:00
Kyle Havlovitz	83524f44c4	Merge pull request #3622 from hashicorp/coordinate-node-endpoint agent: add /v1/coordinate/node/:node endpoint	2017-11-01 11:35:50 -07:00
Kyle Havlovitz	9909b661ac	Fill out the tests around coordinate/node functionality	2017-10-31 15:36:44 -07:00
Kyle Havlovitz	fd4d9f1c16	Factor out registerNodes function	2017-10-31 13:34:49 -07:00
James Phillips	c6e0366c02	Relaxes Autopilot promotion logic. (#3623) * Relaxes Autopilot promotion logic. When we defaulted the Raft protocol version to 3 in #3477 we made the numPeers() routine more strict to only count voters (this is more conservative and more correct). This had the side effect of breaking rolling updates because it's at odds with the Autopilot non-voter promotion logic. That logic used to wait to only promote to maintain an odd quorum of servers. During a rolling update (add one new server, wait, and then kill an old server) the dead server cleanup would still count the old server as a peer, which is conservative and the right thing to do, and no longer count the non-voter. This would wait to promote, so you could get into a stalemate. It is safer to promote early than remove early, so by promoting as soon as possible we have chosen that as the solution here. Fixes #3611 * Gets rid of unnecessary extra not-a-voter check.	2017-10-31 15:16:56 -05:00
Kyle Havlovitz	496dd7ab5b	Merge branch 'coordinate-node-endpoint' of github.com:hashicorp/consul into esm-changes	2017-10-26 19:20:24 -07:00
Kyle Havlovitz	f80e70271d	Added Coordinate.Node rpc endpoint and client api method	2017-10-26 19:16:40 -07:00
Kyle Havlovitz	84a07ea113	Expose SkipNodeUpdate field and some health check info in the http api	2017-10-25 19:37:30 +02:00
Frank Schroeder	74859ff3c0	test: replace porter tool with freeport lib This patch removes the porter tool which hands out free ports from a given range with a library which does the same thing. The challenge for acquiring free ports in concurrent go test runs is that go packages are tested concurrently and run in separate processes. There has to be some inter-process synchronization in preventing processes allocating the same ports. freeport allocates blocks of ports from a range expected to be not in heavy use and implements a system-wide mutex by binding to the first port of that block for the lifetime of the application. Ports are then provided sequentially from that block and are tested on localhost before being returned as available.	2017-10-21 22:01:09 +02:00
Ryan Slade	6f05ea91a3	Replace time.Now().Sub(x) with time.Since(x)	2017-10-17 20:38:24 +02:00
James Phillips	e9670761f9	Cleans up some drift between the OSS and Enterprise trees.	2017-10-11 15:53:07 -07:00
James Phillips	d1ad538345	Makes RPC handling more robust when rolling servers. (#3561) * Adds client-side retry for no leader errors. This paves over the case where the client was connected to the leader when it loses leadership. * Adds a configurable server RPC drain time and a fail-fast path for RPCs. When a server leaves it gets removed from the Raft configuration, so it will never know who the new leader server ends up being. Without this we'd be doomed to wait out the RPC hold timeout and then fail. This makes things fail a little quicker while a server is draining, and since we added a client retry AND since the server doing this has already shut down and left the Serf LAN, clients should retry against some other server. * Makes the RPC hold timeout configurable. * Reorders struct members. * Sets the RPC hold timeout default for test servers. * Bumps the leave drain time up to 5 seconds. * Robustifies retries with a simpler client-side RPC hold. * Reverts untended delete.	2017-10-10 15:19:50 -07:00
James Phillips	a1db119d02	Fixes handling of stop channel and failed barrier attempts. (#3546) * Fixes handling of stop channel and failed barrier attempts. There were two issues here. First, we needed to not exit when there was a timeout trying to write the barrier, because Raft might not step down, so we'd be left as the leader but having run all the step down actions. Second, we didn't close over the stopCh correctly, so it was possible to nil that out and have the leaderLoop never exit. We close over it properly AND sequence the nil-ing of it AFTER the leaderLoop exits for good measure, so the code is more robust. Fixes #3545 * Cleans up based on code review feedback. * Tweaks comments. * Renames variables and removes comments.	2017-10-06 07:54:49 -07:00
Kyle Havlovitz	0063516e5e	Update metric names and add a legacy config flag	2017-10-04 16:43:27 -07:00
Preetha Appan	f38d20eb40	Remove extra newline	2017-10-03 15:19:31 -05:00
Preetha Appan	3c81e2db7c	Only allow 'list' policies within 'key' policy definitions. Consolidated two similar tests into one and fixed alignment.	2017-10-03 15:15:56 -05:00
Preetha Appan	d5acfc3982	Introduces new 'list' permission that applies to KV store recursive reads, and enforced only when opted in.	2017-10-02 17:10:21 -05:00
James Phillips	330ce87851	Gets rid of flaky clause in stats fetcher unit test. Given how the routine is coded we can still get data so this wasn't a reliable thing to check.	2017-09-26 20:53:06 -07:00
preetapan	783e24be64	Issue 3452 (#3500) * Make sure that id and address are set in member created during reaping of catalog nodes that have been removed from serf * Get address from node table in the state store rather than from service address * Fix incorrect lookup by checkname instead of node name * Make sure that serverlookup is called with the right address format, added unit test. * Address code review comments * Tweaks style stuff.	2017-09-26 20:49:41 -07:00
James Phillips	4b17c9618f	Cleans up some edge cases in TestSnapshot_Forward_Leader. These could cause the tests to hang.	2017-09-26 14:07:28 -07:00
Preetha Appan	318d0232f7	Move Raft protocol version for list peers end point to server side, fix unit tests. This fixes #3449	2017-09-26 09:35:39 -05:00
James Phillips	fcaa889116	Bumps default Raft protocol to version 3. (#3477 ) * Changes default Raft protocol to 3. * Changes numPeers() to report only voters. This should have been there before, but it's more obvious that this is incorrect now that we default the Raft protocol to 3, which puts new servers in a read-only state while Autopilot waits for them to become healthy. * Fixes TestLeader_RollRaftServer. * Fixes TestOperator_RaftRemovePeerByAddress. * Fixes TestServer_. Relaxed the check for a given number of voter peers and instead do a thorough check that all servers see each other in their Raft configurations. Fixes TestACL_. These now just check for Raft replication to be set up, and don't care about the number of voter peers. Fixes TestOperator_Raft_ListPeers. * Fixes TestAutopilot_CleanupDeadServerPeriodic. * Fixes TestCatalog_ListNodes_ConsistentRead_Fail. * Fixes TestLeader_ChangeServerID and adjusts the conn pool to throw away sockets when it sees io.EOF. * Changes version to 1.0.0 in the options doc. * Makes metrics test more deterministic with autopilot metrics possible.	2017-09-25 15:27:04 -07:00
Preetha Appan	8394ad08db	Introduce Code Policy validation via sentinel, with a noop implementation	2017-09-25 13:44:55 -05:00
Frank Schröder	69a088ca85	New config parser, HCL support, multiple bind addrs (#3480 ) * new config parser for agent This patch implements a new config parser for the consul agent which makes the following changes to the previous implementation: * add HCL support * all configuration fragments in tests and for default config are expressed as HCL fragments * HCL fragments can be provided on the command line so that they can eventually replace the command line flags. * HCL/JSON fragments are parsed into a temporary Config structure which can be merged using reflection (all values are pointers). The existing merge logic of overwrite for values and append for slices has been preserved. * A single builder process generates a typed runtime configuration for the agent. The new implementation is more strict and fails in the builder process if no valid runtime configuration can be generated. Therefore, additional validations in other parts of the code should be removed. The builder also pre-computes all required network addresses so that no address/port magic should be required where the configuration is used and should therefore be removed. * Upgrade github.com/hashicorp/hcl to support int64 * improve error messages * fix directory permission test * Fix rtt test * Fix ForceLeave test * Skip performance test for now until we know what to do * Update github.com/hashicorp/memberlist to update log prefix * Make memberlist use the default logger * improve config error handling * do not fail on non-existing data-dir * experiment with non-uniform timeouts to get a handle on stalled leader elections * Run tests for packages separately to eliminate the spurious port conflicts * refactor private address detection and unify approach for ipv4 and ipv6. 
Fixes #2825 * do not allow unix sockets for DNS * improve bind and advertise addr error handling * go through builder using test coverage * minimal update to the docs * more coverage tests fixed * more tests * fix makefile * cleanup * fix port conflicts with external port server 'porter' * stop test server on error * do not run api test that change global ENV concurrently with the other tests * Run remaining api tests concurrently * no need for retry with the port number service * monkey patch race condition in go-sockaddr until we understand why that fails * monkey patch hcl decoder race condition until we understand why that fails * monkey patch spurious errors in strings.EqualFold from here * add test for hcl decoder race condition. Run with go test -parallel 128 * Increase timeout again * cleanup * don't log port allocations by default * use base command arg parsing to format help output properly * handle -dc deprecation case in Build * switch autopilot.max_trailing_logs to int * remove duplicate test case * remove unused methods * remove comments about flag/config value inconsistencies * switch got and want around since the error message was misleading. * Removes a stray debug log. * Removes a stray newline in imports. * Fixes TestACL_Version8. * Runs go fmt. * Adds a default case for unknown address types. * Reorders and reformats some imports. * Adds some comments and fixes typos. * Reorders imports. 
* add unix socket support for dns later * drop all deprecated flags and arguments * fix wrong field name * remove stray node-id file * drop unnecessary patch section in test * drop duplicate test * add test for LeaveOnTerm and SkipLeaveOnInt in client mode * drop "bla" and add clarifying comment for the test * split up tests to support enterprise/non-enterprise tests * drop raft multiplier and derive values during build phase * sanitize runtime config reflectively and add test * detect invalid config fields * fix tests with invalid config fields * use different values for wan sanitization test * drop recursor in favor of recursors * allow dns_config.udp_answer_limit to be zero * make sure tests run on machines with multiple ips * Fix failing tests in a few more places by providing a bind address in the test * Gets rid of skipped TestAgent_CheckPerformanceSettings and adds case for builder. * Add porter to server_test.go to make tests there less flaky * go fmt	2017-09-25 11:40:42 -07:00
James Phillips	268018c558	Robustifies check in TestCatalog_ListNodes_ConsistentRead_Fail test. Fixes #3469	2017-09-13 21:22:53 -07:00
James Phillips	8be4ee766a	Revert "Manages segments list via a pointer." This reverts commit c277a4250461443cbd63de0259e5e32766f651ea.	2017-09-07 16:37:11 -07:00
James Phillips	5008aabb62	Manages segments list via a pointer.	2017-09-07 16:21:07 -07:00
James Phillips	908f7be97f	Cleans up formatting.	2017-09-07 12:26:58 -07:00
James Phillips	02a3f3f27b	Shows the segment name in the keyring API and command output.	2017-09-07 12:17:39 -07:00
James Phillips	7c616e3768	Moves reconcile loop into segment stub.	2017-09-06 18:01:53 -07:00
James Phillips	4e34c2af06	Takes the skip out of the client check. Without this the merge delegate won't check the segment for non-servers a little below here.	2017-09-06 17:05:40 -07:00
James Phillips	78ac144fff	Merge pull request #3447 from hashicorp/issue-3070 Skips unique node ID check for old versions of Consul.	2017-09-06 13:24:15 -07:00
James Phillips	62d9299646	Fixes incorrect comment.	2017-09-06 13:23:19 -07:00
James Phillips	031f1874d0	Pulls down some code for the check loop.	2017-09-06 13:07:42 -07:00
James Phillips	2fd9328b21	Uses the Raft configuration for the self-add skip check.	2017-09-06 13:05:51 -07:00
Preetha Appan	1eae9f1e2f	Change member join reconcile step to process joining itself, to handle node IP address changes correctly when number of servers < 3	2017-09-06 13:53:01 -05:00
James Phillips	353e037c9b	Skips unique node ID check for old versions of Consul. Fixes #3070.	2017-09-05 22:57:29 -07:00
James Phillips	c629773b40	Makes the all segments query explicit, and the default for `consul members`.	2017-09-05 12:22:20 -07:00
James Phillips	bc9780baad	Adds simple rate limiting for client agent RPC calls to Consul servers. (#3440) * Added rate limiting for agent RPC calls. * Initializes the rate limiter based on the config. * Adds the rate limiter into the snapshot RPC path. * Adds unit tests for the RPC rate limiter. * Groups the RPC limit parameters under "limits" in the config. * Adds some documentation about the RPC limiter. * Sends a 429 response when the rate limiter kicks in. * Adds docs for new telemetry. * Makes snapshot telemetry look like RPC telemetry and cleans up comments.	2017-09-01 15:02:50 -07:00


240 commits