Commit Graph

10278 Commits

Author SHA1 Message Date
Pierre Souchay a16f34058b Display more information about check being not properly added when it fails (#4405)
* Display more information about check being not properly added when it fails

It follows an incident where we add lots of error messages:

  [WARN] consul.fsm: EnsureRegistration failed: failed inserting check: Missing service registration

That seems related to Consul failing to restart on respective agents.

Having Node information as well as service information would help diagnose the issue.

* Renamed ensureCheckIfNodeMatches() as requested by @banks
2018-08-14 17:45:33 +01:00
Freddy cbe61dfcec
Improve reliability of tests with TestAgent (#4525)
- Add WaitForTestAgent to tests flaky due to missing serfHealth registration

- Fix bug in retries calling Fatalf with *testing.T

- Convert TestLockCommand_ChildExitCode to table driven test
2018-08-14 12:08:33 -04:00
Paul Banks 352057b956
Update intentions.html.md 2018-08-14 15:09:45 +01:00
Freddy e06cdb5278
Address flakiness in command/exec tests (#4517)
* Add fn to wait for TestAgent node and check registration

* Add waits for TestAgent and retries before timeouts in exec_test
2018-08-10 15:04:07 -04:00
Geoffrey Grosenbach d3573d7c27
Consul Production Deployment Guide
Renames guide to "Production Deployment"
Adds link in sidebar menu.
Implements edits suggested by Consul engineering team.
2018-08-10 11:51:05 -07:00
Matt Keeler 9ac477f740
Update CHANGELOG.md 2018-08-10 11:33:22 -04:00
Pierre Souchay 821a91ca31 Allow to rename nodes with IDs, will fix #3974 and #4413 (#4415)
* Allow to rename nodes with IDs, will fix #3974 and #4413

This change allow to rename any well behaving recent agent with an
ID to be renamed safely, ie: without taking the name of another one
with case insensitive comparison.

Deprecated behaviour warning
----------------------------

Due to asceding compatibility, it is still possible however to
"take" the name of another name by not providing any ID.

Note that when not providing any ID, it is possible to have 2 nodes
having similar names with case differences, ie: myNode and mynode
which might lead to DB corruption on Consul server side and
lead to server not properly restarting.

See #3983 and #4399 for Context about this change.

Disabling registration of nodes without IDs as specified in #4414
should probably be the way to go eventually.

* Removed the case-insensitive search when adding a node within the else
block since it breaks the test TestAgentAntiEntropy_Services

While the else case is probably legit, it will be fixed with #4414 in
a later release.

* Added again the test in the else to avoid duplicated names, but
enforce this test only for nodes having IDs.

Thus most tests without any ID will work, and allows us fixing

* Added more tests regarding request with/without IDs.

`TestStateStore_EnsureNode` now test registration and renaming with IDs

`TestStateStore_EnsureNodeDeprecated` tests registration without IDs
and tests removing an ID from a node as well as updated a node
without its ID (deprecated behaviour kept for backwards compatibility)

* Do not allow renaming in case of conflict, including when other node has no ID

* Fixed function GetNodeID that was not working due to wrong type when searching node from its ID

Thus, all tests about renaming were not working properly.

Added the full test cas that allowed me to detect it.

* Better error messages, more tests when nodeID is not a valid UUID in GetNodeID()

* Added separate TestStateStore_GetNodeID to test GetNodeID.

More complete test coverage for GetNodeID

* Added new unit test `TestStateStore_ensureNoNodeWithSimilarNameTxn`

Also fixed comments to be clearer after remarks from @banks

* Fixed error message in unit test to match test case

* Use uuid.ParseUUID to parse Node.ID as requested by @mkeeler
2018-08-10 11:30:45 -04:00
Paul Banks 3adfe86f03 Update Serf and memberlist (#4511)
This includes fixes that improve gossip scalability on very large (> 10k node) clusters.

The Serf changes:
 - take snapshot disk IO out of the critical path for handling messages hashicorp/serf#524
 - make snapshot compaction much less aggressive - the old fixed threshold caused snapshots to be constantly compacted (synchronously with request handling) on clusters larger than about 2000 nodes! hashicorp/serf#525

Memberlist changes:
 - prioritize handling alive messages over suspect/dead to improve stability, and handle queue in LIFO order to avoid acting on info that 's already stale in the queue by the time we handle it. hashicorp/memberlist#159
 - limit the number of concurrent pushPull requests being handled at once to 128. In one test scenario with 10s of thousands of servers we saw channel and lock blocking cause over 3000 pushPulls at once which ballooned the memory of the server because each push pull contained a de-serialised list of all known 10k+ nodes and their tags for a total of about 60 million objects and 7GB of memory stuck. While the rest of the fixes here should prevent the same root cause from blocking in the same way, this prevents any other bug or source of contention from allowing pushPull messages to stack up and eat resources. hashicorp/memberlist#158
2018-08-09 13:16:13 -04:00
Siva Prasad d98d02777f
PR to fix TestAgent_IndexChurn and TestPreparedQuery_Wrapper. (#4512)
* Fixes TestAgent_IndexChurn

* Fixes TestPreparedQuery_Wrapper

* Increased sleep in agent_test for IndexChurn to 500ms

* Made the comment about joinWAN operation much less of a cliffhanger
2018-08-09 12:40:07 -04:00
Armon Dadgar 998235c50e
Merge pull request #4505 from hashicorp/f-channel-size
consul: Update buffer sizes
2018-08-08 12:26:23 -07:00
Freddy ba90922b27
Improve flaky connect/proxy Listener tests (#4498)
Improve flaky connect/proxy Listener tests

- Add sleep to TestEchoConn to allow for Read/Write to finish before fetching data in reportStats

- Account for flakiness around interval for Gauge

- Improve debug output when dumping metrics
2018-08-08 14:56:03 -04:00
Armon Dadgar a343392f63 consul: Update buffer sizes 2018-08-08 10:26:58 -07:00
Geoffrey Grosenbach 8dd1d92283
Merge pull request #4485 from hashicorp/doc-remove-atlas
Remove all mention of Atlas, even in deprecated changelogs
2018-08-07 10:45:50 -07:00
Geoffrey Grosenbach ddfc5311aa
Merge pull request #4318 from hashicorp/doc-discovery-code-snippet
Improve styling of discovery snippet
2018-08-07 10:44:58 -07:00
Siva Prasad cfa436dc16
Revert "CA initialization while boostrapping and TestLeader_ChangeServerID fix." (#4497)
* Revert "BUGFIX: Unit test relying on WaitForLeader() did not work due to wrong test (#4472)"

This reverts commit cec5d7239621e0732b3f70158addb1899442acb3.

* Revert "CA initialization while boostrapping and TestLeader_ChangeServerID fix. (#4493)"

This reverts commit 589b589b53e56af38de25db9b56967bdf1f2c069.
2018-08-07 08:29:48 -04:00
Pierre Souchay fd927ea110 BUGFIX: Unit test relying on WaitForLeader() did not work due to wrong test (#4472)
- Improve resilience of testrpc.WaitForLeader()

- Add additionall retry to CI

- Increase "go test" timeout to 8m

- Add wait for cluster leader to several tests in the agent package

- Add retry to some tests in the api and command packages
2018-08-06 19:46:09 -04:00
Siva Prasad 29c181f5fa
CA initialization while boostrapping and TestLeader_ChangeServerID fix. (#4493)
* connect: fix an issue with Consul CA bootstrapping being interrupted

* streamline change server id test
2018-08-06 16:15:24 -04:00
Geoffrey Grosenbach c2c6765fc0 Remove all mention of Atlas, even in deprecated changelogs 2018-08-03 10:51:18 -07:00
Freddy f0a8f678c7
Merge pull request #4484 from freddygv/travis-in-containers
Move travis build env back to containers
2018-08-02 15:51:36 -04:00
Paul Banks 6ac4ed9302
Update Changelog 2018-08-02 18:57:00 +01:00
Jack Pearkes 5c993a3f5f
Clarification for serf_wan documentation (#4459)
* updates docs for agent options

trying to add a little more clarity to suggestion that folks should use
port 8302 for both LAN and WAN comms

* website: clarify language for serf wan port behavior
2018-08-02 10:25:25 -07:00
freddygv bc80364e41 Move travis build env back to containers 2018-08-02 10:30:28 -04:00
Siva Prasad dcd7d9b015
DNS : Fixes recursors answering the DNS query to properly return the correct response. (#4461)
* Fixes the DNS recursor properly resolving the requests

* Added a test case for the recursor bug

* Refactored code && added a test case for all failing recursors

* Inner indentation moved into else if check
2018-08-02 10:12:52 -04:00
Paul Banks 496af9061e
Fixes memory leak when blocking on /event/list (#4482) 2018-08-02 14:54:48 +01:00
Mitchell Hashimoto 718ce4fe0f
Merge pull request #4464 from hashicorp/je.htmlfix
A couple more docs fixes
2018-07-31 06:57:04 -07:00
Matt Keeler 9e7d6519b6 Putting source back into Dev Mode 2018-07-30 13:54:29 -04:00
John Cowen 08fe675865
UI: Add conditional enterprise logo (#4432)
Adds additional 'enterprise' text underneath the 'startup' logo if the
ui is built with a CONSUL_BINARY_TYPE environment variable that doesn't
equal `oss`.
2018-07-30 17:59:43 +01:00
John Cowen 03935bdf86
ui: Adds bottom breathing space on the bottom of forms (#4433) 2018-07-30 17:58:13 +01:00
John Cowen 7903512b26
UI - Refactor Adapter.handleResponse (#4398)
* Add some tests to check the correct GET API endpoints are called

* Refactor adapters

1. Add integration tests for `urlFor...` and majority `handleResponse` methods
2. Refactor out `handleResponse` a little more into single/batch/boolean
methods
3. Move setting of the `Datacenter` property into the `handleResponse`
method, basically the same place that the uid is being set using the dc
parsed form the URL
4. Add some Errors for if you don't pass ids to certain `urlFor` methods
2018-07-30 17:55:44 +01:00
John Cowen 2dda5a7da3
UI - Non-prod CSS sourcemaps (#4418) 2018-07-30 17:53:14 +01:00
mkeeler 0b775e1645
Release v1.2.2 2018-07-30 16:01:13 +00:00
mkeeler 21ddd2a7a7 Bump website version to 1.2.2 2018-07-30 15:59:17 +00:00
Matt Keeler 6046ee8386
Update CHANGELOG.md 2018-07-30 11:57:53 -04:00
Matt Keeler d763b560b7
Update CHANGELOG.md
Update for fixing #4441
2018-07-30 09:14:07 -04:00
Matt Keeler cbd0afc87c
Handle resolving proxy tokens when parsing HTTP requests (#4453)
Fixes: #4441

This fixes the issue with Connect Managed Proxies + ACLs being broken.

The underlying problem was that the token parsed for most http endpoints was sent untouched to the servers via the RPC request. These changes make it so that at the HTTP endpoint when parsing the token we additionally attempt to convert potential proxy tokens into regular tokens before sending to the RPC endpoint. Proxy tokens are only valid on the agent with the managed proxy so the resolution has to happen before it gets forwarded anywhere.
2018-07-30 09:11:51 -04:00
Geoffrey Grosenbach 85acf6b983 Copy-and-paste Go client example (#4448)
* Copy-and-paste Go client example

Includes Go source that runs without modification, as well as simple
instructions for compiling, running, and viewing the output in the
Consul UI.

* Remove unnecessary flags from development server example

This is a bare minimum Go example needed to store keys and values in
Consul. The `-ui` and `-server` flags aren't needed when running with
`-dev`.
2018-07-30 12:48:19 +01:00
Jeff Escalante 2dea506400 a couple more corrections 2018-07-27 19:39:44 -04:00
John Cowen addf85cbaf ui: Changelog additions (#4458) 2018-07-27 11:12:58 -07:00
Jeff Escalante 60e1450606 fix a couple html errors (#4456) 2018-07-26 16:30:24 -07:00
Christie Koehler fb4a902ca3 docs: Update links to ttl health check endpoints. (#4208)
* docs: Update links to ttl health check endpoints.

* remove absolute URLs
2018-07-26 16:14:44 -07:00
Matt Keeler 8994a6491f
Update CHANGELOG.md 2018-07-26 11:42:33 -04:00
Matt Keeler 5c7c58ed26
Gossip tuneables (#4444)
Expose a few gossip tuneables for both lan and wan interfaces

gossip_nodes
gossip_interval
probe_timeout
probe_interval
retransmit_mult
suspicion_mult
2018-07-26 11:39:49 -04:00
Matt Keeler 567d9eedf6
Merge pull request #4445 from hashicorp/bugfix/build-cross-compile
Fix cross compiling with make
2018-07-26 11:23:47 -04:00
Kyle Havlovitz 42ab07b398
fix inconsistency in TestConnectCAConfig_GetSet 2018-07-26 07:46:47 -07:00
Paul Banks ebebf9984d
Update CHANGELOG.md 2018-07-26 14:12:29 +01:00
Paul Banks e314dd7335
Update CHANGELOG.md 2018-07-26 14:11:26 +01:00
Paul Banks c4be0d2a4f
Document managed proxy logs (#4447)
* Document proxy logs

* Add extra note about terminating proxies
2018-07-26 13:56:28 +01:00
Paul Banks 25628f0e69
Add config option to disable HTTP printable char path check (#4442) 2018-07-26 13:53:39 +01:00
Kyle Havlovitz bf4c8aeac6
Update CHANGELOG.md 2018-07-25 17:54:58 -07:00
Kyle Havlovitz ecc02c6aee
Merge pull request #4400 from hashicorp/leaf-cert-ttl
Add configurable leaf cert TTL to Connect CA
2018-07-25 17:53:25 -07:00