Commit graph

8909 commits

Author SHA1 Message Date
Siva Prasad 5fe9053416
TestAgentAntiEntropy: Wait until Consul service is up on the agent. (#4591)
* Anti-Entropy test wait for Consul service added

* Reverted some tests back to using WaitForLeader
2018-08-28 09:52:11 -04:00
Pierre Souchay 09ff5f7224 Fixed unit test TestCatalogListServicesCommand (#4592) 2018-08-27 13:53:46 -04:00
Pierre Souchay d0c4539fba Fix unit test TestOperatorAutopilotGetConfigCommand (#4594) 2018-08-27 13:29:25 -04:00
Pierre Souchay 33184fce85 Fixed unstable test TestUiNodeInfo (#4586) 2018-08-27 11:49:14 -04:00
Pierre Souchay 0c2d9b0782 Fixed unstable test TestProxy_public (#4587) 2018-08-27 11:45:07 -04:00
Pierre Souchay 73824ab65f Fixed unstable test TestRTTCommand_LAN in command/rtt (#4585) 2018-08-27 11:37:13 -04:00
Pierre Souchay 5f603d86ad Fix unstable test TestRegisterMonitor_heartbeat (#4568) 2018-08-24 13:33:58 -04:00
Kyle Havlovitz 88e9e81734
Merge pull request #4577 from remijouannet/patch-1
Update monitoring-telegraf.html.md
2018-08-24 09:13:53 -07:00
Rémi Jouannet 3d5d7ba6df
Update monitoring-telegraf.html.md 2018-08-24 16:48:02 +02:00
John Cowen c619223367
UI: Fixes healthy node listing resize on large portrait screens (#4564)
1. Split the resizing functionality of into a separate mixin to be
shared across components
2. Add basic integration tests to prove that everything is getting
called through out the lifetime of the app. I decided against unit
testing as there isn't really any isolated logic to be tested, more
checking that things are being called in the correct order etc i.e. the
integration is correct.

Adds assertion to with-resizing so its obvious to override `resize`
2018-08-24 12:35:52 +02:00
Freddy 2f51b5a4de
Update CHANGELOG.md (#4562) 2018-08-23 15:20:26 -04:00
Pierre Souchay 9b5cf0c1d0 [BUGFIX] Avoid returning empty data on startup of a non-leader server (#4554)
Ensure that DB is properly initialized when performing stale queries

Addresses:
- https://github.com/hashicorp/consul-replicate/issues/82
- https://github.com/hashicorp/consul/issues/3975
- https://github.com/hashicorp/consul-template/issues/1131
2018-08-23 12:06:39 -04:00
Paul Banks cb340cefea
Intention ACL API clarification (#4547) 2018-08-20 20:33:15 +01:00
jjshanks 12342e0d0e Update intentions documentation to clarify ACL behavior (#4546)
* Update intentions documentation to clarify ACL behavior

* Incorprate @banks suggestions into docs

* Fix my own typos!
2018-08-20 20:03:53 +01:00
Matt Keeler fb4552d070
Update CHANGELOG.md 2018-08-17 16:20:34 -04:00
Miroslav Bagljas 8f7e87439a Fixes #4483: Add support for Authorization: Bearer token Header (#4502)
Added Authorization Bearer token support as per RFC6750

* appended Authorization header token parsing after X-Consul-Token
* added test cases
* updated website documentation to mention Authorization header

* improve tests, improve Bearer parsing
2018-08-17 16:18:42 -04:00
Matt Keeler 89194ee16c
Update CHANGELOG.md 2018-08-17 14:45:52 -04:00
Matt Keeler da3ed5dc76
Fix #4515: Segfault when serf_wan port was -1 but reconnect_time_wan was set (#4531)
Fixes #4515 

This just slightly refactors the logic to only attempt to set the serf wan reconnect timeout when the rest of the serf wan settings are configured - thus avoiding a segfault.
2018-08-17 14:44:25 -04:00
Siva Prasad 1932e25c98
Fixed a make build issue with Windows Binaries. (#4538)
* Fixed an issue where Windows binary had trouble being copied correctly

* Enclosed binname inside angular brackets
2018-08-17 09:31:57 -04:00
Kyle Havlovitz adb2aef591
Merge pull request #4535 from hashicorp/ca-snapshot-fix
fsm: add missing CA config to snapshot/restore logic
2018-08-16 13:05:47 -07:00
Kyle Havlovitz 26a21df014
Merge branch 'master' into ca-snapshot-fix 2018-08-16 13:00:54 -07:00
Kyle Havlovitz af4b037c52
fsm: add connect service config to snapshot/restore test 2018-08-16 12:58:54 -07:00
Matt Keeler c48e18df23
Update CHANGELOG.md 2018-08-16 15:35:02 -04:00
nickmy9729 43a68822e3 Added code to allow snapshot inclusion of NodeMeta (#4527) 2018-08-16 15:33:35 -04:00
Kyle Havlovitz 880eccb502
fsm: add missing CA config to snapshot/restore logic 2018-08-16 11:58:50 -07:00
Kyle Havlovitz 2f4f2f28da
Merge pull request #4528 from hashicorp/autopilot-fixes
Fix inconsistency caused by the autopilot StatsFetcher
2018-08-15 11:35:52 -07:00
Siva Prasad 33c3d1ddfa
Addresses the flakiness of CatalogNodes (#4530) 2018-08-15 11:16:05 -04:00
sandstrom 0e987522d9 Clarify port usage for agents (#4510) 2018-08-14 16:10:01 -07:00
Kyle Havlovitz fd83063686
autopilot: don't follow the normal server removal rules for nonvoters 2018-08-14 14:24:51 -07:00
Kyle Havlovitz aa19559cc7
Fix stats fetcher healthcheck RPCs not being independent 2018-08-14 14:23:52 -07:00
Shubheksha 1afcabb0a2 replace old fork of text package (#4501) 2018-08-14 12:23:18 -07:00
Pierre Souchay a16f34058b Display more information about check being not properly added when it fails (#4405)
* Display more information about check being not properly added when it fails

It follows an incident where we add lots of error messages:

  [WARN] consul.fsm: EnsureRegistration failed: failed inserting check: Missing service registration

That seems related to Consul failing to restart on respective agents.

Having Node information as well as service information would help diagnose the issue.

* Renamed ensureCheckIfNodeMatches() as requested by @banks
2018-08-14 17:45:33 +01:00
Freddy cbe61dfcec
Improve reliability of tests with TestAgent (#4525)
- Add WaitForTestAgent to tests flaky due to missing serfHealth registration

- Fix bug in retries calling Fatalf with *testing.T

- Convert TestLockCommand_ChildExitCode to table driven test
2018-08-14 12:08:33 -04:00
Paul Banks 352057b956
Update intentions.html.md 2018-08-14 15:09:45 +01:00
Freddy e06cdb5278
Address flakiness in command/exec tests (#4517)
* Add fn to wait for TestAgent node and check registration

* Add waits for TestAgent and retries before timeouts in exec_test
2018-08-10 15:04:07 -04:00
Geoffrey Grosenbach d3573d7c27
Consul Production Deployment Guide
Renames guide to "Production Deployment"
Adds link in sidebar menu.
Implements edits suggested by Consul engineering team.
2018-08-10 11:51:05 -07:00
Matt Keeler 9ac477f740
Update CHANGELOG.md 2018-08-10 11:33:22 -04:00
Pierre Souchay 821a91ca31 Allow to rename nodes with IDs, will fix #3974 and #4413 (#4415)
* Allow to rename nodes with IDs, will fix #3974 and #4413

This change allow to rename any well behaving recent agent with an
ID to be renamed safely, ie: without taking the name of another one
with case insensitive comparison.

Deprecated behaviour warning
----------------------------

Due to asceding compatibility, it is still possible however to
"take" the name of another name by not providing any ID.

Note that when not providing any ID, it is possible to have 2 nodes
having similar names with case differences, ie: myNode and mynode
which might lead to DB corruption on Consul server side and
lead to server not properly restarting.

See #3983 and #4399 for Context about this change.

Disabling registration of nodes without IDs as specified in #4414
should probably be the way to go eventually.

* Removed the case-insensitive search when adding a node within the else
block since it breaks the test TestAgentAntiEntropy_Services

While the else case is probably legit, it will be fixed with #4414 in
a later release.

* Added again the test in the else to avoid duplicated names, but
enforce this test only for nodes having IDs.

Thus most tests without any ID will work, and allows us fixing

* Added more tests regarding request with/without IDs.

`TestStateStore_EnsureNode` now test registration and renaming with IDs

`TestStateStore_EnsureNodeDeprecated` tests registration without IDs
and tests removing an ID from a node as well as updated a node
without its ID (deprecated behaviour kept for backwards compatibility)

* Do not allow renaming in case of conflict, including when other node has no ID

* Fixed function GetNodeID that was not working due to wrong type when searching node from its ID

Thus, all tests about renaming were not working properly.

Added the full test cas that allowed me to detect it.

* Better error messages, more tests when nodeID is not a valid UUID in GetNodeID()

* Added separate TestStateStore_GetNodeID to test GetNodeID.

More complete test coverage for GetNodeID

* Added new unit test `TestStateStore_ensureNoNodeWithSimilarNameTxn`

Also fixed comments to be clearer after remarks from @banks

* Fixed error message in unit test to match test case

* Use uuid.ParseUUID to parse Node.ID as requested by @mkeeler
2018-08-10 11:30:45 -04:00
Paul Banks 3adfe86f03 Update Serf and memberlist (#4511)
This includes fixes that improve gossip scalability on very large (> 10k node) clusters.

The Serf changes:
 - take snapshot disk IO out of the critical path for handling messages hashicorp/serf#524
 - make snapshot compaction much less aggressive - the old fixed threshold caused snapshots to be constantly compacted (synchronously with request handling) on clusters larger than about 2000 nodes! hashicorp/serf#525

Memberlist changes:
 - prioritize handling alive messages over suspect/dead to improve stability, and handle queue in LIFO order to avoid acting on info that 's already stale in the queue by the time we handle it. hashicorp/memberlist#159
 - limit the number of concurrent pushPull requests being handled at once to 128. In one test scenario with 10s of thousands of servers we saw channel and lock blocking cause over 3000 pushPulls at once which ballooned the memory of the server because each push pull contained a de-serialised list of all known 10k+ nodes and their tags for a total of about 60 million objects and 7GB of memory stuck. While the rest of the fixes here should prevent the same root cause from blocking in the same way, this prevents any other bug or source of contention from allowing pushPull messages to stack up and eat resources. hashicorp/memberlist#158
2018-08-09 13:16:13 -04:00
Siva Prasad d98d02777f
PR to fix TestAgent_IndexChurn and TestPreparedQuery_Wrapper. (#4512)
* Fixes TestAgent_IndexChurn

* Fixes TestPreparedQuery_Wrapper

* Increased sleep in agent_test for IndexChurn to 500ms

* Made the comment about joinWAN operation much less of a cliffhanger
2018-08-09 12:40:07 -04:00
Armon Dadgar 998235c50e
Merge pull request #4505 from hashicorp/f-channel-size
consul: Update buffer sizes
2018-08-08 12:26:23 -07:00
Freddy ba90922b27
Improve flaky connect/proxy Listener tests (#4498)
Improve flaky connect/proxy Listener tests

- Add sleep to TestEchoConn to allow for Read/Write to finish before fetching data in reportStats

- Account for flakiness around interval for Gauge

- Improve debug output when dumping metrics
2018-08-08 14:56:03 -04:00
Armon Dadgar a343392f63 consul: Update buffer sizes 2018-08-08 10:26:58 -07:00
Geoffrey Grosenbach 8dd1d92283
Merge pull request #4485 from hashicorp/doc-remove-atlas
Remove all mention of Atlas, even in deprecated changelogs
2018-08-07 10:45:50 -07:00
Geoffrey Grosenbach ddfc5311aa
Merge pull request #4318 from hashicorp/doc-discovery-code-snippet
Improve styling of discovery snippet
2018-08-07 10:44:58 -07:00
Siva Prasad cfa436dc16
Revert "CA initialization while boostrapping and TestLeader_ChangeServerID fix." (#4497)
* Revert "BUGFIX: Unit test relying on WaitForLeader() did not work due to wrong test (#4472)"

This reverts commit cec5d7239621e0732b3f70158addb1899442acb3.

* Revert "CA initialization while boostrapping and TestLeader_ChangeServerID fix. (#4493)"

This reverts commit 589b589b53e56af38de25db9b56967bdf1f2c069.
2018-08-07 08:29:48 -04:00
Pierre Souchay fd927ea110 BUGFIX: Unit test relying on WaitForLeader() did not work due to wrong test (#4472)
- Improve resilience of testrpc.WaitForLeader()

- Add additionall retry to CI

- Increase "go test" timeout to 8m

- Add wait for cluster leader to several tests in the agent package

- Add retry to some tests in the api and command packages
2018-08-06 19:46:09 -04:00
Siva Prasad 29c181f5fa
CA initialization while boostrapping and TestLeader_ChangeServerID fix. (#4493)
* connect: fix an issue with Consul CA bootstrapping being interrupted

* streamline change server id test
2018-08-06 16:15:24 -04:00
Geoffrey Grosenbach c2c6765fc0 Remove all mention of Atlas, even in deprecated changelogs 2018-08-03 10:51:18 -07:00
Freddy f0a8f678c7
Merge pull request #4484 from freddygv/travis-in-containers
Move travis build env back to containers
2018-08-02 15:51:36 -04:00