Commit Graph

57 Commits

Author SHA1 Message Date
Freddy c19f46639b
Restore NotifyListen to avoid panic in newServer retry (#6200) 2019-07-23 14:33:00 -06:00
Christian Muehlhaeuser 2602f6907e Simplified code in various places (#6176)
All these changes should have no side-effects or change behavior:

- Use bytes.Buffer's String() instead of a conversion
- Use time.Since and time.Until where fitting
- Drop unnecessary returns and assignment
2019-07-20 09:37:19 -04:00
Freddy f59e6db9b1
Reduce number of servers in TestServer_Expect_NonVoters (#6155) 2019-07-17 11:35:33 -06:00
Freddy a295d9e5db
Flaky test overhaul (#6100) 2019-07-12 09:52:26 -06:00
Hans Hasselberg 0d8d7ae052
agent: transfer leadership when establishLeadership fails (#5247) 2019-06-19 14:50:48 +02:00
Paul Banks e90fab0aec
Add rate limiting to RPCs sent within a server instance too (#5927) 2019-06-13 04:26:27 -05:00
Freddy f7f0207f78
Run TestServer_Expect on its own (#5890) 2019-05-23 19:52:33 -04:00
Kyle Havlovitz ad24456f49
Set the dead node reclaim timer at 30s 2019-05-15 11:59:33 -07:00
Kyle Havlovitz 64174f13d6 Add HTTP endpoints for config entry management (#5718) 2019-04-29 18:08:09 -04:00
Matt Keeler 3ea9fe3bff
Implement bootstrapping proxy defaults from the config file (#5714) 2019-04-26 14:25:03 -04:00
Alvin Huang 96c2c79908
Add fmt and vet (#5671)
* add go fmt and vet

* go fmt fixes
2019-04-25 12:26:33 -04:00
Jeff Mitchell d3c7d57209
Move internal/ to sdk/ (#5568)
* Move internal/ to sdk/

* Add a readme to the SDK folder
2019-03-27 08:54:56 -04:00
Jeff Mitchell a41c865059
Convert to Go Modules (#5517)
* First conversion

* Use serf 0.8.2 tag and associated updated deps

* * Move freeport and testutil into internal/

* Make internal/ its own module

* Update imports

* Add replace statements so API and normal Consul code are
self-referencing for ease of development

* Adapt to newer goe/values

* Bump to new cleanhttp

* Fix ban nonprintable chars test

* Update lock bad args test

The error message when the duration cannot be parsed changed in Go 1.12
(ae0c435877d3aacb9af5e706c40f9dddde5d3e67). This updates that test.

* Update another test as well

* Bump travis

* Bump circleci

* Bump go-discover and godo to get rid of launchpad dep

* Bump dockerfile go version

* fix tar command

* Bump go-cleanhttp
2019-03-26 17:04:58 -04:00
Hans Hasselberg d511e86491
agent: enable reloading of tls config (#5419)
This PR introduces reloading tls configuration. Consul will now be able to reload the TLS configuration which previously required a restart. It is not yet possible to turn TLS ON or OFF with these changes. Only when TLS is already turned on, the configuration can be reloaded. Most importantly the certificates and CAs.
2019-03-13 10:29:06 +01:00
R.B. Boyer c24e3584be improve flaky LANReap tests by expliciting configuring the tombstone timeout
In TestServer_LANReap autopilot is running, so the alternate flow
through the serf reaping function is possible. In that situation the
ReconnectTimeout is not relevant so for parity also override the
TombstoneTimeout value as well.

For additional parity update the TestServer_WANReap and
TestClient_LANReap versions of this test in the same way even though
autopilot is irrelevant here .
2019-03-05 14:34:03 -06:00
Matt Keeler 87f9365eee Fixes for CVE-2019-8336
Fix error in detecting raft replication errors.

Detect redacted token secrets and prevent attempting to insert.

Add a Redacted field to the TokenBatchRead and TokenRead RPC endpoints

This will indicate whether token secrets have been redacted.

Ensure any token with a redacted secret in secondary datacenters is removed.

Test that redacted tokens cannot be replicated.
2019-03-04 19:13:24 +00:00
Matt Keeler 416a6543a6
Call RemoveServer for reap events (#5317)
This ensures that servers are removed from RPC routing when they are reaped.
2019-03-04 09:19:35 -05:00
Hans Hasselberg 75ababb54f
Centralise tls configuration part 1 (#5366)
In order to be able to reload the TLS configuration, we need one way to generate the different configurations.

This PR introduces a `tlsutil.Configurator` which holds a `tlsutil.Config`. Afterwards it is responsible for rendering every `tls.Config`. In this particular PR I moved `IncomingHTTPSConfig`, `IncomingTLSConfig`, and `OutgoingTLSWrapper` into `tlsutil.Configurator`.

This PR is a pure refactoring - not a single feature added. And not a single test added. I only slightly modified existing tests as necessary.
2019-02-26 16:52:07 +01:00
R.B. Boyer b5d71ea779
testutil: redirect some test agent logs to testing.T.Logf (#5304)
When tests fail, only the logs for the failing run are dumped to the
console which helps in diagnosis. This is easily added to other test
scenarios as they come up.
2019-02-01 09:21:54 -06:00
Pierre Souchay d0ca1bade9 Fixed another list of unstable unit tests in travis (#4915)
* Fixed another list of unstable unit tests in travis

Fixed failing tests in https://travis-ci.org/hashicorp/consul/jobs/451357061

* Fixed another list of unstable unit tests in travis.

Fixed failing tests in https://travis-ci.org/hashicorp/consul/jobs/451357061
2018-11-20 11:27:26 +00:00
Kyle Havlovitz 19f9cad3fe
oss: bump test server version to 1.4.0 2018-11-13 13:13:26 -08:00
Kyle Havlovitz 038aefa0bc update non-voting server test to fix enterprise diff 2018-11-09 12:50:24 -08:00
Matt Keeler 0dd537e506
Fix the NonVoter Bootstrap test (#4786) 2018-10-24 10:23:50 -04:00
Alex Dadgar 90ed72fd70 do not bootstrap with non voters 2018-09-19 17:41:36 -07:00
Paul Banks 09e4c2995b
Fix CA pruning when CA config uses string durations. (#4669)
* Fix CA pruning when CA config uses string durations.

The tl;dr here is:

 - Configuring LeafCertTTL with a string like "72h" is how we do it by default and should be supported
 - Most of our tests managed to escape this by defining them as time.Duration directly
 - Out actual default value is a string
 - Since this is stored in a map[string]interface{} config, when it is written to Raft it goes through a msgpack encode/decode cycle (even though it's written from server not over RPC).
 - msgpack decode leaves the string as a `[]uint8`
 - Some of our parsers required string and failed
 - So after 1 hour, a default configured server would throw an error about pruning old CAs
 - If a new CA was configured that set LeafCertTTL as a time.Duration, things might be OK after that, but if a new CA was just configured from config file, intialization would cause same issue but always fail still so would never prune the old CA.
 - Mostly this is just a janky error that got passed tests due to many levels of complicated encoding/decoding.

tl;dr of the tl;dr: Yay for type safety. Map[string]interface{} combined with msgpack always goes wrong but we somehow get bitten every time in a new way :D

We already fixed this once! The main CA config had the same problem so @kyhavlov already wrote the mapstructure DecodeHook that fixes it. It wasn't used in several places it needed to be and one of those is notw in `structs` which caused a dependency cycle so I've moved them.

This adds a whole new test thta explicitly tests the case that broke here. It also adds tests that would have failed in other places before (Consul and Vaul provider parsing functions). I'm not sure if they would ever be affected as it is now as we've not seen things broken with them but it seems better to explicitly test that and support it to not be bitten a third time!

* Typo fix

* Fix bad Uint8 usage
2018-09-13 15:43:00 +01:00
Paul Banks dbcf286d4c
Ooops remove the CA stuff from actual server defaults and make it test server only 2018-06-14 09:42:16 -07:00
Paul Banks 834ed1d25f
Fixed many tests after rebase. Some still failing and seem unrelated to any connect changes. 2018-06-14 09:42:16 -07:00
Kyle Havlovitz 19b9399f2f
Add more tests for built-in provider 2018-06-14 09:42:06 -07:00
Josh Soref 1dd8c378b9 Spelling (#3958)
* spelling: another

* spelling: autopilot

* spelling: beginning

* spelling: circonus

* spelling: default

* spelling: definition

* spelling: distance

* spelling: encountered

* spelling: enterprise

* spelling: expands

* spelling: exits

* spelling: formatting

* spelling: health

* spelling: hierarchy

* spelling: imposed

* spelling: independence

* spelling: inspect

* spelling: last

* spelling: latest

* spelling: client

* spelling: message

* spelling: minimum

* spelling: notify

* spelling: nonexistent

* spelling: operator

* spelling: payload

* spelling: preceded

* spelling: prepared

* spelling: programmatically

* spelling: required

* spelling: reconcile

* spelling: responses

* spelling: request

* spelling: response

* spelling: results

* spelling: retrieve

* spelling: service

* spelling: significantly

* spelling: specifies

* spelling: supported

* spelling: synchronization

* spelling: synchronous

* spelling: themselves

* spelling: unexpected

* spelling: validations

* spelling: value
2018-03-19 16:56:00 +00:00
Devin Canterberry 881d20c606
🐛 Formatting changes only; add missing trailing commas 2018-03-15 10:19:46 -07:00
Preetha Appan 77d35f1829
Remove extra newline 2018-02-21 13:21:47 -06:00
Preetha Appan 573500dc51
Unit test that calls revokeLeadership twice to make sure its idempotent 2018-02-21 12:48:53 -06:00
James Phillips b94ba8aeb4
Removes bogus getPort() in favor of freeport. 2017-11-08 19:55:50 -08:00
James Phillips c6e0366c02
Relaxes Autopilot promotion logic. (#3623)
* Relaxes Autopilot promotion logic.

When we defaulted the Raft protocol version to 3 in #3477 we made
the numPeers() routine more strict to only count voters (this is
more conservative and more correct). This had the side effect of
breaking rolling updates because it's at odds with the Autopilot
non-voter promotion logic.

That logic used to wait to only promote to maintain an odd quorum
of servers. During a rolling update (add one new server, wait, and
then kill an old server) the dead server cleanup would still count
the old server as a peer, which is conservative and the right thing
to do, and no longer count the non-voter. This would wait to promote,
so you could get into a stalemate. It is safer to promote early than
remove early, so by promoting as soon as possible we have chosen
that as the solution here.

Fixes #3611

* Gets rid of unnecessary extra not-a-voter check.
2017-10-31 15:16:56 -05:00
Frank Schroeder 74859ff3c0 test: replace porter tool with freeport lib
This patch removes the porter tool which hands out free ports from a
given range with a library which does the same thing. The challenge for
acquiring free ports in concurrent go test runs is that go packages are
tested concurrently and run in separate processes. There has to be some
inter-process synchronization in preventing processes allocating the
same ports.

freeport allocates blocks of ports from a range expected to be not in
heavy use and implements a system-wide mutex by binding to the first
port of that block for the lifetime of the application. Ports are then
provided sequentially from that block and are tested on localhost before
being returned as available.
2017-10-21 22:01:09 +02:00
James Phillips d1ad538345 Makes RPC handling more robust when rolling servers. (#3561)
* Adds client-side retry for no leader errors.

This paves over the case where the client was connected to the leader
when it loses leadership.

* Adds a configurable server RPC drain time and a fail-fast path for RPCs.

When a server leaves it gets removed from the Raft configuration, so it will
never know who the new leader server ends up being. Without this we'd be
doomed to wait out the RPC hold timeout and then fail. This makes things fail
a little quicker while a sever is draining, and since we added a client retry
AND since the server doing this has already shut down and left the Serf LAN,
clients should retry against some other server.

* Makes the RPC hold timeout configurable.

* Reorders struct members.

* Sets the RPC hold timeout default for test servers.

* Bumps the leave drain time up to 5 seconds.

* Robustifies retries with a simpler client-side RPC hold.

* Reverts untended delete.
2017-10-10 15:19:50 -07:00
James Phillips fcaa889116 Bumps default Raft protocol to version 3. (#3477)
* Changes default Raft protocol to 3.

* Changes numPeers() to report only voters.

This should have been there before, but it's more obvious that this
is incorrect now that we default the Raft protocol to 3, which puts
new servers in a read-only state while Autopilot waits for them to
become healthy.

* Fixes TestLeader_RollRaftServer.

* Fixes TestOperator_RaftRemovePeerByAddress.

* Fixes TestServer_*.

Relaxed the check for a given number of voter peers and instead do
a thorough check that all servers see each other in their Raft
configurations.

* Fixes TestACL_*.

These now just check for Raft replication to be set up, and don't
care about the number of voter peers.

* Fixes TestOperator_Raft_ListPeers.

* Fixes TestAutopilot_CleanupDeadServerPeriodic.

* Fixes TestCatalog_ListNodes_ConsistentRead_Fail.

* Fixes TestLeader_ChangeServerID and adjusts the conn pool to throw away
sockets when it sees io.EOF.

* Changes version to 1.0.0 in the options doc.

* Makes metrics test more deterministic with autopilot metrics possible.
2017-09-25 15:27:04 -07:00
Frank Schröder 69a088ca85 New config parser, HCL support, multiple bind addrs (#3480)
* new config parser for agent

This patch implements a new config parser for the consul agent which
makes the following changes to the previous implementation:

 * add HCL support
 * all configuration fragments in tests and for default config are
   expressed as HCL fragments
 * HCL fragments can be provided on the command line so that they
   can eventually replace the command line flags.
 * HCL/JSON fragments are parsed into a temporary Config structure
   which can be merged using reflection (all values are pointers).
   The existing merge logic of overwrite for values and append
   for slices has been preserved.
 * A single builder process generates a typed runtime configuration
   for the agent.

The new implementation is more strict and fails in the builder process
if no valid runtime configuration can be generated. Therefore,
additional validations in other parts of the code should be removed.

The builder also pre-computes all required network addresses so that no
address/port magic should be required where the configuration is used
and should therefore be removed.

* Upgrade github.com/hashicorp/hcl to support int64

* improve error messages

* fix directory permission test

* Fix rtt test

* Fix ForceLeave test

* Skip performance test for now until we know what to do

* Update github.com/hashicorp/memberlist to update log prefix

* Make memberlist use the default logger

* improve config error handling

* do not fail on non-existing data-dir

* experiment with non-uniform timeouts to get a handle on stalled leader elections

* Run tests for packages separately to eliminate the spurious port conflicts

* refactor private address detection and unify approach for ipv4 and ipv6.

Fixes #2825

* do not allow unix sockets for DNS

* improve bind and advertise addr error handling

* go through builder using test coverage

* minimal update to the docs

* more coverage tests fixed

* more tests

* fix makefile

* cleanup

* fix port conflicts with external port server 'porter'

* stop test server on error

* do not run api test that change global ENV concurrently with the other tests

* Run remaining api tests concurrently

* no need for retry with the port number service

* monkey patch race condition in go-sockaddr until we understand why that fails

* monkey patch hcl decoder race condidtion until we understand why that fails

* monkey patch spurious errors in strings.EqualFold from here

* add test for hcl decoder race condition. Run with go test -parallel 128

* Increase timeout again

* cleanup

* don't log port allocations by default

* use base command arg parsing to format help output properly

* handle -dc deprecation case in Build

* switch autopilot.max_trailing_logs to int

* remove duplicate test case

* remove unused methods

* remove comments about flag/config value inconsistencies

* switch got and want around since the error message was misleading.

* Removes a stray debug log.

* Removes a stray newline in imports.

* Fixes TestACL_Version8.

* Runs go fmt.

* Adds a default case for unknown address types.

* Reoders and reformats some imports.

* Adds some comments and fixes typos.

* Reorders imports.

* add unix socket support for dns later

* drop all deprecated flags and arguments

* fix wrong field name

* remove stray node-id file

* drop unnecessary patch section in test

* drop duplicate test

* add test for LeaveOnTerm and SkipLeaveOnInt in client mode

* drop "bla" and add clarifying comment for the test

* split up tests to support enterprise/non-enterprise tests

* drop raft multiplier and derive values during build phase

* sanitize runtime config reflectively and add test

* detect invalid config fields

* fix tests with invalid config fields

* use different values for wan sanitiziation test

* drop recursor in favor of recursors

* allow dns_config.udp_answer_limit to be zero

* make sure tests run on machines with multiple ips

* Fix failing tests in a few more places by providing a bind address in the test

* Gets rid of skipped TestAgent_CheckPerformanceSettings and adds case for builder.

* Add porter to server_test.go to make tests there less flaky

* go fmt
2017-09-25 11:40:42 -07:00
Preetha Appan e944370cde More cleanup from code review 2017-08-30 12:31:36 -05:00
Preetha Appan 5a29eb7486 Consolidate server lookup into one place and replace usages of localConsuls. 2017-08-30 09:30:33 -05:00
Frank Schroeder c38dcf2d17
agent: move agent/consul/agent to agent/metadata 2017-08-09 14:36:52 +02:00
James Phillips 803ed9a245 Adds secure introduction for the ACL replication token. (#3357)
Adds secure introduction for the ACL replication token, as well as a separate enable config for ACL replication.
2017-08-03 15:39:31 -07:00
James Phillips 7b54e325df Adds a comment about flood joining. 2017-07-07 09:22:34 +02:00
Frank Schroeder 74d3c4d896 rpc: make TestServer_JoinSeparateLanAndWanAddresses more robust 2017-07-07 09:22:34 +02:00
Frank Schroeder 2eb2941e8c rpc: fix logging and try quicker timing of TestServer_JoinSeparateLanAndWanAddresses 2017-07-07 09:22:34 +02:00
Frank Schroeder 98510f898c rpc: less agressive raft timeouts
Allowing more time for raft to consolidate should
drop the number of leader elections.
2017-07-07 09:22:34 +02:00
Frank Schroeder 50c81a9397 rpc: run agent/consul tests in parallel 2017-07-07 09:22:34 +02:00
Frank Schroeder 06ad8e96be rpc: fix TestServer_Leave
wait for the leader election.
2017-07-07 09:22:34 +02:00
Frank Schroeder e3252f921a rpc: fix for 'no leader' in TLS tests
Ensure both servers know about each other before looking
for a leader.
2017-07-07 09:22:34 +02:00
Frank Schroeder 2497b8416b rpc: fix TestServer_JoinWAN_Flood
The second server in the first data center should not be
in bootstrap mode.
2017-07-07 09:22:34 +02:00