open-consul

Author	SHA1	Message	Date
Daniel Nephin	047abdd73c	acl: remove ACLDatacenter This field has been unnecessary for a while now. It was always set to the same value as PrimaryDatacenter. So we can remove the duplicate field and use PrimaryDatacenter directly. This change was made by GoLand refactor, which did most of the work for me.	2021-08-06 18:27:00 -04:00
R.B. Boyer	62ac98b564	agent/structs: add a bunch more EnterpriseMeta helper functions to help with partitioning (#10669)	2021-07-22 13:20:45 -05:00
Daniel Nephin	58cf5767a8	Merge pull request #10479 from hashicorp/dnephin/ca-provider-explore-2 ca: move Server.SignIntermediate to CAManager	2021-07-12 19:03:43 -04:00
Daniel Nephin	a22bdb2ac9	Merge pull request #10445 from hashicorp/dnephin/ca-provider-explore ca: isolate more of the CA logic in CAManager	2021-07-12 15:26:23 -04:00
Daniel Nephin	34c8585b29	auto-config: move autoConfigBackend impl off of Server Most of these methods are used exclusively for the AutoConfig RPC endpoint. This PR uses a pattern that we've used in other places as an incremental step to reducing the scope of Server.	2021-07-12 13:42:40 -04:00
Daniel Nephin	d4bb9fd97a	ca: move provider creation into CAManager This further decouples the CAManager from Server. It reduces the interface between them and removes the need for the SetLogger method on providers.	2021-07-12 09:32:33 -04:00
Daniel Nephin	3c60a46376	config: remove duplicate TLSConfig fields from agent/consul.Config tlsutil.Config already presents an excellent structure for this configuration. Copying the runtime config fields to agent/consul.Config makes code harder to trace, and provides no advantage. Instead of copying the fields around, use the tlsutil.Config struct directly instead. This is one small step in removing the many layers of duplicate configuration.	2021-07-09 18:49:42 -04:00
Daniel Nephin	b4a10443d1	ca: remove unused RotationPeriod field This field was never used. Since it is persisted as part of a map[string]interface{} it is pretty easy to remove it.	2021-07-05 19:15:44 -04:00
Paul Banks	d47eea3a3f	Make Raft trailing logs and snapshot timing reloadable (#10129) * WIP reloadable raft config * Pre-define new raft gauges * Update go-metrics to change gauge reset behaviour * Update raft to pull in new metric and reloadable config * Add snapshot persistence timing and installSnapshot to our 'protected' list as they can be infrequent but are important * Update telemetry docs * Update config and telemetry docs * Add note to oldestLogAge on when it is visible * Add changelog entry * Update website/content/docs/agent/options.mdx Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>	2021-05-04 15:36:53 +01:00
Hans Hasselberg	052662bcf9	introduce certopts (#9606) * introduce cert opts * it should be using the same signer * lint and omit serial	2021-03-22 10:16:41 +01:00
Hans Hasselberg	623aab5880	Add flags to support CA generation for Connect (#9585)	2021-01-27 08:52:15 +01:00
Daniel Nephin	e8427a48ab	agent/consul: Rename RPCRate -> RPCRateLimit so that the field name is consistent across config structs.	2021-01-14 17:26:00 -05:00
Daniel Nephin	e5320c2db6	agent/consul: make Client/Server config reloading more obvious I believe this commit also fixes a bug. Previously RPCMaxConnsPerClient was not being re-read from the RuntimeConfig, so passing it to Server.ReloadConfig was never changing the value. Also improve the test runtime by not doing a lot of unnecessary work.	2021-01-14 17:21:10 -05:00
Daniel Nephin	ef0999547a	testing: skip slow tests with -short Add a skip condition to all tests slower than 100ms. This change was made using `gotestsum tool slowest` with data from the last 3 CI runs of master. See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests With this change: ``` $ time go test -count=1 -short ./agent ok github.com/hashicorp/consul/agent 0.743s real 0m4.791s $ time go test -count=1 -short ./agent/consul ok github.com/hashicorp/consul/agent/consul 4.229s real 0m8.769s ```	2020-12-07 13:42:55 -05:00
Kyle Havlovitz	91d5d6c586	Merge pull request #9009 from hashicorp/update-secondary-ca connect: Fix an issue with updating CA config in a secondary datacenter	2020-11-30 14:49:28 -08:00
Matt Keeler	4bca029be9	Refactor to call non-voting servers read replicas (#9191) Co-authored-by: Kit Patella <kit@jepsen.io>	2020-11-17 10:53:57 -05:00
Kyle Havlovitz	5de81c1375	connect: Add CAManager for synchronizing CA operations	2020-11-13 14:33:44 -08:00
Daniel Nephin	c621b4a420	agent/consul: pass dependencies directly from agent In an upcoming change we will need to pass a grpc.ClientConnPool from BaseDeps into Server. While looking at that change I noticed all of the existing consulOption fields are already on BaseDeps. Instead of duplicating the fields, we can create a struct used by agent/consul, and use that struct in BaseDeps. This allows us to pass along dependencies without translating them into different representations. I also looked at moving all of BaseDeps into agent/consul; however, that created some circular imports. Resolving those cycles wouldn't be too bad (it was only an error in agent/consul being imported from cache-types), but this change seems a little better because it starts to introduce some structure to BaseDeps. This change is also a small step in reducing the scope of Agent. Also remove some constants that were only used by tests, and move the relevant comment to where the live configuration is set. Removed some validation from NewServer and NewClient, as these are not really runtime errors. They would be code errors, which will cause a panic anyway, so there is no reason to handle them specially here.	2020-09-15 17:29:32 -04:00
Daniel Nephin	0536b2047e	agent/consul: make router required	2020-09-15 17:26:26 -04:00
Daniel Nephin	8d35e37b3c	testing: Remove all the defer os.RemoveAll Now that testutil uses t.Cleanup to remove the directory, the caller no longer has to manage the removal.	2020-08-14 19:58:53 -04:00
Daniel Nephin	fc797a279a	Merge pull request #8461 from hashicorp/dnephin/remove-notify-shutdown agent/consul: Remove NotifyShutdown	2020-08-13 11:16:48 -04:00
Hans Hasselberg	e0297b6e99	Refactor keyring ops: * changes some functions to return data instead of modifying pointer arguments * renames globalRPC() to keyringRPCs() to make its purpose more clear * restructures KeyringOperation() to make it more understandable	2020-08-11 13:42:03 +02:00
Daniel Nephin	bef9348ca8	testing: remove unnecessary defers in tests The data directory is now removed by the test helper that created it.	2020-08-07 17:28:16 -04:00
Daniel Nephin	f3b63514d5	testing: Remove NotifyShutdown NotifyShutdown was only used for testing. Now that t.Cleanup exists, we can use that instead of attaching cleanup to the Server shutdown. The Autopilot test which used NotifyShutdown doesn't need this notification because Shutdown is synchronous. Waiting for the function to return is equivalent.	2020-08-07 17:14:44 -04:00
Daniel Nephin	e6c94c1411	Remove LogOutput from Server	2020-08-05 14:00:44 -04:00
Daniel Nephin	80ff174880	testutil: NewLogBuffer - buffer logs until a test fails Replaces #7559 Running tests in parallel, with background goroutines, results in test output not being associated with the correct test. `go test` does not make any guarantees about output from goroutines being attributed to the correct test case. Attaching log output from background goroutines also causes data races. If the goroutine outlives the test, it will race with the test being marked done. Previously this was noticed as a panic when logging, but with the race detector enabled it is shown as a data race. The previous solution did not address the problem of correct test attribution because test output could still be hidden when it was associated with a test that did not fail. You would have to look at all of the log output to find the relevant lines. It also made debugging test failures more difficult because each log line was very long. This commit attempts a new approach. Instead of printing all the logs, only print when a test fails. This should work well when there are a small number of failures, but may not work well when there are many test failures at the same time. In those cases the failures are unlikely to be the result of a specific test, and the log output is likely less useful. All of the logs are printed from the test goroutine, so they should be associated with the correct test. Also removes some test helpers that were not used, or only had a single caller. Packages which expose many functions with similar names can be difficult to use correctly. Related: https://github.com/golang/go/issues/38458 (may be fixed in go1.15) https://github.com/golang/go/issues/38382#issuecomment-612940030	2020-07-21 12:50:40 -04:00
Matt Keeler	386ec3a2a2	Refactor AutoConfig RPC to not have a direct dependency on the Server type Instead it has an interface which can be mocked for better unit testing that is deterministic and not prone to flakiness.	2020-07-08 11:05:44 -04:00
Matt Keeler	eda8cb39fd	Implement the insecure version of the Cluster.AutoConfig RPC endpoint Right now this is only hooked into the insecure RPC server and requires JWT authorization. If no JWT authorizer is setup in the configuration then we inject a disabled “authorizer” to always report that JWT authorization is disabled.	2020-06-17 11:25:29 -04:00
Matt Keeler	cdc4b20afa	ACL Node Identities (#7970) A Node Identity is very similar to a service identity. Its main targeted use is to allow creating tokens for use by Consul agents that will grant the necessary permissions for all the typical agent operations (node registration, coordinate updates, anti-entropy). Half of this commit is for golden-file-based tests of the acl token and role cli output. Another big update was to refactor many of the tests in agent/consul/acl_endpoint_test.go to use the same style of tests and the same helpers. Besides reducing boilerplate in the tests, it also uses a common way of starting a test server with ACLs that should operate without any warnings regarding deprecated non-uuid master tokens etc.	2020-06-16 12:54:27 -04:00
Hans Hasselberg	dd8cd9bc24	Merge pull request #7966 from hashicorp/pool_improvements Agent connection pool cleanup	2020-06-04 08:56:26 +02:00
R.B. Boyer	16db20b1f3	acl: remove the deprecated `acl_enforce_version_8` option (#7991) Fixes #7292	2020-05-29 16:16:03 -05:00
Hans Hasselberg	9ef44ec3da	pool: remove version The version field has been used to decide which multiplexing to use. It was introduced in 2457293dceec95ecd12ef4f01442e13710ea131a. But this is 6y ago and there is no need for this differentiation anymore.	2020-05-28 23:06:01 +02:00
Pierre Souchay	3b548f0d77	Allow to restrict servers that can join a given Serf Consul cluster. (#7628) Based on work done in https://github.com/hashicorp/memberlist/pull/196 this allows restricting the IP ranges that can join a given Serf cluster and be a member of the cluster. IP restrictions for LAN and WAN Serf can be set separately using 2 new flags and config options.	2020-05-20 11:31:19 +02:00
Hans Hasselberg	e3e2b82a00	network_segments: stop advertising segment tags	2020-05-05 21:32:05 +02:00
Hans Hasselberg	6626cb69d6	rpc: oss changes for network area connection pooling (#7735)	2020-04-30 22:12:17 +02:00
Daniel Nephin	ebb851f32d	agent: Remove unused Encrypted from interface It appears to be unused. It looks like it has been around a while; I guess at some point we stopped using this method.	2020-03-26 12:34:31 -04:00
R.B. Boyer	a7fb26f50f	wan federation via mesh gateways (#6884) This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch: There are several distinct chunks of code that are affected: * new flags and config options for the server * retry join WAN is slightly different * retry join code is shared to discover primary mesh gateways from secondary datacenters * because retry join logic runs in the agent and the results of that operation for primary mesh gateways are needed in the server there are some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur at multiple layers of abstraction just to pass the data down to the right layer. * new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers * the function signature for RPC dialing picked up a new required field (the node name of the destination) * several new RPCs for manipulating a FederationState object: `FederationState:{Apply,Get,List,ListMeshGateways}` * 3 read-only internal APIs for debugging use to invoke those RPCs from curl * raft and fsm changes to persist these FederationStates * replication for FederationStates as they are canonically stored in the Primary and replicated to the Secondaries. * a special derivative of anti-entropy that runs in secondaries to snapshot their local mesh gateway `CheckServiceNodes` and sync them into their upstream FederationState in the primary (this works in conjunction with the replication to distribute addresses for all mesh gateways in all DCs to all other DCs) * a "gateway locator" convenience object to make use of this data to choose the addresses of gateways to use for any given RPC or gossip operation to a remote DC. This gets data from the "retry join" logic in the agent and also directly calls into the FSM. * RPC (`:8300`) on the server sniffs the first byte of a new connection to determine if it's actually doing native TLS. If so it checks the ALPN header for protocol determination (just like how the existing system uses the type-byte marker). * 2 new kinds of protocols are exclusively decoded via this native TLS mechanism: one for ferrying "packet" operations (udp-like) from the gossip layer and one for "stream" operations (tcp-like). The packet operations re-use sockets (using length-prefixing) to cut down on TLS re-negotiation overhead. * the server instances specially wrap the `memberlist.NetTransport` when running with gateway federation enabled (in a `wanfed.Transport`). The general gist is that if it tries to dial a node in the SAME datacenter (deduced by looking at the suffix of the node name) there is no change. If dialing a DIFFERENT datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh gateways to eventually end up in a server's :8300 port. * a new flag when launching a mesh gateway via `consul connect envoy` to indicate that the servers are to be exposed. This sets a special service meta when registering the gateway into the catalog. * `proxycfg/xds` notice this metadata blob to activate additional watches for the FederationState objects as well as the location of all of the consul servers in that datacenter. * `xds:` if the extra metadata is in place additional clusters are defined in a DC to bulk sink all traffic to another DC's gateways. For the current datacenter we listen on a wildcard name (`server.<dc>.consul`) that load balances all servers as well as one mini-cluster per node (`<node>.server.<dc>.consul`) * the `consul tls cert create` command got a new flag (`-node`) to help create an additional SAN in certs that can be used with this flavor of federation.	2020-03-09 15:59:02 -05:00
Hans Hasselberg	71ce832990	connect: add validations around intermediate cert ttl (#7213)	2020-02-11 00:05:49 +01:00
Matt Keeler	966d085066	Catalog + Namespace OSS changes. (#7219) * Various Prepared Query + Namespace things * Last round of OSS changes for a namespaced catalog	2020-02-10 10:40:44 -05:00
Chris Piraino	3dd0b59793	Allow users to configure either unstructured or JSON logging (#7130) * hclog Allow users to choose between unstructured and JSON logging	2020-01-28 17:50:41 -06:00
Matt Keeler	485a0a65ea	Updates to Config Entries and Connect for Namespaces (#7116)	2020-01-24 10:04:58 -05:00
Hans Hasselberg	315ba7d6ad	connect: check if intermediate cert needs to be renewed. (#6835) Currently when using the built-in CA provider for Connect, root certificates are valid for 10 years, however secondary DCs get intermediates that are valid for only 1 year. There is no mechanism currently short of rotating the root in the primary that will cause the secondary DCs to renew their intermediates. This PR adds a check that renews the cert if it is halfway through its validity period. In order to be able to test these changes, a new configuration option was added: IntermediateCertTTL which is set extremely low in the tests.	2020-01-17 23:27:13 +01:00
Matt Keeler	baa89c7c65	Intentions ACL enforcement updates (#7028) * Renamed structs.IntentionWildcard to structs.WildcardSpecifier * Refactor ACL Config Get rid of remnants of enterprise only renaming. Add a WildcardName field for specifying what string should be used to indicate a wildcard. * Add wildcard support in the ACL package For read operations they can call anyAllowed to determine if any read access to the given resource would be granted. For write operations they can call allAllowed to ensure that write access is granted to everything. * Make v1/agent/connect/authorize namespace aware * Update intention ACL enforcement This also changes how intention:read is granted. Before, the Intention.List RPC would allow viewing an intention if the token had intention:read on the destination. However, Intention.Match allowed viewing if access was allowed for either the source or dest side. Now Intention.List and Intention.Get fall in line with Intention.Match's previous behavior. Because this is done in a few different places, ACL enforcement for a singular intention is now done with the CanRead and CanWrite methods on the intention itself. * Refactor Intention.Apply to make things easier to follow.	2020-01-13 15:51:40 -05:00
Matt Keeler	185654b075	Unflake the TestACLEndpoint_TokenList test In order to do this I added a waitForLeaderEstablishment helper which does the right thing to ensure that leader establishment has finished. fixup	2019-12-18 14:07:07 -05:00
Todd Radel	19a3892f71	connect: Implement NeedsLogger interface for CA providers (#6556) * add NeedsLogger to Provider interface * implements NeedsLogger in default provider * pass logger through to provider * test for proper operation of NeedsLogger * remove public testServer function * Switch test to actually assert on logging output rather than reflection. --amend * Ooops actually set the logger in all the places we need it - CA config set wasn't and causing segfault * Fix all the other places in tests where we set the logger * Add TODO comment	2019-11-11 20:30:01 +00:00
R.B. Boyer	e74a6c44f1	server: ensure the primary dc and ACL dc match (#6634) This is mostly a sanity check for server tests that skip the normal config builder equivalent fixup.	2019-10-17 10:57:17 -05:00
R.B. Boyer	5c5f21088c	sdk: add freelist tracking and ephemeral port range skipping to freeport This should cut down on test flakiness. Problems handled: - If you had enough parallel test cases running, the former circular approach to handling the port block could hand out the same port to multiple cases before they each had a chance to bind them, causing one of the two tests to fail. - The freeport library would allocate out of the ephemeral port range. This has been corrected for Linux (which should cover CI). - The library now waits until a formerly-in-use port is verified to be free before putting it back into circulation.	2019-09-17 14:30:43 -05:00
Freddy	1b97d65873	Make new config when retrying testServer creation (#6204)	2019-07-24 08:41:00 -06:00
Freddy	c19f46639b	Restore NotifyListen to avoid panic in newServer retry (#6200)	2019-07-23 14:33:00 -06:00
Christian Muehlhaeuser	2602f6907e	Simplified code in various places (#6176) All these changes should have no side-effects or change behavior: - Use bytes.Buffer's String() instead of a conversion - Use time.Since and time.Until where fitting - Drop unnecessary returns and assignment	2019-07-20 09:37:19 -04:00
Freddy	f59e6db9b1	Reduce number of servers in TestServer_Expect_NonVoters (#6155)	2019-07-17 11:35:33 -06:00
Freddy	a295d9e5db	Flaky test overhaul (#6100)	2019-07-12 09:52:26 -06:00
Hans Hasselberg	0d8d7ae052	agent: transfer leadership when establishLeadership fails (#5247)	2019-06-19 14:50:48 +02:00
Paul Banks	e90fab0aec	Add rate limiting to RPCs sent within a server instance too (#5927)	2019-06-13 04:26:27 -05:00
Freddy	f7f0207f78	Run TestServer_Expect on its own (#5890)	2019-05-23 19:52:33 -04:00
Kyle Havlovitz	ad24456f49	Set the dead node reclaim timer at 30s	2019-05-15 11:59:33 -07:00
Kyle Havlovitz	64174f13d6	Add HTTP endpoints for config entry management (#5718)	2019-04-29 18:08:09 -04:00
Matt Keeler	3ea9fe3bff	Implement bootstrapping proxy defaults from the config file (#5714)	2019-04-26 14:25:03 -04:00
Alvin Huang	96c2c79908	Add fmt and vet (#5671) * add go fmt and vet * go fmt fixes	2019-04-25 12:26:33 -04:00
Jeff Mitchell	d3c7d57209	Move internal/ to sdk/ (#5568) * Move internal/ to sdk/ * Add a readme to the SDK folder	2019-03-27 08:54:56 -04:00
Jeff Mitchell	a41c865059	Convert to Go Modules (#5517) * First conversion * Use serf 0.8.2 tag and associated updated deps * * Move freeport and testutil into internal/ * Make internal/ its own module * Update imports * Add replace statements so API and normal Consul code are self-referencing for ease of development * Adapt to newer goe/values * Bump to new cleanhttp * Fix ban nonprintable chars test * Update lock bad args test The error message when the duration cannot be parsed changed in Go 1.12 (ae0c435877d3aacb9af5e706c40f9dddde5d3e67). This updates that test. * Update another test as well * Bump travis * Bump circleci * Bump go-discover and godo to get rid of launchpad dep * Bump dockerfile go version * fix tar command * Bump go-cleanhttp	2019-03-26 17:04:58 -04:00
Hans Hasselberg	d511e86491	agent: enable reloading of tls config (#5419) This PR introduces reloading of the TLS configuration. Consul will now be able to reload the TLS configuration, which previously required a restart. It is not yet possible to turn TLS ON or OFF with these changes. The configuration can be reloaded only when TLS is already turned on. Most importantly, this covers the certificates and CAs.	2019-03-13 10:29:06 +01:00
R.B. Boyer	c24e3584be	improve flaky LANReap tests by explicitly configuring the tombstone timeout In TestServer_LANReap autopilot is running, so the alternate flow through the serf reaping function is possible. In that situation the ReconnectTimeout is not relevant, so for parity also override the TombstoneTimeout value. For additional parity, update the TestServer_WANReap and TestClient_LANReap versions of this test in the same way, even though autopilot is irrelevant there.	2019-03-05 14:34:03 -06:00
Matt Keeler	87f9365eee	Fixes for CVE-2019-8336 Fix error in detecting raft replication errors. Detect redacted token secrets and prevent attempting to insert. Add a Redacted field to the TokenBatchRead and TokenRead RPC endpoints This will indicate whether token secrets have been redacted. Ensure any token with a redacted secret in secondary datacenters is removed. Test that redacted tokens cannot be replicated.	2019-03-04 19:13:24 +00:00
Matt Keeler	416a6543a6	Call RemoveServer for reap events (#5317) This ensures that servers are removed from RPC routing when they are reaped.	2019-03-04 09:19:35 -05:00
Hans Hasselberg	75ababb54f	Centralise tls configuration part 1 (#5366) In order to be able to reload the TLS configuration, we need one way to generate the different configurations. This PR introduces a `tlsutil.Configurator` which holds a `tlsutil.Config`. Afterwards it is responsible for rendering every `tls.Config`. In this particular PR I moved `IncomingHTTPSConfig`, `IncomingTLSConfig`, and `OutgoingTLSWrapper` into `tlsutil.Configurator`. This PR is a pure refactoring - not a single feature added. And not a single test added. I only slightly modified existing tests as necessary.	2019-02-26 16:52:07 +01:00
R.B. Boyer	b5d71ea779	testutil: redirect some test agent logs to testing.T.Logf (#5304) When tests fail, only the logs for the failing run are dumped to the console which helps in diagnosis. This is easily added to other test scenarios as they come up.	2019-02-01 09:21:54 -06:00
Pierre Souchay	d0ca1bade9	Fixed another list of unstable unit tests in travis (#4915) * Fixed another list of unstable unit tests in travis Fixed failing tests in https://travis-ci.org/hashicorp/consul/jobs/451357061 * Fixed another list of unstable unit tests in travis. Fixed failing tests in https://travis-ci.org/hashicorp/consul/jobs/451357061	2018-11-20 11:27:26 +00:00
Kyle Havlovitz	19f9cad3fe	oss: bump test server version to 1.4.0	2018-11-13 13:13:26 -08:00
Kyle Havlovitz	038aefa0bc	update non-voting server test to fix enterprise diff	2018-11-09 12:50:24 -08:00
Matt Keeler	0dd537e506	Fix the NonVoter Bootstrap test (#4786)	2018-10-24 10:23:50 -04:00
Alex Dadgar	90ed72fd70	do not bootstrap with non voters	2018-09-19 17:41:36 -07:00
Paul Banks	09e4c2995b	Fix CA pruning when CA config uses string durations. (#4669) * Fix CA pruning when CA config uses string durations. The tl;dr here is: - Configuring LeafCertTTL with a string like "72h" is how we do it by default and should be supported - Most of our tests managed to escape this by defining them as time.Duration directly - Our actual default value is a string - Since this is stored in a map[string]interface{} config, when it is written to Raft it goes through a msgpack encode/decode cycle (even though it's written from server not over RPC). - msgpack decode leaves the string as a `[]uint8` - Some of our parsers required string and failed - So after 1 hour, a default configured server would throw an error about pruning old CAs - If a new CA was configured that set LeafCertTTL as a time.Duration, things might be OK after that, but if a new CA was just configured from config file, initialization would cause same issue but always fail still so would never prune the old CA. - Mostly this is just a janky error that got past tests due to many levels of complicated encoding/decoding. tl;dr of the tl;dr: Yay for type safety. Map[string]interface{} combined with msgpack always goes wrong but we somehow get bitten every time in a new way :D We already fixed this once! The main CA config had the same problem so @kyhavlov already wrote the mapstructure DecodeHook that fixes it. It wasn't used in several places it needed to be and one of those is now in `structs` which caused a dependency cycle so I've moved them. This adds a whole new test that explicitly tests the case that broke here. It also adds tests that would have failed in other places before (Consul and Vault provider parsing functions). I'm not sure if they would ever be affected as it is now as we've not seen things broken with them but it seems better to explicitly test that and support it to not be bitten a third time! * Typo fix * Fix bad Uint8 usage	2018-09-13 15:43:00 +01:00
Paul Banks	dbcf286d4c	Ooops remove the CA stuff from actual server defaults and make it test server only	2018-06-14 09:42:16 -07:00
Paul Banks	834ed1d25f	Fixed many tests after rebase. Some still failing and seem unrelated to any connect changes.	2018-06-14 09:42:16 -07:00
Kyle Havlovitz	19b9399f2f	Add more tests for built-in provider	2018-06-14 09:42:06 -07:00
Josh Soref	1dd8c378b9	Spelling (#3958) * spelling: another * spelling: autopilot * spelling: beginning * spelling: circonus * spelling: default * spelling: definition * spelling: distance * spelling: encountered * spelling: enterprise * spelling: expands * spelling: exits * spelling: formatting * spelling: health * spelling: hierarchy * spelling: imposed * spelling: independence * spelling: inspect * spelling: last * spelling: latest * spelling: client * spelling: message * spelling: minimum * spelling: notify * spelling: nonexistent * spelling: operator * spelling: payload * spelling: preceded * spelling: prepared * spelling: programmatically * spelling: required * spelling: reconcile * spelling: responses * spelling: request * spelling: response * spelling: results * spelling: retrieve * spelling: service * spelling: significantly * spelling: specifies * spelling: supported * spelling: synchronization * spelling: synchronous * spelling: themselves * spelling: unexpected * spelling: validations * spelling: value	2018-03-19 16:56:00 +00:00
Devin Canterberry	881d20c606	🐛 Formatting changes only; add missing trailing commas	2018-03-15 10:19:46 -07:00
Preetha Appan	77d35f1829	Remove extra newline	2018-02-21 13:21:47 -06:00
Preetha Appan	573500dc51	Unit test that calls revokeLeadership twice to make sure it's idempotent	2018-02-21 12:48:53 -06:00
James Phillips	b94ba8aeb4	Removes bogus getPort() in favor of freeport.	2017-11-08 19:55:50 -08:00
James Phillips	c6e0366c02	Relaxes Autopilot promotion logic. (#3623) * Relaxes Autopilot promotion logic. When we defaulted the Raft protocol version to 3 in #3477 we made the numPeers() routine more strict to only count voters (this is more conservative and more correct). This had the side effect of breaking rolling updates because it's at odds with the Autopilot non-voter promotion logic. That logic used to wait to only promote to maintain an odd quorum of servers. During a rolling update (add one new server, wait, and then kill an old server) the dead server cleanup would still count the old server as a peer, which is conservative and the right thing to do, and no longer count the non-voter. This would wait to promote, so you could get into a stalemate. It is safer to promote early than remove early, so by promoting as soon as possible we have chosen that as the solution here. Fixes #3611 * Gets rid of unnecessary extra not-a-voter check.	2017-10-31 15:16:56 -05:00
Frank Schroeder	74859ff3c0	test: replace porter tool with freeport lib This patch removes the porter tool which hands out free ports from a given range with a library which does the same thing. The challenge for acquiring free ports in concurrent go test runs is that go packages are tested concurrently and run in separate processes. There has to be some inter-process synchronization in preventing processes allocating the same ports. freeport allocates blocks of ports from a range expected to be not in heavy use and implements a system-wide mutex by binding to the first port of that block for the lifetime of the application. Ports are then provided sequentially from that block and are tested on localhost before being returned as available.	2017-10-21 22:01:09 +02:00
James Phillips	d1ad538345	Makes RPC handling more robust when rolling servers. (#3561 ) * Adds client-side retry for no leader errors. This paves over the case where the client was connected to the leader when it loses leadership. * Adds a configurable server RPC drain time and a fail-fast path for RPCs. When a server leaves it gets removed from the Raft configuration, so it will never know who the new leader server ends up being. Without this we'd be doomed to wait out the RPC hold timeout and then fail. This makes things fail a little quicker while a sever is draining, and since we added a client retry AND since the server doing this has already shut down and left the Serf LAN, clients should retry against some other server. * Makes the RPC hold timeout configurable. * Reorders struct members. * Sets the RPC hold timeout default for test servers. * Bumps the leave drain time up to 5 seconds. * Robustifies retries with a simpler client-side RPC hold. * Reverts untended delete.	2017-10-10 15:19:50 -07:00
James Phillips	fcaa889116	Bumps default Raft protocol to version 3. (#3477 ) * Changes default Raft protocol to 3. * Changes numPeers() to report only voters. This should have been there before, but it's more obvious that this is incorrect now that we default the Raft protocol to 3, which puts new servers in a read-only state while Autopilot waits for them to become healthy. * Fixes TestLeader_RollRaftServer. * Fixes TestOperator_RaftRemovePeerByAddress. * Fixes TestServer_. Relaxed the check for a given number of voter peers and instead do a thorough check that all servers see each other in their Raft configurations. Fixes TestACL_. These now just check for Raft replication to be set up, and don't care about the number of voter peers. Fixes TestOperator_Raft_ListPeers. * Fixes TestAutopilot_CleanupDeadServerPeriodic. * Fixes TestCatalog_ListNodes_ConsistentRead_Fail. * Fixes TestLeader_ChangeServerID and adjusts the conn pool to throw away sockets when it sees io.EOF. * Changes version to 1.0.0 in the options doc. * Makes metrics test more deterministic with autopilot metrics possible.	2017-09-25 15:27:04 -07:00
Frank Schröder	69a088ca85	New config parser, HCL support, multiple bind addrs (#3480 ) * new config parser for agent This patch implements a new config parser for the consul agent which makes the following changes to the previous implementation: * add HCL support * all configuration fragments in tests and for default config are expressed as HCL fragments * HCL fragments can be provided on the command line so that they can eventually replace the command line flags. * HCL/JSON fragments are parsed into a temporary Config structure which can be merged using reflection (all values are pointers). The existing merge logic of overwrite for values and append for slices has been preserved. * A single builder process generates a typed runtime configuration for the agent. The new implementation is more strict and fails in the builder process if no valid runtime configuration can be generated. Therefore, additional validations in other parts of the code should be removed. The builder also pre-computes all required network addresses so that no address/port magic should be required where the configuration is used and should therefore be removed. * Upgrade github.com/hashicorp/hcl to support int64 * improve error messages * fix directory permission test * Fix rtt test * Fix ForceLeave test * Skip performance test for now until we know what to do * Update github.com/hashicorp/memberlist to update log prefix * Make memberlist use the default logger * improve config error handling * do not fail on non-existing data-dir * experiment with non-uniform timeouts to get a handle on stalled leader elections * Run tests for packages separately to eliminate the spurious port conflicts * refactor private address detection and unify approach for ipv4 and ipv6. 
Fixes #2825 * do not allow unix sockets for DNS * improve bind and advertise addr error handling * go through builder using test coverage * minimal update to the docs * more coverage tests fixed * more tests * fix makefile * cleanup * fix port conflicts with external port server 'porter' * stop test server on error * do not run api tests that change global ENV concurrently with the other tests * Run remaining api tests concurrently * no need for retry with the port number service * monkey patch race condition in go-sockaddr until we understand why that fails * monkey patch hcl decoder race condition until we understand why that fails * monkey patch spurious errors in strings.EqualFold from here * add test for hcl decoder race condition. Run with go test -parallel 128 * Increase timeout again * cleanup * don't log port allocations by default * use base command arg parsing to format help output properly * handle -dc deprecation case in Build * switch autopilot.max_trailing_logs to int * remove duplicate test case * remove unused methods * remove comments about flag/config value inconsistencies * switch got and want around since the error message was misleading. * Removes a stray debug log. * Removes a stray newline in imports. * Fixes TestACL_Version8. * Runs go fmt. * Adds a default case for unknown address types. * Reorders and reformats some imports. * Adds some comments and fixes typos. * Reorders imports. 
* add unix socket support for dns later * drop all deprecated flags and arguments * fix wrong field name * remove stray node-id file * drop unnecessary patch section in test * drop duplicate test * add test for LeaveOnTerm and SkipLeaveOnInt in client mode * drop "bla" and add clarifying comment for the test * split up tests to support enterprise/non-enterprise tests * drop raft multiplier and derive values during build phase * sanitize runtime config reflectively and add test * detect invalid config fields * fix tests with invalid config fields * use different values for wan sanitization test * drop recursor in favor of recursors * allow dns_config.udp_answer_limit to be zero * make sure tests run on machines with multiple ips * Fix failing tests in a few more places by providing a bind address in the test * Gets rid of skipped TestAgent_CheckPerformanceSettings and adds case for builder. * Add porter to server_test.go to make tests there less flaky * go fmt	2017-09-25 11:40:42 -07:00
Preetha Appan	e944370cde	More cleanup from code review	2017-08-30 12:31:36 -05:00
Preetha Appan	5a29eb7486	Consolidate server lookup into one place and replace usages of localConsuls.	2017-08-30 09:30:33 -05:00
Frank Schroeder	c38dcf2d17	agent: move agent/consul/agent to agent/metadata	2017-08-09 14:36:52 +02:00
James Phillips	803ed9a245	Adds secure introduction for the ACL replication token. (#3357 ) Adds secure introduction for the ACL replication token, as well as a separate enable config for ACL replication.	2017-08-03 15:39:31 -07:00
James Phillips	7b54e325df	Adds a comment about flood joining.	2017-07-07 09:22:34 +02:00
Frank Schroeder	74d3c4d896	rpc: make TestServer_JoinSeparateLanAndWanAddresses more robust	2017-07-07 09:22:34 +02:00
Frank Schroeder	2eb2941e8c	rpc: fix logging and try quicker timing of TestServer_JoinSeparateLanAndWanAddresses	2017-07-07 09:22:34 +02:00
Frank Schroeder	98510f898c	rpc: less aggressive raft timeouts. Allowing more time for raft to consolidate should reduce the number of leader elections.	2017-07-07 09:22:34 +02:00
Frank Schroeder	50c81a9397	rpc: run agent/consul tests in parallel	2017-07-07 09:22:34 +02:00
Frank Schroeder	06ad8e96be	rpc: fix TestServer_Leave wait for the leader election.	2017-07-07 09:22:34 +02:00
Frank Schroeder	e3252f921a	rpc: fix for 'no leader' in TLS tests Ensure both servers know about each other before looking for a leader.	2017-07-07 09:22:34 +02:00
Frank Schroeder	2497b8416b	rpc: fix TestServer_JoinWAN_Flood The second server in the first data center should not be in bootstrap mode.	2017-07-07 09:22:34 +02:00
Frank Schroeder	7af30dd7d7	rpc: provide unique node names for server and client	2017-07-07 09:22:34 +02:00
Frank Schroeder	457910b191	rpc: prefix log output with test name	2017-07-07 09:22:34 +02:00


155 Commits