open-consul

Commit Graph

Author	SHA1	Message	Date
Daniel Nephin	e456945466	sdk/retry: a few small debug improvements On a few occasions I've had to read timeout stack traces for tests and noticed that retry.Run runs the function in a goroutine. This makes debuging a timeout more difficult because the gourinte of the retryable function is disconnected from the stack of the actual test. It requires searching through the entire stack trace to find the other goroutine. By using panic instead of runtime.Goexit() we remove the need for a separate goroutine. Also a few other small improvements: * add `R.Helper` so that an assertion function can be used with both testing.T and retry.R. * Pass t to `Retryer.NextOr`, and call `t.Helper` in a number of places so that the line number reported by `t.Log` is the line in the test where `retry.Run` was called, instead of some line in `retry.go` that is not relevant to the failure. * improve the implementation of `dedup` by removing the need to iterate twice. Instad track the lines and skip any duplicate when writing to the buffer.	2021-04-21 17:10:46 -04:00
Iryna Shustava	ff2e70f4ce	cli: Add new `consul connect redirect-traffic` command for applying traffic redirection rules when Transparent Proxy is enabled. (#9910 ) * Add new consul connect redirect-traffic command for applying traffic redirection rules when Transparent Proxy is enabled. * Add new iptables package for applying traffic redirection rules with iptables.	2021-04-09 11:48:10 -07:00
John Eikenberry	fe938b9849	[SDK] change all cases of *testing.T to testing.TB Using the interface opens up the use of all methods to benchmarks as well as tests.	2021-03-16 15:05:39 -07:00
Shengjing Zhu	403e1e2534	sdk: read procfs instead of using sysctl command This drops the dependency of sysctl command	2020-10-25 00:37:04 +08:00
R.B. Boyer	d6dce2332a	connect: intentions are now managed as a new config entry kind "service-intentions" (#8834 ) - Upgrade the ConfigEntry.ListAll RPC to be kind-aware so that older copies of consul will not see new config entries it doesn't understand replicate down. - Add shim conversion code so that the old API/CLI method of interacting with intentions will continue to work so long as none of these are edited via config entry endpoints. Almost all of the read-only APIs will continue to function indefinitely. - Add new APIs that operate on individual intentions without IDs so that the UI doesn't need to implement CAS operations. - Add a new serf feature flag indicating support for intentions-as-config-entries. - The old line-item intentions way of interacting with the state store will transparently flip between the legacy memdb table and the config entry representations so that readers will never see a hiccup during migration where the results are incomplete. It uses a piece of system metadata to control the flip. - The primary datacenter will begin migrating intentions into config entries on startup once all servers in the datacenter are on a version of Consul with the intentions-as-config-entries feature flag. When it is complete the old state store representations will be cleared. We also record a piece of system metadata indicating this has occurred. We use this metadata to skip ALL of this code the next time the leader starts up. - The secondary datacenters continue to run the old intentions replicator until all servers in the secondary DC and primary DC support intentions-as-config-entries (via serf flag). Once this condition it met the old intentions replicator ceases. - The secondary datacenters replicate the new config entries as they are migrated in the primary. When they detect that the primary has zeroed it's old state store table it waits until all config entries up to that point are replicated and then zeroes its own copy of the old state store table. We also record a piece of system metadata indicating this has occurred. We use this metadata to skip ALL of this code the next time the leader starts up.	2020-10-06 13:24:05 -05:00
Paul Burlumi	c28db3960e	testutil: use TestingTB instead of testing.TB	2020-09-10 21:37:02 +01:00
R.B. Boyer	a14a31ccf1	sdk: also print test agent logs in verbose mode (#8616 )	2020-09-04 11:24:11 -05:00
Daniel Nephin	b204e342c5	Merge pull request #8514 from hashicorp/dnephin/testing-improvements-1 testing: small improvements to TestSessionCreate and testutil.retry	2020-08-18 18:26:05 -04:00
Daniel Nephin	3e0d63a6b7	testing: use t.Cleanup in testutil.TempFile So that it has the same behaviour as TempDir. Also remove the now unnecessary 'defer os.Remove'	2020-08-14 20:06:01 -04:00
Daniel Nephin	5d53e7dbde	Use t.Helper in testutil/retry	2020-08-14 18:55:52 -04:00
Daniel Nephin	2725513eea	testutil: Add t.Cleanup to TempDir TempDir registers a Cleanup so that the directory is always removed. To disable to cleanup, set the TEST_NOCLEANUP env var.	2020-08-14 13:19:10 -04:00
Daniel Nephin	e38d271bd1	Use SIGABRT to get a stack trace when the timeout is hit	2020-08-11 12:12:55 -04:00
Daniel Nephin	71e51263be	sdk: mitigate api test timeout Occasionally we are seeing the go-test-api job timeout at 10 minutes. Looking at the stack trace I saw the following: 1. Lots of tests blocked on server.Stop in NewTestServerConfigT. This suggests that SIGINT is being sent to the server, but the server is not properly shutting down. 2. Over 20k goroutines that look like this: goroutine 16355 [select, 8 minutes]: net/http.(persistConn).readLoop(0xc004270240) /usr/local/go/src/net/http/transport.go:2099 +0x99e created by net/http.(Transport).dialConn /usr/local/go/src/net/http/transport.go:1647 +0xc56 Issue 1 seems to be the main problem, but debugging that directly is not possible because our buffered logs do not get sent when the tests timeout. To mitigate this problem I've added a timeout to the cmd.Wait() to force kill the process and return an error. Unfortunately because we retry this operation, we still may not see the cause because the next attempt will likely pass. I'm tempted to remove the retry around NewTestServerConfigT. Issue 2 seems to be caused by not closing the response body. Since the request is performed many times in a loop, many goroutines are created and are not closed until the response body is closed.	2020-08-06 17:00:20 -04:00
Daniel Nephin	80ff174880	testutil: NewLogBuffer - buffer logs until a test fails Replaces #7559 Running tests in parallel, with background goroutines, results in test output not being associated with the correct test. `go test` does not make any guarantees about output from goroutines being attributed to the correct test case. Attaching log output from background goroutines also cause data races. If the goroutine outlives the test, it will race with the test being marked done. Previously this was noticed as a panic when logging, but with the race detector enabled it is shown as a data race. The previous solution did not address the problem of correct test attribution because test output could still be hidden when it was associated with a test that did not fail. You would have to look at all of the log output to find the relevant lines. It also made debugging test failures more difficult because each log line was very long. This commit attempts a new approach. Instead of printing all the logs, only print when a test fails. This should work well when there are a small number of failures, but may not work well when there are many test failures at the same time. In those cases the failures are unlikely a result of a specific test, and the log output is likely less useful. All of the logs are printed from the test goroutine, so they should be associated with the correct test. Also removes some test helpers that were not used, or only had a single caller. Packages which expose many functions with similar names can be difficult to use correctly. Related: https://github.com/golang/go/issues/38458 (may be fixed in go1.15) https://github.com/golang/go/issues/38382#issuecomment-612940030	2020-07-21 12:50:40 -04:00
Iryna Shustava	5eb8ee0cac	sdk: Use /v1/status/leader endpoint when starting a test server (#8192 ) Switch from /v1/agent/self to /v1/status/leader when checking if the test server has come up successfully in the waitForAPI function. Previously, the test server was relying (probably not intentionally) on the default value of the acl_enforce_version_8 in the TestConfig, which was false. So if you create a test server and enabled ACLs, they would not be enforced and the server would be able to come up pretty quickly because /v1/agent/self would return a 200 status pretty much as soon as the agent is running and most likely before leader election is finished. Now that we have removed acl_enforce_version_8 property (equivalent to being true by default) if you've created a test server with ACLs enabled, it will need to wait for leader election and for ACLs to be initialized before it'll get a successful response from the /v1/agent/self. Note: With this change, waitForAPI function no longer requires a 200 response status from the v1/status/leader endpoint. This is because in some tests, namely TestAPI_AgentLeave, we are only running clients, and this endpoint returns a 500 status.	2020-07-07 14:25:17 -07:00
R.B. Boyer	16db20b1f3	acl: remove the deprecated `acl_enforce_version_8` option (#7991 ) Fixes #7292	2020-05-29 16:16:03 -05:00
R.B. Boyer	9faf8c42d1	sdk: extracting testutil.RequireErrorContains from various places it was duplicated (#7753 )	2020-05-01 11:56:34 -05:00
Pierre Souchay	fa1f9eb144	[BUGFIX] Fix race condition in freeport (#7567 ) This removes a race condition in reset since pendingPorts can be set to nil in reset() If ticker is hit at wrong time, it would crash the unit test. We ensure in reset to avoid this race condition by cancelling the goroutine using killTicker chan. We also properly clean up eveything, so garbage collector can work as expected. To reproduce existing bug: `while go test -timeout 30s github.com/hashicorp/consul/sdk/freeport -run '^(Test.)$'; do go clean -testcache; done` Will crash after a few 10s runs on my machine. Error could be seen in unit tests sometimes: [INFO] freeport: resetting the freeport package state panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x1125536] goroutine 25 [running]: container/list.(List).Len(...) /usr/local/Cellar/go/1.14/libexec/src/container/list/list.go:66 github.com/hashicorp/consul/sdk/freeport.checkFreedPortsOnce() /Users/p.souchay/go/src/github.com/hashicorp/consul/sdk/freeport/freeport.go:157 +0x86 github.com/hashicorp/consul/sdk/freeport.checkFreedPorts() /Users/p.souchay/go/src/github.com/hashicorp/consul/sdk/freeport/freeport.go:147 +0x71 created by github.com/hashicorp/consul/sdk/freeport.initialize /Users/p.souchay/go/src/github.com/hashicorp/consul/sdk/freeport/freeport.go:113 +0x2cf FAIL github.com/hashicorp/consul/sdk/freeport 1.607s	2020-04-01 13:14:33 -05:00
Pierre Souchay	6cb2bccca6	Proper detection of ephemeral ports on Mac OS (#7539 ) Use systemctl to properly detect ephemeral ports on Mac OS (aka darwin) by fetching systemctl values: * net.inet.ip.portrange.first * net.inet.ip.portrange.last This will avoid the message: `[INFO] freeport: ephemeral port range detection not configured for GOOS="darwin"` and properly detect the correct port range	2020-03-30 09:13:03 -04:00
Daniel Nephin	4d485649d1	ci: Use golangci-lint for linting Using golangci-lint has a number of advantages: - adding new linters becomes much easier, its a couple lines of yaml config instead of more bash scripting - it enables whitelisting of issues using inline comments or regex - when running multiple linters less work is done. The parsed source can be reused by multiple linters - linters are run in parallel to reduce CI runtime.	2020-03-17 13:43:40 -04:00
Matt Keeler	111cb51fc8	Testing updates to support namespaced testing of the agent/xds… (#7185 ) * Various testing updates to support namespaced testing of the agent/xds package * agent/proxycfg package updates to support better namespace testing	2020-02-03 09:26:47 -05:00
Chris Piraino	3dd0b59793	Allow users to configure either unstructured or JSON logging (#7130 ) * hclog Allow users to choose between unstructured and JSON logging	2020-01-28 17:50:41 -06:00
Matt Keeler	01d647f16a	Unflake the TestAPI_AgentConnectCALeaf test (#7142 ) * Unflake the TestAPI_AgentConnectCALeaf test * Modify the WaitForActiveCARoot to actually verify that at least one root exists Also verify that the active root id field is set	2020-01-27 14:34:04 -05:00
Chris Piraino	61b92b92bd	Fix up formatting in sdk package (#7109 )	2020-01-22 12:45:34 -06:00
Song Yihan	88dc6121d8	tests: fix zombie consul process while invoking TestServer.Stop() method in sdk/testutil in Windows (#6032 ) * Fix zombie consul process in Windows Windows doesn't support Interrupt signal, thus while stop it on Windows platform it would fail and left zombie consul process	2020-01-22 17:34:35 +01:00
R.B. Boyer	6d90d873a4	fix the submodule go.mod and go.sum files (#7098 )	2020-01-21 14:49:26 -06:00
Paul Banks	1e88978aae	Fix TestAPI_DiscoveryChain_Get flake (#7082 )	2020-01-20 14:56:56 +00:00
Hans Hasselberg	315ba7d6ad	connect: check if intermediate cert needs to be renewed. (#6835 ) Currently when using the built-in CA provider for Connect, root certificates are valid for 10 years, however secondary DCs get intermediates that are valid for only 1 year. There is no mechanism currently short of rotating the root in the primary that will cause the secondary DCs to renew their intermediates. This PR adds a check that renews the cert if it is half way through its validity period. In order to be able to test these changes, a new configuration option was added: IntermediateCertTTL which is set extremely low in the tests.	2020-01-17 23:27:13 +01:00
John Eikenberry	829c330a64	change sysctl call to use absolute path Fixes: #6804	2019-11-15 14:28:13 -08:00
Mike Morris	ac360aff98	sdk: add NewTestServerT, deprecate NewTestServer (#6761 ) * prevent nil pointer deref from t.Logf usage * enforce non-nil *testing.T in NewTestServerT	2019-11-08 17:51:49 -05:00
R.B. Boyer	0425acf58c	sdk: ignore panics due to stray goroutines logging after a test completes (#6632 ) If there is imperfect goroutine lifespan tracking if we pipe our logs through testing.T.Logf there is a chance of a stray goroutine attempting to log after the test that spawned it completes. This results in a panic of: panic: Log in goroutine after TestLeader_SecondaryCA_Initialize has completed... This isn't great and should be fixed, but quickly runs into situations around externally cancelling blocking queries which isn't terribly possible at the moment. The concession here is to ignore these specific panics for now. This can be triggered easily when running some tests with a high `-count=YYY` value.	2019-10-17 11:01:11 -05:00
Matt Keeler	2040e92e4d	Nil checks in the testWriter to prevent using a bad testing.TB (#6517 ) Also needed to update some funcs that were taking a *testing.T to use a testing.TB. This prevents passing a nil pointer as a non-nil interface value and thus making it impossible to detect nil before using the interfaces functions.	2019-09-20 17:01:08 -04:00
Matt Keeler	5e460b335c	Dont crash in the testutil server creation (#6516 )	2019-09-20 15:52:02 -04:00
R.B. Boyer	4a1a7d6fa6	api/watch: reduce timing dependence on tests of watch behavior Also for debugging purposes send the stdout/stderr streams from consul processes spawned for API tests to testing.T.Logf	2019-09-19 09:20:53 -05:00
R.B. Boyer	5c5f21088c	sdk: add freelist tracking and ephemeral port range skipping to freeport This should cut down on test flakiness. Problems handled: - If you had enough parallel test cases running, the former circular approach to handling the port block could hand out the same port to multiple cases before they each had a chance to bind them, leading to one of the two tests to fail. - The freeport library would allocate out of the ephemeral port range. This has been corrected for Linux (which should cover CI). - The library now waits until a formerly-in-use port is verified to be free before putting it back into circulation.	2019-09-17 14:30:43 -05:00
Mike Morris	88df658243	connect: remove managed proxies (#6220 ) * connect: remove managed proxies implementation and all supporting config options and structs * connect: remove deprecated ProxyDestination * command: remove CONNECT_PROXY_TOKEN env var * agent: remove entire proxyprocess proxy manager * test: remove all managed proxy tests * test: remove irrelevant managed proxy note from TestService_ServerTLSConfig * test: update ContentHash to reflect managed proxy removal * test: remove deprecated ProxyDestination test * telemetry: remove managed proxy note * http: remove /v1/agent/connect/proxy endpoint * ci: remove deprecated test exclusion * website: update managed proxies deprecation page to note removal * website: remove managed proxy configuration API docs * website: remove managed proxy note from built-in proxy config * website: add note on removing proxy subdirectory of data_dir	2019-08-09 15:19:30 -04:00
Alvin Huang	5b6fa58453	resolve circleci config conflicts	2019-07-23 20:18:36 -04:00
Christian Muehlhaeuser	2602f6907e	Simplified code in various places (#6176 ) All these changes should have no side-effects or change behavior: - Use bytes.Buffer's String() instead of a conversion - Use time.Since and time.Until where fitting - Drop unnecessary returns and assignment	2019-07-20 09:37:19 -04:00
Jack Pearkes	fa15914813	Merge branch 'master' into release/1-6	2019-07-12 14:51:25 -07:00
Matt Keeler	3914ec5c62	Various Gateway Fixes (#6093 ) * Ensure the mesh gateway configuration comes back in the api within each upstream * Add a test for the MeshGatewayConfig in the ToAPI functions * Ensure we don’t use gateways for dc local connections * Update the svc kind index for deletions * Replace the proxycfg.state cache with an interface for testing Also start implementing proxycfg state testing. * Update the state tests to verify some gateway watches for upstream-targets of a discovery chain.	2019-07-12 17:19:37 -04:00
Freddy	74b7bcb612	Update TestServer creation in sdk/testutil (#6084 ) * Retry the creation of the test server three times. * Reduce the retry timeout for the API wait to 2 seconds, opting to fail faster and start over. * Remove wait for leader from server creation. This wait can be added on a test by test basis now that the function is being exported. * Remove wait for anti-entropy sync. This is built into the existing WaitForSerfCheck func, so that can be used if the anti-entropy wait is needed	2019-07-12 09:37:29 -06:00
Matt Keeler	ea6cbf01a5	Centralized Config CLI (#5731 ) * Add HTTP endpoints for config entry management * Finish implementing decoding in the HTTP Config entry apply endpoint * Add CAS operation to the config entry apply endpoint Also use this for the bootstrapping and move the config entry decoding function into the structs package. * First pass at the API client for the config entries * Fixup some of the ConfigEntry APIs Return a singular response object instead of a list for the ConfigEntry.Get RPC. This gets plumbed through the HTTP API as well. Dont return QueryMeta in the JSON response for the config entry listing HTTP API. Instead just return a list of config entries. * Minor API client fixes * Attempt at some ConfigEntry api client tests These don’t currently work due to weak typing in JSON * Get some of the api client tests passing * Implement reflectwalk magic to correct JSON encoding a ProxyConfigEntry Also added a test for the HTTP endpoint that exposes the problem. However, since the test doesn’t actually do the JSON encode/decode its still failing. * Move MapWalk magic into a binary marshaller instead of JSON. * Add a MapWalk test * Get rid of unused func * Get rid of unused imports * Fixup some tests now that the decoding from msgpack coerces things into json compat types * Stub out most of the central config cli Fully implement the config read command. * Basic config delete command implementation * Implement config write command * Implement config list subcommand Not entirely sure about the output here. Its basically the read output indented with a line specifying the kind/name of each type which is also duplicated in the indented output. * Update command usage * Update some help usage formatting * Add the connect enable helper cli command * Update list command output * Rename the config entry API client methods. * Use renamed apis * Implement config write tests Stub the others with the noTabs tests. * Change list output format Now just simply output 1 line per named config * Add config read tests * Add invalid args write test. * Add config delete tests * Add config list tests * Add connect enable tests * Update some CLI commands to use CAS ops This also modifies the HTTP API for a write op to return a boolean indicating whether the value was written or not. * Fix up the HTTP API CAS tests as I realized they weren’t testing what they should. * Update config entry rpc tests to properly test CAS * Fix up a few more tests * Fix some tests that using ConfigEntries.Apply * Update config_write_test.go * Get rid of unused import	2019-04-30 16:27:16 -07:00
Jeff Mitchell	d3c7d57209	Move internal/ to sdk/ (#5568 ) * Move internal/ to sdk/ * Add a readme to the SDK folder	2019-03-27 08:54:56 -04:00

43 Commits