open-consul

Commit Graph

Author	SHA1	Message	Date
Aliaksandr Mianzhynski	c79180980c	Return grpc serving status in health check errors	2020-09-22 21:16:58 +03:00
Daniel Nephin	80ff174880	testutil: NewLogBuffer - buffer logs until a test fails Replaces #7559 Running tests in parallel, with background goroutines, results in test output not being associated with the correct test. `go test` does not make any guarantees about output from goroutines being attributed to the correct test case. Attaching log output from background goroutines also cause data races. If the goroutine outlives the test, it will race with the test being marked done. Previously this was noticed as a panic when logging, but with the race detector enabled it is shown as a data race. The previous solution did not address the problem of correct test attribution because test output could still be hidden when it was associated with a test that did not fail. You would have to look at all of the log output to find the relevant lines. It also made debugging test failures more difficult because each log line was very long. This commit attempts a new approach. Instead of printing all the logs, only print when a test fails. This should work well when there are a small number of failures, but may not work well when there are many test failures at the same time. In those cases the failures are unlikely a result of a specific test, and the log output is likely less useful. All of the logs are printed from the test goroutine, so they should be associated with the correct test. Also removes some test helpers that were not used, or only had a single caller. Packages which expose many functions with similar names can be difficult to use correctly. Related: https://github.com/golang/go/issues/38458 (may be fixed in go1.15) https://github.com/golang/go/issues/38382#issuecomment-612940030	2020-07-21 12:50:40 -04:00
Daniel Nephin	f913bed325	checks: wait for goroutine to complete CheckAlias already had a waitGroup, but the Add() call was happening too late, which was causing a race in tests. The add must happen before the goroutine is started. CheckHTTP did not have a waitGroup, so I added it to match CheckAlias. It looks like a lot of the implementation could be shared, and may not need all of channel, waitgroup and bool, but I will leave that refactor for another time.	2020-07-20 18:55:39 -04:00
Daniel Nephin	89d95561df	Enable gofmt simplify Code changes done automatically with 'gofmt -s -w'	2020-06-16 13:21:11 -04:00
Hans Hasselberg	7f14d3ac8a	tests: use constructor instead init (#8024)	2020-06-04 22:59:06 +02:00
Pierre Souchay	7cd5477c3c	checks: when a service does not exists in an alias, consider it failing (#7384) In current implementation of Consul, check alias cannot determine if a service exists or not. Because a service without any check is semantically considered as passing, so when no healthchecks are found for an agent, the check was considered as passing. But this make little sense as the current implementation does not make any difference between: * a non-existing service (passing) * a service without any check (passing as well) In order to make it work, we have to ensure that when a check did not find any healthcheck, the service does indeed exists. If it does not, lets consider the check as failing.	2020-06-04 14:50:52 +02:00
Seth Hoenig	7ea75263ea	grpc: use default resolver scheme for grpc dialing (#7617) Currently checks of type gRPC will emit log messages such as, 2020/02/12 13:48:22 [INFO] parsed scheme: "" 2020/02/12 13:48:22 [INFO] scheme "" not registered, fallback to default scheme Without adding full support for using custom gRPC schemes (maybe that's right long-term path) we can just supply the default scheme as provided by the grpc library. Fixes https://github.com/hashicorp/consul/issues/7274 and https://github.com/hashicorp/nomad/issues/7415	2020-05-20 22:26:26 +02:00
Akshay Ganeshen	fd32016ce9	feat: support sending body in HTTP checks (#6602)	2020-02-10 09:27:12 -07:00
Chris Piraino	3dd0b59793	Allow users to configure either unstructured or JSON logging (#7130) * hclog Allow users to choose between unstructured and JSON logging	2020-01-28 17:50:41 -06:00
Anthony Scalisi	4b92c2deee	fix spelling errors (#7135)	2020-01-27 07:00:33 -06:00
Matt Keeler	6de4eb8569	OSS changes for implementing token based namespace inferencing remove debug log	2019-12-18 14:07:08 -05:00
Matt Keeler	442924c35a	Sync of OSS changes to support namespaces (#6909)	2019-12-09 21:26:41 -05:00
PHBourquin	16ca8340c1	Checks to passing/critical only after reaching a consecutive success/failure threshold (#5739) A check may be set to become passing/critical only if a specified number of successive checks return passing/critical in a row. Status will stay identical as before until the threshold is reached. This feature is available for HTTP, TCP, gRPC, Docker & Monitor checks.	2019-10-14 21:49:49 +01:00
Freddy	5eace88ce2	Expose HTTP-based paths through Connect proxy (#6446) Fixes: #5396 This PR adds a proxy configuration stanza called expose. These flags register listeners in Connect sidecar proxies to allow requests to specific HTTP paths from outside of the node. This allows services to protect themselves by only listening on the loopback interface, while still accepting traffic from non Connect-enabled services. Under expose there is a boolean checks flag that would automatically expose all registered HTTP and gRPC check paths. This stanza also accepts a paths list to expose individual paths. The primary use case for this functionality would be to expose paths for third parties like Prometheus or the kubelet. Listeners for requests to exposed paths are be configured dynamically at run time. Any time a proxy, or check can be registered, a listener can also be created. In this initial implementation requests to these paths are not authenticated/encrypted.	2019-09-25 20:55:52 -06:00
hashicorp-ci	8b109e5f9f	Merge Consul OSS branch 'master' at commit ef257b084d2e2a474889518440515e360d0cd990	2019-07-20 02:00:29 +00:00
Christian Muehlhaeuser	877bfd280b	Fixed a few tautological condition mistakes (#6177) None of these changes should have any side-effects. They're merely fixing tautological mistakes.	2019-07-19 07:53:42 -04:00
Sarah Adams	20d9e5193b	http/tcp checks: fix long timeout behavior to default to user-configured value (#6094) Fixes #5834	2019-07-16 15:13:26 -07:00
Pierre Souchay	e394a9469b	Support for maximum size for Output of checks (#5233) * Support for maximum size for Output of checks This PR allows users to limit the size of output produced by checks at the agent and check level. When set at the agent level, it will limit the output for all checks monitored by the agent. When set at the check level, it can override the agent max for a specific check but only if it is lower than the agent max. Default value is 4k, and input must be at least 1.	2019-06-26 09:43:25 -06:00
R.B. Boyer	9b41199585	agent: fix several data races and bugs related to node-local alias checks (#5876) The observed bug was that a full restart of a consul datacenter (servers and clients) in conjunction with a restart of a connect-flavored application with bring-your-own-service-registration logic would very frequently cause the envoy sidecar service check to never reflect the aliased service. Over the course of investigation several bugs and unfortunate interactions were corrected: (1) local.CheckState objects were only shallow copied, but the key piece of data that gets read and updated is one of the things not copied (the underlying Check with a Status field). When the stock code was run with the race detector enabled this highly-relevant-to-the-test-scenario field was found to be racy. Changes: a) update the existing Clone method to include the Check field b) copy-on-write when those fields need to change rather than incrementally updating them in place. This made the observed behavior occur slightly less often. (2) If anything about how the runLocal method for node-local alias check logic was ever flawed, there was no fallback option. Those checks are purely edge-triggered and failure to properly notice a single edge transition would leave the alias check incorrect until the next flap of the aliased check. The change was to introduce a fallback timer to act as a control loop to double check the alias check matches the aliased check every minute (borrowing the duration from the non-local alias check logic body). This made the observed behavior eventually go away when it did occur. (3) Originally I thought there were two main actions involved in the data race: A. The act of adding the original check (from disk recovery) and its first health evaluation. B. The act of the HTTP API requests coming in and resetting the local state when re-registering the same services and checks. It took awhile for me to realize that there's a third action at work: C. The goroutines associated with the original check and the later checks. The actual sequence of actions that was causing the bad behavior was that the API actions result in the original check to be removed and re-added _without waiting for the original goroutine to terminate_. This means for brief windows of time during check definition edits there are two goroutines that can be sending updates for the alias check status. In extremely unlikely scenarios the original goroutine sees the aliased check start up in `critical` before being removed but does not get the notification about the nearly immediate update of that check to `passing`. This is interlaced wit the new goroutine coming up, initializing its base case to `passing` from the current state and then listening for new notifications of edge triggers. If the original goroutine "finishes" its update, it then commits one more write into the local state of `critical` and exits leaving the alias check no longer reflecting the underlying check. The correction here is to enforce that the old goroutines must terminate before spawning the new one for alias checks.	2019-05-24 13:36:56 -05:00
Alvin Huang	aacb81a566	Merge pull request #5376 from hashicorp/fix-tests Fix tests in prep for CircleCI Migration	2019-04-04 17:09:32 -04:00
Jeff Mitchell	d3c7d57209	Move internal/ to sdk/ (#5568) * Move internal/ to sdk/ * Add a readme to the SDK folder	2019-03-27 08:54:56 -04:00
Jeff Mitchell	a41c865059	Convert to Go Modules (#5517) * First conversion * Use serf 0.8.2 tag and associated updated deps * * Move freeport and testutil into internal/ * Make internal/ its own module * Update imports * Add replace statements so API and normal Consul code are self-referencing for ease of development * Adapt to newer goe/values * Bump to new cleanhttp * Fix ban nonprintable chars test * Update lock bad args test The error message when the duration cannot be parsed changed in Go 1.12 (ae0c435877d3aacb9af5e706c40f9dddde5d3e67). This updates that test. * Update another test as well * Bump travis * Bump circleci * Bump go-discover and godo to get rid of launchpad dep * Bump dockerfile go version * fix tar command * Bump go-cleanhttp	2019-03-26 17:04:58 -04:00
Alvin Huang	3df8d84aae	skip TestCheckTCPPassing on CircleCI	2019-02-22 17:34:45 -05:00
Paul Banks	979e1c9c94	Add -sidecar-for and new /agent/service/:service_id endpoint (#4691) - A new endpoint `/v1/agent/service/:service_id` which is a generic way to look up the service for a single instance. The primary value here is that it: - supports hash-based blocking and so; - replaces `/agent/connect/proxy/:proxy_id` as the mechanism the built-in proxy uses to read its config. - It's not proxy specific and so works for any service. - It has a temporary shim to call through to the existing endpoint to preserve current managed proxy config defaulting behaviour until that is removed entirely (tested). - The built-in proxy now uses the new endpoint exclusively for it's config - The built-in proxy now has a `-sidecar-for` flag that allows the service ID of the _target_ service to be specified, on the condition that there is exactly one "sidecar" proxy (that is one that has `Proxy.DestinationServiceID` set) for the service registered. - Several fixes for edge cases for SidecarService - A fix for `Alias` checks - when running locally they didn't update their state until some external thing updated the target. If the target service has no checks registered as below, then the alias never made it past critical.	2018-10-10 16:55:34 +01:00
Mitchell Hashimoto	5159c0341c	agent/checks: prevent overflow of backoff	2018-07-12 10:21:49 -07:00
Mitchell Hashimoto	5889a3b6ff	agent: address some basic feedback	2018-07-12 09:36:11 -07:00
Mitchell Hashimoto	00d95f9214	agent/checks: support node-only checks	2018-07-12 09:36:11 -07:00
Mitchell Hashimoto	275d2b929a	agent/checks: set critical if RPC fails	2018-07-12 09:36:11 -07:00
Mitchell Hashimoto	175e74972d	agent/checks: use local state for local services	2018-07-12 09:36:11 -07:00
Mitchell Hashimoto	0c4cd2df01	agent/checks: reflect node failure as alias check failure	2018-07-12 09:36:10 -07:00
Mitchell Hashimoto	10d68ec56f	agent/checks: add Alias check type	2018-07-12 09:36:09 -07:00
Kieran Othen	4575fd378a	Update check.go Cosmetic fix to the agent's HTTP check function which always formats the result as "HTTP GET ...", ignoring any non-GET supplied HTTP method such as POST, PUT, etc.	2018-03-31 16:44:35 +01:00
Guido Iaquinti	244fc72b05	Add package name to log output	2018-03-21 15:56:14 +00:00
James Phillips	38f5b2e7ce	Gets rid of named return parameters. This wasn't wrong before but we don't generally use this style in Consul.	2018-01-25 14:29:50 -08:00
James Phillips	1acaaecbdd	Moves non-stdlib includes into their own section.	2018-01-25 14:26:15 -08:00
Dmytro Kostiuchenko	a45f6ad740	Add gRPC health-check #3073	2018-01-04 16:42:30 -05:00
James Phillips	50cdff36e5	Cleans up check logging. There were places where we still didn't have the script vs. args sorted correctly so changed all the logging to be just based on check IDs and also made everything uniform. Also removed some annoying debug logging, and moved some of the large output logging to TRACE level. Closes #3602	2017-11-10 12:48:44 -08:00
James Phillips	c060df20de	Adds missing os import.	2017-11-08 20:02:22 -08:00
James Phillips	04a7907a7e	Skips IPv6 test in Travis.	2017-11-08 18:28:45 -08:00
James Phillips	532cafe0af	Adds enable_agent_tls_for_checks configuration option which allows (#3661) HTTP health checks for services requiring 2-way TLS to be checked using the agent's credentials.	2017-11-07 18:22:09 -08:00
Preetha Appan	ae9e204b3a	Sets tty in docker client back to true, as a potential fix for docker exec weirdness	2017-11-05 09:44:55 -06:00
Frank Schroeder	82a52d3b50	docker: fix failing test	2017-10-31 09:26:34 +01:00
Frank Schroeder	ed1b1b54cd	docker: render errors with %v since they can be nil	2017-10-31 09:19:20 +01:00
Frank Schroeder	712447026f	docker: add comment about "connection reset by peer" error	2017-10-26 12:14:19 +02:00
Frank Schroeder	bf98779d84	docker: close idle connections on stop	2017-10-26 12:02:39 +02:00
Frank Schroeder	0a9d2a367e	docker: do not alloc a tty since this is not interactive	2017-10-26 11:56:54 +02:00
Frank Schroeder	b1a5a6b64d	docker: make sure to log the error when we fall through	2017-10-26 11:56:36 +02:00
Frank Schroeder	b907c4611d	docker: ignore "connection reset by peer" The Docker agent closes the connection during read after we have read the body. This causes a "connection reset by peer" even though the command was successful. We ignore that error here since we got the correct status code and a response body.	2017-10-26 11:56:08 +02:00
Frank Schroeder	1eb3d0e0d4	replace custom unique id with a UUID	2017-10-25 19:30:35 +02:00
Frank Schroeder	1dab004335	Decouple the code that executes checks from the agent	2017-10-25 11:18:07 +02:00

50 Commits