open-consul

Commit Graph

Author	SHA1	Message	Date
R.B. Boyer	9b41199585	agent: fix several data races and bugs related to node-local alias checks (#5876) The observed bug was that a full restart of a consul datacenter (servers and clients) in conjunction with a restart of a connect-flavored application with bring-your-own-service-registration logic would very frequently cause the envoy sidecar service check to never reflect the aliased service. Over the course of investigation several bugs and unfortunate interactions were corrected: (1) local.CheckState objects were only shallow copied, but the key piece of data that gets read and updated is one of the things not copied (the underlying Check with a Status field). When the stock code was run with the race detector enabled this highly-relevant-to-the-test-scenario field was found to be racy. Changes: a) update the existing Clone method to include the Check field b) copy-on-write when those fields need to change rather than incrementally updating them in place. This made the observed behavior occur slightly less often. (2) If anything about the runLocal method for node-local alias check logic was ever flawed, there was no fallback option. Those checks are purely edge-triggered and failure to properly notice a single edge transition would leave the alias check incorrect until the next flap of the aliased check. The change was to introduce a fallback timer to act as a control loop to double check the alias check matches the aliased check every minute (borrowing the duration from the non-local alias check logic body). This made the observed behavior eventually go away when it did occur. (3) Originally I thought there were two main actions involved in the data race: A. The act of adding the original check (from disk recovery) and its first health evaluation. B. The act of the HTTP API requests coming in and resetting the local state when re-registering the same services and checks. It took a while for me to realize that there's a third action at work: C. The goroutines associated with the original check and the later checks. The actual sequence of actions that was causing the bad behavior was that the API actions result in the original check being removed and re-added _without waiting for the original goroutine to terminate_. This means for brief windows of time during check definition edits there are two goroutines that can be sending updates for the alias check status. In extremely unlikely scenarios the original goroutine sees the aliased check start up in `critical` before being removed but does not get the notification about the nearly immediate update of that check to `passing`. This is interlaced with the new goroutine coming up, initializing its base case to `passing` from the current state and then listening for new notifications of edge triggers. If the original goroutine "finishes" its update, it then commits one more write into the local state of `critical` and exits leaving the alias check no longer reflecting the underlying check. The correction here is to enforce that the old goroutines must terminate before spawning the new one for alias checks.	2019-05-24 13:36:56 -05:00
Alvin Huang	aacb81a566	Merge pull request #5376 from hashicorp/fix-tests Fix tests in prep for CircleCI Migration	2019-04-04 17:09:32 -04:00
Jeff Mitchell	d3c7d57209	Move internal/ to sdk/ (#5568) * Move internal/ to sdk/ * Add a readme to the SDK folder	2019-03-27 08:54:56 -04:00
Jeff Mitchell	a41c865059	Convert to Go Modules (#5517) * First conversion * Use serf 0.8.2 tag and associated updated deps * Move freeport and testutil into internal/ * Make internal/ its own module * Update imports * Add replace statements so API and normal Consul code are self-referencing for ease of development * Adapt to newer goe/values * Bump to new cleanhttp * Fix ban nonprintable chars test * Update lock bad args test The error message when the duration cannot be parsed changed in Go 1.12 (ae0c435877d3aacb9af5e706c40f9dddde5d3e67). This updates that test. * Update another test as well * Bump travis * Bump circleci * Bump go-discover and godo to get rid of launchpad dep * Bump dockerfile go version * fix tar command * Bump go-cleanhttp	2019-03-26 17:04:58 -04:00
Alvin Huang	3df8d84aae	skip TestCheckTCPPassing on CircleCI	2019-02-22 17:34:45 -05:00
Paul Banks	979e1c9c94	Add -sidecar-for and new /agent/service/:service_id endpoint (#4691) - A new endpoint `/v1/agent/service/:service_id` which is a generic way to look up the service for a single instance. The primary value here is that it: - supports hash-based blocking and so; - replaces `/agent/connect/proxy/:proxy_id` as the mechanism the built-in proxy uses to read its config. - It's not proxy specific and so works for any service. - It has a temporary shim to call through to the existing endpoint to preserve current managed proxy config defaulting behaviour until that is removed entirely (tested). - The built-in proxy now uses the new endpoint exclusively for its config - The built-in proxy now has a `-sidecar-for` flag that allows the service ID of the _target_ service to be specified, on the condition that there is exactly one "sidecar" proxy (that is one that has `Proxy.DestinationServiceID` set) for the service registered. - Several fixes for edge cases for SidecarService - A fix for `Alias` checks - when running locally they didn't update their state until some external thing updated the target. If the target service has no checks registered as below, then the alias never made it past critical.	2018-10-10 16:55:34 +01:00
Mitchell Hashimoto	5159c0341c	agent/checks: prevent overflow of backoff	2018-07-12 10:21:49 -07:00
Mitchell Hashimoto	5889a3b6ff	agent: address some basic feedback	2018-07-12 09:36:11 -07:00
Mitchell Hashimoto	00d95f9214	agent/checks: support node-only checks	2018-07-12 09:36:11 -07:00
Mitchell Hashimoto	275d2b929a	agent/checks: set critical if RPC fails	2018-07-12 09:36:11 -07:00
Mitchell Hashimoto	175e74972d	agent/checks: use local state for local services	2018-07-12 09:36:11 -07:00
Mitchell Hashimoto	0c4cd2df01	agent/checks: reflect node failure as alias check failure	2018-07-12 09:36:10 -07:00
Mitchell Hashimoto	10d68ec56f	agent/checks: add Alias check type	2018-07-12 09:36:09 -07:00
Kieran Othen	4575fd378a	Update check.go Cosmetic fix to the agent's HTTP check function which always formats the result as "HTTP GET ...", ignoring any non-GET supplied HTTP method such as POST, PUT, etc.	2018-03-31 16:44:35 +01:00
Guido Iaquinti	244fc72b05	Add package name to log output	2018-03-21 15:56:14 +00:00
James Phillips	38f5b2e7ce	Gets rid of named return parameters. This wasn't wrong before but we don't generally use this style in Consul.	2018-01-25 14:29:50 -08:00
James Phillips	1acaaecbdd	Moves non-stdlib includes into their own section.	2018-01-25 14:26:15 -08:00
Dmytro Kostiuchenko	a45f6ad740	Add gRPC health-check #3073	2018-01-04 16:42:30 -05:00
James Phillips	50cdff36e5	Cleans up check logging. There were places where we still didn't have the script vs. args sorted correctly so changed all the logging to be just based on check IDs and also made everything uniform. Also removed some annoying debug logging, and moved some of the large output logging to TRACE level. Closes #3602	2017-11-10 12:48:44 -08:00
James Phillips	c060df20de	Adds missing os import.	2017-11-08 20:02:22 -08:00
James Phillips	04a7907a7e	Skips IPv6 test in Travis.	2017-11-08 18:28:45 -08:00
James Phillips	532cafe0af	Adds enable_agent_tls_for_checks configuration option which allows HTTP health checks for services requiring 2-way TLS to be checked using the agent's credentials. (#3661)	2017-11-07 18:22:09 -08:00
Preetha Appan	ae9e204b3a	Sets tty in docker client back to true, as a potential fix for docker exec weirdness	2017-11-05 09:44:55 -06:00
Frank Schroeder	82a52d3b50	docker: fix failing test	2017-10-31 09:26:34 +01:00
Frank Schroeder	ed1b1b54cd	docker: render errors with %v since they can be nil	2017-10-31 09:19:20 +01:00
Frank Schroeder	712447026f	docker: add comment about "connection reset by peer" error	2017-10-26 12:14:19 +02:00
Frank Schroeder	bf98779d84	docker: close idle connections on stop	2017-10-26 12:02:39 +02:00
Frank Schroeder	0a9d2a367e	docker: do not alloc a tty since this is not interactive	2017-10-26 11:56:54 +02:00
Frank Schroeder	b1a5a6b64d	docker: make sure to log the error when we fall through	2017-10-26 11:56:36 +02:00
Frank Schroeder	b907c4611d	docker: ignore "connection reset by peer" The Docker agent closes the connection during read after we have read the body. This causes a "connection reset by peer" even though the command was successful. We ignore that error here since we got the correct status code and a response body.	2017-10-26 11:56:08 +02:00
Frank Schroeder	1eb3d0e0d4	replace custom unique id with a UUID	2017-10-25 19:30:35 +02:00
Frank Schroeder	1dab004335	Decouple the code that executes checks from the agent	2017-10-25 11:18:07 +02:00

82 Commits