The LogOutput io.Writer used by TestAgent must allow concurrent reads and writes, and a
bytes.Buffer does not allow this. The bytes.Buffer must be wrapped with a lock to make this safe.
This test is super racy (it's not just a single line).
This test also starts failing once streaming is enabled, because the
cache rate limit no longer applies to the requests in the test. The
queries use streaming instead of the cache.
This test is no longer valid, and the functionality is already well
tested by TestCacheThrottle. Instead of spending time rewriting this
test, let's remove it.
```
WARNING: DATA RACE
Read at 0x00c01de410fc by goroutine 735:
github.com/hashicorp/consul/agent.TestCacheRateLimit.func1()
/home/daniel/pers/code/consul/agent/agent_test.go:1024 +0x9af
github.com/hashicorp/consul/testrpc.WaitForTestAgent()
/home/daniel/pers/code/consul/testrpc/wait.go:99 +0x209
github.com/hashicorp/consul/agent.TestCacheRateLimit.func1()
/home/daniel/pers/code/consul/agent/agent_test.go:966 +0x1ad
testing.tRunner()
/usr/lib/go/src/testing/testing.go:1193 +0x202
Previous write at 0x00c01de410fc by goroutine 605:
github.com/hashicorp/consul/agent.TestCacheRateLimit.func1.2()
/home/daniel/pers/code/consul/agent/agent_test.go:998 +0xe9
Goroutine 735 (running) created at:
testing.(*T).Run()
/usr/lib/go/src/testing/testing.go:1238 +0x5d7
github.com/hashicorp/consul/agent.TestCacheRateLimit()
/home/daniel/pers/code/consul/agent/agent_test.go:961 +0x375
testing.tRunner()
/usr/lib/go/src/testing/testing.go:1193 +0x202
Goroutine 605 (finished) created at:
github.com/hashicorp/consul/agent.TestCacheRateLimit.func1()
/home/daniel/pers/code/consul/agent/agent_test.go:1022 +0x91e
github.com/hashicorp/consul/testrpc.WaitForTestAgent()
/home/daniel/pers/code/consul/testrpc/wait.go:99 +0x209
github.com/hashicorp/consul/agent.TestCacheRateLimit.func1()
/home/daniel/pers/code/consul/agent/agent_test.go:966 +0x1ad
testing.tRunner()
/usr/lib/go/src/testing/testing.go:1193 +0x202
```
As part of this change, we ensure that the SAN extensions are marked as
critical when the subject is empty so that AWS PCA tolerates the loss of
common names well and continues to function as a Connect CA provider.
Parts of this currently hack around a bug in crypto/x509 and can be
removed after https://go-review.googlesource.com/c/go/+/329129 lands in
a Go release.
Note: the AWS PCA tests do not run automatically, but the following
passed locally for me:
ENABLE_AWS_PCA_TESTS=1 go test ./agent/connect/ca -run TestAWS
* Save exposed HTTP or GRPC ports to the agent's store
* Add those the health checks API so we can retrieve them from the API
* Change redirect-traffic command to also exclude those ports from inbound traffic redirection when expose.checks is set to true.
* add http2 ping checks
* fix test issue
* add h2ping check to config resources
* add new test and docs for h2ping
* fix grammatical inconsistency in H2PING documentation
* resolve rebase conflicts, add test for h2ping tls verification failure
* api documentation for h2ping
* update test config data with H2PING
* add H2PING to protocol buffers and update changelog
* fix typo in changelog entry
The method is only used in tests, and only exists for legacy calls.
There was one other package which used this method in tests. Export
the AddServiceRequest and a couple of its fields so the new function can
be used in those tests.
These types are used as values (not pointers) in other structs. Using a pointer receiver causes
problems when the value is printed. fmt will not call the String method if it is passed a value
and the String method has a pointer receiver. By using a value receiver the correct string is printed.
Also remove some unused methods.
Add a skip condition to all tests slower than 100ms.
This change was made using `gotestsum tool slowest` with data from the
last 3 CI runs of master.
See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests
With this change:
```
$ time go test -count=1 -short ./agent
ok github.com/hashicorp/consul/agent 0.743s
real 0m4.791s
$ time go test -count=1 -short ./agent/consul
ok github.com/hashicorp/consul/agent/consul 4.229s
real 0m8.769s
```
Previously the listener was being passed to a closure in a loop without
capturing the loop variable. The result is only the last listener is
used, so the http/https servers only listen on one address.
This problem is fixed by capturing the variable by passing it into a
function.
When a service is deregistered, we check whever matching services were
registered as sidecar along with it and deregister them as well.
To determine if a service is indeed a sidecar we check the
structs.ServiceNode.LocallyRegisteredAsSidecar property. However, to
avoid interal API leakage, it is excluded from JSON serialization,
meaning it is not persisted to disk either.
When the agent is restarted, this property lost and sidecars are no
longer deregistered along with their parent service.
To fix this, we now specifically save this property in the persisted
service file.
In an upcoming change we will need to pass a grpc.ClientConnPool from
BaseDeps into Server. While looking at that change I noticed all of the
existing consulOption fields are already on BaseDeps.
Instead of duplicating the fields, we can create a struct used by
agent/consul, and use that struct in BaseDeps. This allows us to pass
along dependencies without translating them into different
representations.
I also looked at moving all of BaseDeps in agent/consul, however that
created some circular imports. Resolving those cycles wouldn't be too
bad (it was only an error in agent/consul being imported from
cache-types), however this change seems a little better by starting to
introduce some structure to BaseDeps.
This change is also a small step in reducing the scope of Agent.
Also remove some constants that were only used by tests, and move the
relevant comment to where the live configuration is set.
Removed some validation from NewServer and NewClient, as these are not
really runtime errors. They would be code errors, which will cause a
panic anyway, so no reason to handle them specially here.
https server.
In #8234 I changed a few tests to use TestAgent.HTTPAddr() to find the
addr used in the test. Due to the way HTTPAddr() was implemented these
tests were passing, but I think the pass was incidental. HTTPAddr() was
not matching any servers, and was instead returning the last server,
which happened to be the one these tests wanted.
This commit fixes the implementation of HTTPAddr to panic if no match
was found. The tests which require an HTTPS server are changed to use
a new firstAddr() to look up the correct address.
And into token.Store. This change isolates any awareness of token
persistence in a single place.
It is a small step in allowing Agent.New to accept its dependencies.
TestAgent.Key was only used by 3 tests. Extracting it from the common helper that is used in hundreds of
tests helps keep the shared part small and more focused.
This required a second change (which I was planning on making anyway), which was to change the behaviour of
DataDir. Now in all cases the TestAgent will use the DataDir, and clean it up once the test is complete.
Most of the groundwork was laid in previous PRs between adding the cert-monitor package to extracting the logic of signing certificates out of the connect_ca_endpoint.go code and into a method on the server.
This also refactors the auto-config package a bit to split things out into multiple files.
There were several PRs that while all passed CI independently, when they all got merged into the same branch caused compilation errors in test code.
The main changes that caused issues where changing agent/cache.Cache.New to require a concrete options struct instead of a pointer. This broke the cert monitor tests and the catalog_list_services_test.go. Another change was made to unembed the http.Server from the agent.HTTPServer struct. That coupled with another change to add a test to ensure cache rate limiting coming from HTTP requests was working as expected caused compilation failures.
This implements a solution for #7863
It does:
Add a new config cache.entry_fetch_rate to limit the number of calls/s for a given cache entry, default value = rate.Inf
Add cache.entry_fetch_max_burst size of rate limit (default value = 2)
The new configuration now supports the following syntax for instance to allow 1 query every 3s:
command line HCL: -hcl 'cache = { entry_fetch_rate = 0.333}'
in JSON
{
"cache": {
"entry_fetch_rate": 0.333
}
}
Also fix a bug where Consul could segfault if TLS was enabled but no client certificate was provided. How no one has reported this as a problem I am not sure.
The old test case was a very specific regresion test for a case that is no longer possible.
Replaced with a new test that checks the default coordinate is returned.
The envisioned changes would allow extra settings to enable dynamically defined auth methods to be used instead of or in addition to the statically defined one in the configuration.
There are a couple of things in here.
First, just like auto encrypt, any Cluster.AutoConfig RPC will implicitly use the less secure RPC mechanism.
This drastically modifies how the Consul Agent starts up and moves most of the responsibilities (other than signal handling) from the cli command and into the Agent.
In current implementation of Consul, check alias cannot determine
if a service exists or not. Because a service without any check
is semantically considered as passing, so when no healthchecks
are found for an agent, the check was considered as passing.
But this make little sense as the current implementation does not
make any difference between:
* a non-existing service (passing)
* a service without any check (passing as well)
In order to make it work, we have to ensure that when a check did
not find any healthcheck, the service does indeed exists. If it
does not, lets consider the check as failing.