The previous ordering of defers meant the listener's connWG could fire and wake up other goroutines before the connection closed. Unsure if this caused any real bugs but this commit should make the code more correct.
The watch is established in a background goroutine and the first assertion proves that the watcher is active so there is no reason for the update to happen in a racy goroutine.
Note that this does not completely remove the race condition as the first call to testGetConfigValTimeout could time out before a config is returned.
p.service is written to within the Serve method. The Serve method also waits for the stopChan to be closed.
The race was between Close being called on the proxy causing Close on the service which was written to around the same time in the Serve method.
The fix is to have Serve be responsible for closing p.service.
- Introduce a new telemetry configurable parameter retry_failed_connection. User can set the value to true to let consul agent continue its start process on failed connection to datadog server. When set to false, agent will stop on failed start. The default behavior is true.
Co-authored-by: Dan Upton <daniel@floppy.co>
Co-authored-by: Evan Culver <eculver@users.noreply.github.com>
set -euo pipefail
unset CDPATH
cd "$(dirname "$0")"
for f in $(git grep '\brequire := require\.New(' | cut -d':' -f1 | sort -u); do
echo "=== require: $f ==="
sed -i '/require := require.New(t)/d' $f
# require.XXX(blah) but not require.XXX(tblah) or require.XXX(rblah)
sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\([^tr]\)/require.\1(t,\2/g' $f
# require.XXX(tblah) but not require.XXX(t, blah)
sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/require.\1(t,\2/g' $f
# require.XXX(rblah) but not require.XXX(r, blah)
sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/require.\1(t,\2/g' $f
gofmt -s -w $f
done
for f in $(git grep '\bassert := assert\.New(' | cut -d':' -f1 | sort -u); do
echo "=== assert: $f ==="
sed -i '/assert := assert.New(t)/d' $f
# assert.XXX(blah) but not assert.XXX(tblah) or assert.XXX(rblah)
sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\([^tr]\)/assert.\1(t,\2/g' $f
# assert.XXX(tblah) but not assert.XXX(t, blah)
sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/assert.\1(t,\2/g' $f
# assert.XXX(rblah) but not assert.XXX(r, blah)
sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/assert.\1(t,\2/g' $f
gofmt -s -w $f
done
Previously we believe it was necessary for all code that required ports
to use freeport to prevent conflicts.
https://github.com/dnephin/freeport-test shows that it is actually save
to use port 0 (`127.0.0.1:0`) as long as it is passed directly to
`net.Listen`, and the listener holds the port for as long as it is
needed.
This works because freeport explicitly avoids the ephemeral port range,
and port 0 always uses that range. As you can see from the test output
of https://github.com/dnephin/freeport-test, the two systems never use
overlapping ports.
This commit converts all uses of freeport that were being passed
directly to a net.Listen to use port 0 instead. This allows us to remove
a bit of wrapping we had around httptest, in a couple places.
We noticed that TestUpstreamListener would deadlock sometimes when run
with the race detector. While debugging this issue I found and fixed the
following problems.
1. the net.Listener was not being closed properly when Listener.Stop was
called. This caused the Listener.Serve goroutine to run forever.
Fixed by storing a reference to net.Listener and closing it properly
when Listener.Stop is called.
2. call connWG.Add in the correct place. WaitGroup.Add must be called
before starting a goroutine, not from inside the goroutine.
3. Set metrics config EnableRuntimeMetrics to `false` so that we don't
start a background goroutine in each test for no reason. There is no
way to shutdown this goroutine, and it was an added distraction while
debugging these timeouts.
5. two tests were calling require.NoError from a goroutine.
require.NoError calls t.FailNow, which MUST be called from the main
test goroutine. Instead use t.Errorf, which can be called from other
goroutines and will still fail the test.
6. `assertCurrentGaugeValue` wass breaking out of a for loop, which
would cause the `RWMutex.RUnlock` to be missed. Fixed by calling
unlock before `break`.
The core issue of a deadlock was fixed by https://github.com/armon/go-metrics/pull/124.
There is no way to compare x509.CertPools now that it has an unexpected
function field. This comparison is as close as we can get.
See https://github.com/golang/go/issues/26614 for a related issue.
* Allow passing ALPN next protocols down to connect services. Fixes#4466.
* Update connect/proxy/proxy_test.go
Co-authored-by: Paul Banks <banks@banksco.de>
Co-authored-by: Paul Banks <banks@banksco.de>
Add a skip condition to all tests slower than 100ms.
This change was made using `gotestsum tool slowest` with data from the
last 3 CI runs of master.
See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests
With this change:
```
$ time go test -count=1 -short ./agent
ok github.com/hashicorp/consul/agent 0.743s
real 0m4.791s
$ time go test -count=1 -short ./agent/consul
ok github.com/hashicorp/consul/agent/consul 4.229s
real 0m8.769s
```
Most packages should pass the race detector. An exclude list ensures
that new packages are automatically tested with -race.
Also fix a couple small test races to allow more packages to be tested.
Returning readyCh requires a lock because it can be set to nil, and
setting it to nil will race without the lock.
Move the TestServer.Listening calls around so that they properly guard
setting TestServer.l. Otherwise it races.
Remove t.Parallel in a small package. The entire package tests run in a
few seconds, so t.Parallel does very little.
In auto-config, wait for the AutoConfig.run goroutine to stop before
calling readPersistedAutoConfig. Without this change there was a data
race on reading ac.config.
- Upgrade the ConfigEntry.ListAll RPC to be kind-aware so that older
copies of consul will not see new config entries it doesn't understand
replicate down.
- Add shim conversion code so that the old API/CLI method of interacting
with intentions will continue to work so long as none of these are
edited via config entry endpoints. Almost all of the read-only APIs will
continue to function indefinitely.
- Add new APIs that operate on individual intentions without IDs so that
the UI doesn't need to implement CAS operations.
- Add a new serf feature flag indicating support for
intentions-as-config-entries.
- The old line-item intentions way of interacting with the state store
will transparently flip between the legacy memdb table and the config
entry representations so that readers will never see a hiccup during
migration where the results are incomplete. It uses a piece of system
metadata to control the flip.
- The primary datacenter will begin migrating intentions into config
entries on startup once all servers in the datacenter are on a version
of Consul with the intentions-as-config-entries feature flag. When it is
complete the old state store representations will be cleared. We also
record a piece of system metadata indicating this has occurred. We use
this metadata to skip ALL of this code the next time the leader starts
up.
- The secondary datacenters continue to run the old intentions
replicator until all servers in the secondary DC and primary DC support
intentions-as-config-entries (via serf flag). Once this condition it met
the old intentions replicator ceases.
- The secondary datacenters replicate the new config entries as they are
migrated in the primary. When they detect that the primary has zeroed
it's old state store table it waits until all config entries up to that
point are replicated and then zeroes its own copy of the old state store
table. We also record a piece of system metadata indicating this has
occurred. We use this metadata to skip ALL of this code the next time
the leader starts up.
There are a couple of things in here.
First, just like auto encrypt, any Cluster.AutoConfig RPC will implicitly use the less secure RPC mechanism.
This drastically modifies how the Consul Agent starts up and moves most of the responsibilities (other than signal handling) from the cli command and into the Agent.
Three of the checks are temporarily disabled to limit the size of the
diff, and allow us to enable all the other checks in CI.
In a follow up we can fix the issues reported by the other checks one
at a time, and enable them.
* Allow RSA CA certs for consul and vault providers to correctly sign EC leaf certs.
* Ensure key type ad bits are populated from CA cert and clean up tests
* Add integration test and fix error when initializing secondary CA with RSA key.
* Add more tests, fix review feedback
* Update docs with key type config and output
* Apply suggestions from code review
Co-Authored-By: R.B. Boyer <rb@hashicorp.com>
The fields in the certs are meant to hold the original binary
representation of this data, not some ascii-encoded version.
The only time we should be colon-hex-encoding fields is for display
purposes or marshaling through non-TLS mediums (like RPC).
This should cut down on test flakiness.
Problems handled:
- If you had enough parallel test cases running, the former circular
approach to handling the port block could hand out the same port to
multiple cases before they each had a chance to bind them, leading to
one of the two tests to fail.
- The freeport library would allocate out of the ephemeral port range.
This has been corrected for Linux (which should cover CI).
- The library now waits until a formerly-in-use port is verified to be
free before putting it back into circulation.
* Move the watch package into the api module
It was already just a thin wrapper around the API anyways. The biggest change was to the testing. Instead of using a test agent directly from the agent package it now uses the binary on the PATH just like the other API tests.
The other big changes were to fix up the connect based watch tests so that we didn’t need to pull in the connect package (and therefore all of Consul)
* First conversion
* Use serf 0.8.2 tag and associated updated deps
* * Move freeport and testutil into internal/
* Make internal/ its own module
* Update imports
* Add replace statements so API and normal Consul code are
self-referencing for ease of development
* Adapt to newer goe/values
* Bump to new cleanhttp
* Fix ban nonprintable chars test
* Update lock bad args test
The error message when the duration cannot be parsed changed in Go 1.12
(ae0c435877d3aacb9af5e706c40f9dddde5d3e67). This updates that test.
* Update another test as well
* Bump travis
* Bump circleci
* Bump go-discover and godo to get rid of launchpad dep
* Bump dockerfile go version
* fix tar command
* Bump go-cleanhttp
This way we can avoid unnecessary panics which cause other tests not to run.
This doesn't remove all the possibilities for panics causing other tests not to run, it just fixes the TestAgent