Previously we believe it was necessary for all code that required ports
to use freeport to prevent conflicts.
https://github.com/dnephin/freeport-test shows that it is actually save
to use port 0 (`127.0.0.1:0`) as long as it is passed directly to
`net.Listen`, and the listener holds the port for as long as it is
needed.
This works because freeport explicitly avoids the ephemeral port range,
and port 0 always uses that range. As you can see from the test output
of https://github.com/dnephin/freeport-test, the two systems never use
overlapping ports.
This commit converts all uses of freeport that were being passed
directly to a net.Listen to use port 0 instead. This allows us to remove
a bit of wrapping we had around httptest, in a couple places.
We noticed that TestUpstreamListener would deadlock sometimes when run
with the race detector. While debugging this issue I found and fixed the
following problems.
1. the net.Listener was not being closed properly when Listener.Stop was
called. This caused the Listener.Serve goroutine to run forever.
Fixed by storing a reference to net.Listener and closing it properly
when Listener.Stop is called.
2. call connWG.Add in the correct place. WaitGroup.Add must be called
before starting a goroutine, not from inside the goroutine.
3. Set metrics config EnableRuntimeMetrics to `false` so that we don't
start a background goroutine in each test for no reason. There is no
way to shutdown this goroutine, and it was an added distraction while
debugging these timeouts.
5. two tests were calling require.NoError from a goroutine.
require.NoError calls t.FailNow, which MUST be called from the main
test goroutine. Instead use t.Errorf, which can be called from other
goroutines and will still fail the test.
6. `assertCurrentGaugeValue` wass breaking out of a for loop, which
would cause the `RWMutex.RUnlock` to be missed. Fixed by calling
unlock before `break`.
The core issue of a deadlock was fixed by https://github.com/armon/go-metrics/pull/124.
There is no way to compare x509.CertPools now that it has an unexpected
function field. This comparison is as close as we can get.
See https://github.com/golang/go/issues/26614 for a related issue.
* Allow passing ALPN next protocols down to connect services. Fixes#4466.
* Update connect/proxy/proxy_test.go
Co-authored-by: Paul Banks <banks@banksco.de>
Co-authored-by: Paul Banks <banks@banksco.de>
Add a skip condition to all tests slower than 100ms.
This change was made using `gotestsum tool slowest` with data from the
last 3 CI runs of master.
See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests
With this change:
```
$ time go test -count=1 -short ./agent
ok github.com/hashicorp/consul/agent 0.743s
real 0m4.791s
$ time go test -count=1 -short ./agent/consul
ok github.com/hashicorp/consul/agent/consul 4.229s
real 0m8.769s
```
Most packages should pass the race detector. An exclude list ensures
that new packages are automatically tested with -race.
Also fix a couple small test races to allow more packages to be tested.
Returning readyCh requires a lock because it can be set to nil, and
setting it to nil will race without the lock.
Move the TestServer.Listening calls around so that they properly guard
setting TestServer.l. Otherwise it races.
Remove t.Parallel in a small package. The entire package tests run in a
few seconds, so t.Parallel does very little.
In auto-config, wait for the AutoConfig.run goroutine to stop before
calling readPersistedAutoConfig. Without this change there was a data
race on reading ac.config.
- Upgrade the ConfigEntry.ListAll RPC to be kind-aware so that older
copies of consul will not see new config entries it doesn't understand
replicate down.
- Add shim conversion code so that the old API/CLI method of interacting
with intentions will continue to work so long as none of these are
edited via config entry endpoints. Almost all of the read-only APIs will
continue to function indefinitely.
- Add new APIs that operate on individual intentions without IDs so that
the UI doesn't need to implement CAS operations.
- Add a new serf feature flag indicating support for
intentions-as-config-entries.
- The old line-item intentions way of interacting with the state store
will transparently flip between the legacy memdb table and the config
entry representations so that readers will never see a hiccup during
migration where the results are incomplete. It uses a piece of system
metadata to control the flip.
- The primary datacenter will begin migrating intentions into config
entries on startup once all servers in the datacenter are on a version
of Consul with the intentions-as-config-entries feature flag. When it is
complete the old state store representations will be cleared. We also
record a piece of system metadata indicating this has occurred. We use
this metadata to skip ALL of this code the next time the leader starts
up.
- The secondary datacenters continue to run the old intentions
replicator until all servers in the secondary DC and primary DC support
intentions-as-config-entries (via serf flag). Once this condition it met
the old intentions replicator ceases.
- The secondary datacenters replicate the new config entries as they are
migrated in the primary. When they detect that the primary has zeroed
it's old state store table it waits until all config entries up to that
point are replicated and then zeroes its own copy of the old state store
table. We also record a piece of system metadata indicating this has
occurred. We use this metadata to skip ALL of this code the next time
the leader starts up.
There are a couple of things in here.
First, just like auto encrypt, any Cluster.AutoConfig RPC will implicitly use the less secure RPC mechanism.
This drastically modifies how the Consul Agent starts up and moves most of the responsibilities (other than signal handling) from the cli command and into the Agent.
Three of the checks are temporarily disabled to limit the size of the
diff, and allow us to enable all the other checks in CI.
In a follow up we can fix the issues reported by the other checks one
at a time, and enable them.
* Allow RSA CA certs for consul and vault providers to correctly sign EC leaf certs.
* Ensure key type ad bits are populated from CA cert and clean up tests
* Add integration test and fix error when initializing secondary CA with RSA key.
* Add more tests, fix review feedback
* Update docs with key type config and output
* Apply suggestions from code review
Co-Authored-By: R.B. Boyer <rb@hashicorp.com>
The fields in the certs are meant to hold the original binary
representation of this data, not some ascii-encoded version.
The only time we should be colon-hex-encoding fields is for display
purposes or marshaling through non-TLS mediums (like RPC).
This should cut down on test flakiness.
Problems handled:
- If you had enough parallel test cases running, the former circular
approach to handling the port block could hand out the same port to
multiple cases before they each had a chance to bind them, leading to
one of the two tests to fail.
- The freeport library would allocate out of the ephemeral port range.
This has been corrected for Linux (which should cover CI).
- The library now waits until a formerly-in-use port is verified to be
free before putting it back into circulation.
* Move the watch package into the api module
It was already just a thin wrapper around the API anyways. The biggest change was to the testing. Instead of using a test agent directly from the agent package it now uses the binary on the PATH just like the other API tests.
The other big changes were to fix up the connect based watch tests so that we didn’t need to pull in the connect package (and therefore all of Consul)
* First conversion
* Use serf 0.8.2 tag and associated updated deps
* * Move freeport and testutil into internal/
* Make internal/ its own module
* Update imports
* Add replace statements so API and normal Consul code are
self-referencing for ease of development
* Adapt to newer goe/values
* Bump to new cleanhttp
* Fix ban nonprintable chars test
* Update lock bad args test
The error message when the duration cannot be parsed changed in Go 1.12
(ae0c435877d3aacb9af5e706c40f9dddde5d3e67). This updates that test.
* Update another test as well
* Bump travis
* Bump circleci
* Bump go-discover and godo to get rid of launchpad dep
* Bump dockerfile go version
* fix tar command
* Bump go-cleanhttp
This way we can avoid unnecessary panics which cause other tests not to run.
This doesn't remove all the possibilities for panics causing other tests not to run, it just fixes the TestAgent
- A new endpoint `/v1/agent/service/:service_id` which is a generic way to look up the service for a single instance. The primary value here is that it:
- **supports hash-based blocking** and so;
- **replaces `/agent/connect/proxy/:proxy_id`** as the mechanism the built-in proxy uses to read its config.
- It's not proxy specific and so works for any service.
- It has a temporary shim to call through to the existing endpoint to preserve current managed proxy config defaulting behaviour until that is removed entirely (tested).
- The built-in proxy now uses the new endpoint exclusively for it's config
- The built-in proxy now has a `-sidecar-for` flag that allows the service ID of the _target_ service to be specified, on the condition that there is exactly one "sidecar" proxy (that is one that has `Proxy.DestinationServiceID` set) for the service registered.
- Several fixes for edge cases for SidecarService
- A fix for `Alias` checks - when running locally they didn't update their state until some external thing updated the target. If the target service has no checks registered as below, then the alias never made it past critical.
* Refactor Service Definition ProxyDestination.
This includes:
- Refactoring all internal structs used
- Updated tests for both deprecated and new input for:
- Agent Services endpoint response
- Agent Service endpoint response
- Agent Register endpoint
- Unmanaged deprecated field
- Unmanaged new fields
- Managed deprecated upstreams
- Managed new
- Catalog Register
- Unmanaged deprecated field
- Unmanaged new fields
- Managed deprecated upstreams
- Managed new
- Catalog Services endpoint response
- Catalog Node endpoint response
- Catalog Service endpoint response
- Updated API tests for all of the above too (both deprecated and new forms of register)
TODO:
- config package changes for on-disk service definitions
- proxy config endpoint
- built-in proxy support for new fields
* Agent proxy config endpoint updated with upstreams
* Config file changes for upstreams.
* Add upstream opaque config and update all tests to ensure it works everywhere.
* Built in proxy working with new Upstreams config
* Command fixes and deprecations
* Fix key translation, upstream type defaults and a spate of other subtele bugs found with ned to end test scripts...
TODO: tests still failing on one case that needs a fix. I think it's key translation for upstreams nested in Managed proxy struct.
* Fix translated keys in API registration.
≈
* Fixes from docs
- omit some empty undocumented fields in API
- Bring back ServiceProxyDestination in Catalog responses to not break backwards compat - this was removed assuming it was only used internally.
* Documentation updates for Upstreams in service definition
* Fixes for tests broken by many refactors.
* Enable travis on f-connect branch in this branch too.
* Add consistent Deprecation comments to ProxyDestination uses
* Update version number on deprecation notices, and correct upstream datacenter field with explanation in docs
This improves the checking so that if a certificate were to expire or the roots changed then we will go into a non-ready state.
This parses the x509 certificates from the TLS certificate when the leaf is set. The readyCh will be closed whenever a parseable certificate is set and the ca roots are set. This does not mean that the certificate is valid but that it has been setup and is generally valid. The Ready function will now do x509 certificate verification which will in addition to verifying the signatures with the installed CA roots will also verify the certificate isn't expired or not set to become valid in the future.
The correct way to use these functions is to wait for the ReadyWait chan to be closed and then periodically check the readiness to determine if the certificate is currently useable.
Improve flaky connect/proxy Listener tests
- Add sleep to TestEchoConn to allow for Read/Write to finish before fetching data in reportStats
- Account for flakiness around interval for Gauge
- Improve debug output when dumping metrics