Pick the random ports only once and try starting with them
a number of times so that the configuration can be re-used.
This is because the ports are written into the data files
and a subsequent agent reading the files needs to have the
same ports.
For the same reason we do not remove the data directory on
every attempt since this makes it impossible to re-read the
data files.
* refactor DNS server to be ready for multiple bind addresses
* drop tcpKeepAliveListener since it is default for the HTTP servers
* add startup timeout watcher for HTTP servers identical to DNS server
* don't use retry to try restarting the agent
this caused some issues when the startup would fail in
a separate go routine
* clear out the data directory on every retry since the ports
are stored in the raft data files
* set a unique id for every agent to allow for tracking of
concurrent output
This brings down the test run from 108 sec to 15 sec.
There is an occasional port conflict because of the nature
the next port is chosen. So far it seems rare enough to live
with it.
TestAgent will replace the following mechanisms to
start test agents in subsequent requests:
* makeAgentXXX
* makeDNSServerXXX
* makeHTTPServerXXX
* testServer
* httpTest
Move the HTTP and DNS endpoints into the agent and control
their lifespan via the agent.
This removes the requirement to manage HTTP and DNS servers
indpendent of the agent since the agent is mostly useless
without an endpoint and the endpoints without the agent.
This patch adds support for a custom check id and name when
registering a service.
This is achieved by adding a CheckID and a Name field to the
CheckType structure which is used to register checks with a
service and when returning health check definitions.
CheckDefinition is a superset of CheckType which duplicates
some of the fields of CheckType. This patch decouples these
two structures by removing the embedding of CheckType in
CheckDefinition.
Fixes#3047
This patch adds a new internal interface clientServer
which defines the common methods of consul.Client and
consul.Server. This allows to replace the following
code
if a.server != nil {
a.server.do()
} else {
a.client.do()
}
with
a.delegate.do()
In case a specific type is required a type check can
be performed:
if srv, ok := a.delegate.(*consul.Server); ok {
srv.doSrv()
}
This creates a simplified helper for temporary directories and files.
All path names are prefixed with the name of the current test.
All files and directories are stored either in /tmp/consul-test
or /tmp if the former could not be created.
Using the system temp dir breaks some tests on macOS where the unix
socket path becomes too long.
macOS displays a firewall warning dialog when an unsigned
application is trying to bind to a non-loopback address.
This patch updates some test configurations to ensure binding
to a loopback address where possible to suppress these warnings.
Use the bind address as source address for outgoing
RPC connections unless it is INADDR_ANY.
The current code uses the advertise address which will
not work in certain environments where the advertise
address is not routable in the network of the agent,
e.g. NAT environment, container... After all, that is
the purpose of the advertise address.
See #2822
Since this was doing registration to a foreign DC, it needs extra time
for the route to the ACL datacenter to be set up. ACLs aren't part of
this test, so by disabling them we make this more reliable and converge
faster than if we had added a retry.
Refactor tests that use testutil.WaitForResult to use retry.
Since this requires refactoring the test functions in general this patch
also shows the use of the github.com/pascaldekloe/goe/verify library
which provides a good mechanism for comparing nested data structures.
Instead of just converting the tests from testutil.WaitForResult to
retry the tests that performing a nested comparison of data structures
are converted to the verify library at the same time.
This patch removes duplicate internal copies of constants in the structs
package which are also defined in the api package. The api.KVOp type
with all its values for the TXN endpoint and the api.HealthXXX constants
are now used throughout the codebase.
This resulted in some circular dependencies in the testutil package
which have been resolved by copying code and constants and moving the
WaitForLeader function into a separate testrpc package.
This PR takes the host ID and runs it through a hash so that it is well
distributed. This makes it so that machines that report similar host IDs
are easily distinguished.
Instances of similar IDs occur on EC2 where the ID is prefixed and on
motherboards created in the same batch.
When consul-template is communicating with consul and the job is done, consul thread receives SIGPIPE.
This cause the logs to be filled "Caught signal: broken pipe" and they does not bring any usefull info with them.
Skipping those.
Ended up removing the leader_test.go server address change test as part
of this. The join was failing becase we were using a new node name with
the new logic here, but realized this was hitting some of the memberlist
conflict logic and not working as we expected. We need some additional
work to fully support address changes, so removed the test for now.
We fixed a few related issues while we were in here. We now only let
services register checks with a matching token, and we also close out
service and check delete operations if the catalog deregister claims
it doesn't know about the ID of the service or check being deleted.
This makes the upgrade path a bit nicer, since people will likely have
older configurations. This prints out a warning instead of just failing
if the old rpc addr or ports definition is in the config.
This has the next wave of RTT integration with the router and also
factors some common RTT-related helpers out to lib. While we were
in here we also got rid of the coordinate disable config so we don't
need to deal with the complexity in the router (there was never a
user-visible way to disable coordinates).
This adds two goroutines to perform autopilot tasks on the leader - one
to monitor the health of servers and another to periodically clean up
dead servers with a limit on removal count. Also adds a new http endpoint,
`/v1/operator/autopilot/health`, for querying this information through an
operator RPC endpoint.