* Adds a constraint to prevent tests from landing on Windows
* Improve Terraform output for mixed windows/linux clients
* Makes some Windows client config fixes from 0.10.2 testing
The test asserts that alloc counts get reported accurately in metrics by
inspecting the metrics endpoint directly. Sadly, the metrics as
collected by `armon/go-metrics` seem to be stateful and may contain info
from other tests.
This means that the test can fail depending on the order of returned
metrics.
Inspecting the metrics output of one failing run, you can see the
duplicate guage entries but for different node_ids:
```
{
"Name": "service-name.default-0a3ba4b6-2109-485e-be74-6864228aed3d.client.allocations.terminal",
"Value": 10,
"Labels": {
"datacenter": "dc1",
"node_class": "none",
"node_id": "67402bf4-00f3-bd8d-9fa8-f4d1924a892a"
}
},
{
"Name": "service-name.default-0a3ba4b6-2109-485e-be74-6864228aed3d.client.allocations.terminal",
"Value": 0,
"Labels": {
"datacenter": "dc1",
"node_class": "none",
"node_id": "a2945b48-7e66-68e2-c922-49b20dd4e20c"
}
},
```
Looks like the RecoverTask doesn't set taskHandle.logger field causing
a panic when the handle attempts to log (e.g. when Shutdown or Signaling
fails).
make test-nomad sets 15 minute time out for build. Increase the ci
timeout to 20m, so we can get meaningful output and goroutine stack
traces rather than have test be simply killed by CircleCI.
The extra 5 minutes is a buffer for generating-structs and some
unnecessary padding.
TestClient_UpdateNodeFromFingerprintKeepsConfig checks a test node
network interface, which is hardcoded to `eth0` and is updated
asynchronously. This causes flakiness when eth0 isn't available.
Here, we hardcode the value to an arbitrary network interface.
When spinning a second client, ensure that it uses new driver
instances, rather than reuse the already shutdown unhealthy drivers from
first instance.
This speeds up tests significantly, but cutting ~50 seconds or so, the
timeout in NewClient until drivers fingerprints. They never do because
drivers were shutdown already.
TestClient_RestoreError is very slow, taking ~81 seconds.
It has few problematic patterns. It's unclear what it tests, it
simulates a failure condition where all state db lookup fails and
asserts that alloc fails. Though starting from
https://github.com/hashicorp/nomad/pull/6216 , we don't fail allocs in
that condition but rather restart them.
Also, the drivers used in second client `c2` are the same singleton
instances used in `c1` and already shutdown. We ought to start healthy
new driver instances.
This adopts pattern used by Vault, where we split CircleCI yaml config
into multiple files that get packed and translated to 2.0.
This has two motivations: First, to ease translating config to CircleCI
2.0 so it can run on Enterprise private repository. Second and most
importantly, it also adding Enterprise specific jobs in separate files
with reduced config file merging conflict resolution.
This is a remenant of the time we used a custom hashicorp docker image for CI.
Currently, we use the official golang image, so no longer need the job
or manage the dockerhub credentials.
`stable-website` branch is only meant for updating the nomadproject.io
website, and the backend tests are irrelevant. Also, the ci workflow
uses up the plans containers and may delay website deployments by 20
minutes or more while we are cutting a release.