Increase the shortened timeout after the first loop so that metrics
that take longer to come in aren't failing the test unnecessarily.
Move the check for empty alloc metrics into the loop so that if the
first values we get are empty we don't fail the test too early.
* Making pull activity timeout configurable in Docker plugin config, first pass
* Fixing broken function call
* Fixing broken tests
* Fixing linter suggestion
* Adding documentation on new parameter in Docker plugin config
* Adding unit test
* Setting min value for pull_activity_timeout, making pull activity duration a private var
copy struct values
ensure groupserviceHook implements RunnerPreKillhook
run deregister first
test that shutdown times are delayed
move magic number into variable
Fixes a bug where if a command flag parsing errors, the resulting error
and help usage messages get interleaved in unexpected and non-user
friendly way.
The reason is that we have flag parsing library effectively writes to
ui.Error in a goroutine. This is problematic: first, we lose the sequencing between help
usage and error message; second, cli.Ui methods are not concurrent safe.
Here, we introduce a custom error writer that buffers result and calls
ui.Error() in the write method and in the same goroutine.
For context, we need to wrap ui.Error because it's line-oriented, while
flags library expects a io.Writer which is bytes oriented.
Adds Windows targets to the client/allocs metrics tests. Removes the
`allocstats` test, which covers less than these tests and is now
redundant.
Adds a firewall rule to our Windows instances so that the prometheus
server can scrape the Nomad HTTP API for metrics.
Avoid logging in the `watch` function as much as possible, since
it is not waited on during a server shutdown. When the logger
logs after a test passes, it may or may not cause the testing
framework to panic.
More info in:
https://github.com/golang/go/issues/29388#issuecomment-453648436
Previously, Nomad used hand rolled HTTP requests to interact with the
EC2 metadata API. Recently however, we switched to using the AWS SDK for
this fingerprinting.
The default behaviour of the AWS SDK is to perform retries with
exponential backoff when a request fails. This is problematic for Nomad,
because interacting with the EC2 API is in our client start path.
Here we revert to our pre-existing behaviour of not performing retries
in the fast path, as if the metadata service is unavailable, it's likely
that nomad is not running in AWS.
I unintentionally introduced a flapping test in #6817. The
draining status of the node will be randomly chosen and
that flag takes precedence over eligibility. This forces
the draining flag to be false rather than random so the
test should no longer flap.
See here for an example failure:
https://circleci.com/gh/hashicorp/nomad/26368
Also make Connect related fixes more consistent in the changelog. I
suspect users won't care if a Connect related fix is in the server's
admission controller or in the client's groupservice hook or somewhere
else, so I think grouping them by `consul/connect:` makes the most
sense.
Fixes#6853
Canonicalize jobs first before adding any sidecars. This fixes a bug
where sidecar tasks were added without interpolated names and broke
validation. Sidecar tasks must be canonicalized independently.
Also adds a group network to the mock connect job because it wasn't a
valid connect job before!