This commit changes the forced closing of the stdout/stderr file
descriptors from happening immediately to happening after a grace
period. This gives the created process a chance to close its own files
and allows the remaining data to be copied.
GH #4290
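A minimal sketch of the grace-period close, assuming a hypothetical
helper and done channel (illustrative only, not the actual executor
code):

```go
package executor

import (
	"os"
	"time"
)

// closeAfterGrace is an illustrative helper: rather than force-closing
// the stdout/stderr handle immediately, wait for either the process to
// signal it is done or for the grace period to elapse, then close.
func closeAfterGrace(f *os.File, grace time.Duration, done <-chan struct{}) {
	go func() {
		select {
		case <-done: // process finished and closed its own copy first
		case <-time.After(grace): // grace period elapsed; force the close
		}
		f.Close()
	}()
}
```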
Add digest support to the docker driver image config. This commit
factors out some common code to print the repo:tag (dockerImageRef) for
events/logs, as well as parsing the image to retrieve the repo and tag
(parseDockerImage), so that the results are consistent/sane for both
repo:tag and repo@sha256:... references.
When pulling an image with a digest, the tag is blank and the repo
contains the digest. See:
https://github.com/fsouza/go-dockerclient/blob/master/image_test.go#L471
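A standalone sketch of that parsing behavior (illustrative only; the
real helpers are parseDockerImage and dockerImageRef in the driver):

```go
package docker

import "strings"

// parseImage is an illustrative version of the repo/tag split described
// above: for "repo@sha256:..." references the digest stays in repo and
// the tag is empty; for "repo:tag" the tag is split off; otherwise the
// tag defaults to "latest".
func parseImage(image string) (repo, tag string) {
	if strings.Contains(image, "@") {
		// digest reference: keep the whole thing as the repo, no tag
		return image, ""
	}
	if i := strings.LastIndex(image, ":"); i > strings.LastIndex(image, "/") {
		return image[:i], image[i+1:]
	}
	return image, "latest"
}

// imageRef mirrors the dockerImageRef idea: render one consistent,
// human-readable reference for events and logs.
func imageRef(repo, tag string) string {
	if tag == "" {
		return repo
	}
	return repo + ":" + tag
}
```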
We attempt to avoid splitting a log line across two files: when a write
would push us past the file size limit, we scan for newlines and flush
only the complete lines, carrying the remainder over to the next file.
BenchmarkRotator/1KB-8 300000 5613 ns/op
BenchmarkRotator/2KB-8 200000 8384 ns/op
BenchmarkRotator/4KB-8 100000 14604 ns/op
BenchmarkRotator/8KB-8 50000 25002 ns/op
BenchmarkRotator/16KB-8 30000 47572 ns/op
BenchmarkRotator/32KB-8 20000 92080 ns/op
BenchmarkRotator/64KB-8 10000 165883 ns/op
BenchmarkRotator/128KB-8 5000 294405 ns/op
BenchmarkRotator/256KB-8 2000 572374 ns/op
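A rough sketch of the newline-aware flush described above (hypothetical
helper; the real logic lives in the file rotator):

```go
package logging

import "bytes"

// flushBoundary is illustrative: when the pending bytes would cross the
// file size limit, flush only up to (and including) the last newline
// that still fits, and carry the remainder over to the next file so a
// single log line is never split across two files.
func flushBoundary(buf []byte, remaining int) (flush, rest []byte) {
	if remaining >= len(buf) {
		return buf, nil // everything fits in the current file
	}
	if remaining <= 0 {
		return nil, buf // no room left; start the next file
	}
	if i := bytes.LastIndexByte(buf[:remaining], '\n'); i >= 0 {
		return buf[:i+1], buf[i+1:]
	}
	return nil, buf // no newline fits; move the whole buffer over
}
```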
Guard against Canary being set to false at the same time as an
allocation is being stopped: this could cause RemoveTask to be called
with the wrong Canary value and leak a service.
Deleting both Canary values is the safest route.
Also refactor Consul ServiceClient to take a struct instead of a massive
set of arguments. This meant updating a lot of code, but it should be far
easier to extend in the future since you only need to update a single
struct instead of every single call site.
Adds an e2e test for canary tags.
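The shape of that refactor, as an illustrative sketch (field and method
names here are hypothetical, not the exact ServiceClient API):

```go
package consul

// ServiceClient is a stand-in for the real Consul service client.
type ServiceClient struct{}

// TaskServices bundles what used to be a long argument list. Adding a
// new field (for example Canary) now only touches this struct and the
// call sites that care about it, not every caller.
type TaskServices struct {
	AllocID string
	Name    string
	Canary  bool
	Tags    []string
	// ...whatever else the client needs
}

// RegisterTask is illustrative only; the real method and its fields
// differ in detail.
func (c *ServiceClient) RegisterTask(t *TaskServices) error {
	_ = t
	return nil
}
```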
If two jobs are pulling the same image simultaneously, whichever starts
the pull first sets the pull timeout. This can lead to a poor UX where
the first job requested a short timeout while the second job requested a
longer timeout, causing the pull to potentially time out much sooner than
the second job expected.
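A purely hypothetical illustration of that race (coalesced pulls where
only the first caller's timeout applies; all names are made up):

```go
package docker

import (
	"context"
	"sync"
	"time"
)

// pullFuture sketches the problem: concurrent pulls of the same image
// are coalesced, but only the first caller's timeout governs the shared
// pull, so a later caller that asked for a longer timeout can see the
// pull cancelled early.
type pullFuture struct {
	once sync.Once
	done chan struct{}
	err  error
}

func newPullFuture() *pullFuture {
	return &pullFuture{done: make(chan struct{})}
}

// wait coalesces callers: the first one starts doPull with its own
// timeout, everyone else just waits on the shared result.
func (p *pullFuture) wait(timeout time.Duration, doPull func(context.Context) error) error {
	p.once.Do(func() {
		go func() {
			ctx, cancel := context.WithTimeout(context.Background(), timeout)
			defer cancel()
			p.err = doPull(ctx)
			close(p.done)
		}()
	})
	<-p.done
	return p.err
}
```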
In the old code `sending` in the `send()` method shared the Data slice's
underlying backing array with its caller. Clearing StreamFrame.Data
didn't break the reference from the sent frame to the StreamFramer's
data slice.
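A stripped-down illustration of that aliasing (hypothetical types; the
real code is the StreamFramer's send path):

```go
package framer

// StreamFrame is a stand-in for the real frame type.
type StreamFrame struct {
	Data []byte
}

// buggySend shows the aliasing problem: the returned frame shares the
// framer's backing array, so later writes into buf mutate a frame that
// has already been handed to the caller.
func buggySend(buf []byte) *StreamFrame {
	return &StreamFrame{Data: buf} // aliases the caller's array
}

// fixedSend copies the bytes so clearing or reusing the framer's buffer
// cannot change a frame that was already sent.
func fixedSend(buf []byte) *StreamFrame {
	data := make([]byte, len(buf))
	copy(data, buf)
	return &StreamFrame{Data: data}
}
```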
According to go/codec's docs, Reset(...) should be called on
Decoders/Encoders before reuse:
https://godoc.org/github.com/ugorji/go/codec
I could find no evidence that *not* calling Reset() caused bugs, but
might as well do what the docs say?
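A minimal example of the recommended pattern (illustrative, using a
MsgpackHandle; not the actual Nomad RPC code):

```go
package rpc

import (
	"bytes"

	"github.com/ugorji/go/codec"
)

// decodeMany reuses a single Decoder across streams by calling Reset
// before each new input, as the go/codec docs recommend.
func decodeMany(streams []*bytes.Buffer) error {
	h := &codec.MsgpackHandle{}
	dec := codec.NewDecoder(streams[0], h)
	for _, s := range streams {
		dec.Reset(s) // reset decoder state before reuse
		var msg map[string]interface{}
		if err := dec.Decode(&msg); err != nil {
			return err
		}
	}
	return nil
}
```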
Remove the NOMAD_TEST_RKT flag as a guard for rkt tests. Still require
Linux, root, and rkt to be installed. Only check for rkt installation
once in hopes of speeding up rkt tests a bit.
Adding these fields to the DriverContext object will allow us to pass
them to the drivers.
A use case for this will be to emit tagged metrics from the drivers,
which contain all relevant information:
- Job
- TaskGroup
- Task
- ...
Ref: https://github.com/hashicorp/nomad/pull/4185
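An illustrative sketch of emitting such tagged metrics from a driver,
assuming the job/group/task names are available (go-metrics labels; the
exact DriverContext field names are not shown here):

```go
package driver

import metrics "github.com/armon/go-metrics"

// emitTaggedCPU is a sketch: with job, task group, and task names in
// hand, a driver can attach them as labels to its metrics.
func emitTaggedCPU(job, group, task string, cpuPercent float32) {
	metrics.SetGaugeWithLabels(
		[]string{"driver", "cpu", "percent"},
		cpuPercent,
		[]metrics.Label{
			{Name: "job", Value: job},
			{Name: "task_group", Value: group},
			{Name: "task", Value: task},
		},
	)
}
```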
Having the Nomad executor create parent cgroups that rkt is launched
within allows the stats collection code used for the exec driver to Just
Work. The only downside is that now the Nomad executor's resource
utilization counts against the cgroups resource limits just as it does
for the exec driver.
I could not reproduce the failure locally even with `stress -cpu ...`
eating all the cpu it could on my machine.
But I think the race was in one of two places:
* The task could restart which could create new events
* I think there could be a race between the updater's version of events
and the alloc runner's, since updates are async
I fixed both. Here's hoping that fixes this flaky test.