3128 Commits

Author SHA1 Message Date
Chelsea Holland Komlo 38f611a7f2 refactor NewTLSConfiguration to pass in verifyIncoming/verifyOutgoing
add missing fields to TLS merge method
2018-05-23 18:35:30 -04:00
Preetha 9084bb025e
Merge pull request #4303 from hashicorp/b-docker-client-nil-panic
Add nil check before setting timeout on docker client
2018-05-21 19:34:44 -07:00
Jesus Vazquez 23d959e42c Add job, task, taskgroup to open method 2018-05-21 20:37:18 +02:00
Jesus Vazquez 0a062a04c7 Remove allocID from dockerhandle struct 2018-05-21 20:33:01 +02:00
Jesus Vazquez e5a81815bb Rename labels job, task_group and task 2018-05-21 20:32:50 +02:00
Jesus Vazquez ffe1b1a1b6 Remove allocid label from driver.docker.oom counter metric 2018-05-21 20:30:56 +02:00
Alex Dadgar 38762d9bde
Merge pull request #4282 from hashicorp/f-rotator
Avoid splitting log line across two files
2018-05-21 17:52:13 +00:00
Alex Dadgar d95698e2c5
Merge pull request #4298 from justenwalker/docker-driver-digest-tags
driver/docker: pull image with digest
2018-05-21 17:46:14 +00:00
Nick Ethier 6392009dd6
client/driver: use correct repo address when using docker-credential helper (#4266) 2018-05-15 17:39:48 -04:00
Justen Walker a8989f33bb driver/docker: add test for dockerImageRef 2018-05-14 14:24:03 -04:00
Justen Walker 194b2231d6 driver/docker: fix up TestParseDockerImage 2018-05-14 14:23:48 -04:00
Justen Walker 25b2807ce3 driver/docker: fix TestDockerDriver_ForcePull_RepoDigest 2018-05-14 14:23:02 -04:00
Nick Ethier c4d07a2200
client/driver: guard authHelper test from running on windows 2018-05-14 13:46:57 -04:00
Justen Walker b23ca7574c driver/docker: cleanup parseDockerImage 2018-05-14 11:11:51 -04:00
Justen Walker 60f7f1aa08 driver/docker: pull image with digest
GH #4290

Add digest support to the docker driver image config. This commit
factors out some common code to print the repo:tag (dockerImageRef) for
events/logs as well as parsing the image to retrieve the repo,tag
(parseDockerImage) so that the results are consistent/sane for both
repo:tag and repo@sha256:... references.

When pulling an image with a digest, the tag is blank and the repo
contains the digest. See:
https://github.com/fsouza/go-dockerclient/blob/master/image_test.go#L471
2018-05-14 10:42:58 -04:00
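The parsing rule described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the driver's actual code: `splitImageRef` and `imageRef` are hypothetical stand-ins for `parseDockerImage` and `dockerImageRef`, and the `latest` default for untagged references is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// splitImageRef is a hypothetical helper (not Nomad's parseDockerImage).
// For digest references (repo@sha256:...) the tag is left blank and the
// repo keeps the digest, as the commit message describes.
func splitImageRef(image string) (repo, tag string) {
	if strings.Contains(image, "@") {
		return image, ""
	}
	// Only a colon after the last slash separates a tag; an earlier
	// colon belongs to a registry host:port.
	lastSlash := strings.LastIndex(image, "/")
	if i := strings.LastIndex(image, ":"); i > lastSlash {
		return image[:i], image[i+1:]
	}
	// Assumption: untagged references default to "latest".
	return image, "latest"
}

// imageRef is a hypothetical helper that renders a reference for
// events/logs consistently for both repo:tag and repo@digest forms.
func imageRef(repo, tag string) string {
	if tag == "" {
		return repo
	}
	return repo + ":" + tag
}

func main() {
	for _, image := range []string{
		"redis:3.2",
		"localhost:5000/redis",
		"redis@sha256:abc123",
	} {
		repo, tag := splitImageRef(image)
		fmt.Printf("%-24s repo=%q tag=%q ref=%q\n", image, repo, tag, imageRef(repo, tag))
	}
}
```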
Preetha Appan de66ec7394
Add nil check before setting timeout on docker client 2018-05-11 17:09:26 -05:00
Alex Dadgar 7ad5c76734 Add new line test 2018-05-11 10:52:09 -07:00
Alex Dadgar 3671ed139d Avoid splitting log line across two files
We attempt to avoid splitting a log line between two files by detecting
if we are near the file size limit and scanning for new lines and only
flushing those.

BenchmarkRotator/1KB-8            300000              5613 ns/op
BenchmarkRotator/2KB-8            200000              8384 ns/op
BenchmarkRotator/4KB-8            100000             14604 ns/op
BenchmarkRotator/8KB-8             50000             25002 ns/op
BenchmarkRotator/16KB-8            30000             47572 ns/op
BenchmarkRotator/32KB-8            20000             92080 ns/op
BenchmarkRotator/64KB-8            10000            165883 ns/op
BenchmarkRotator/128KB-8            5000            294405 ns/op
BenchmarkRotator/256KB-8            2000            572374 ns/op
2018-05-10 15:11:01 -07:00
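The idea in the commit above (flush only complete lines when close to the size limit) might look something like the sketch below. `splitAtLastNewline` is a hypothetical helper, not the rotator's real code, and carrying a newline-free buffer over to the next file is an assumption of this sketch.

```go
package main

import (
	"bytes"
	"fmt"
)

// splitAtLastNewline is a hypothetical helper. Given buffered bytes and
// the room left in the current file, it returns the part to flush now
// and the part to carry over to the next file.
func splitAtLastNewline(buf []byte, remaining int) (flush, carry []byte) {
	if remaining < 0 {
		remaining = 0
	}
	if len(buf) <= remaining {
		// Everything fits, so no line can be split.
		return buf, nil
	}
	// Find the last newline that still fits in the current file and
	// flush only up to and including it.
	if i := bytes.LastIndexByte(buf[:remaining], '\n'); i >= 0 {
		return buf[:i+1], buf[i+1:]
	}
	// Assumption: with no newline in range, carry the whole buffer over.
	return nil, buf
}

func main() {
	flush, carry := splitAtLastNewline([]byte("a\nbb\nccc"), 6)
	fmt.Printf("flush=%q carry=%q\n", flush, carry)
}
```

A line longer than a whole file would still have to be split somewhere; the sketch leaves that case to the caller.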
Alex Dadgar f5d91b5338 Benchmark for rotator
BenchmarkRotator/1KB-8            200000              5572 ns/op
BenchmarkRotator/2KB-8            200000              8338 ns/op
BenchmarkRotator/4KB-8            100000             14246 ns/op
BenchmarkRotator/8KB-8             50000             25279 ns/op
BenchmarkRotator/16KB-8            30000             48602 ns/op
BenchmarkRotator/32KB-8            20000             92159 ns/op
BenchmarkRotator/64KB-8            10000            154766 ns/op
BenchmarkRotator/128KB-8            5000            296872 ns/op
BenchmarkRotator/256KB-8            3000            551793 ns/op
2018-05-10 14:15:15 -07:00
Nick Ethier 91603a377e
client/driver: parse repo instead of attempting to pull repo info 2018-05-09 22:34:25 -04:00
Nick Ethier 38a33f9c75
client/driver: add test for docker auth helper 2018-05-09 22:33:56 -04:00
Alex Dadgar e067a9ae06 naming of constants 2018-05-09 16:46:52 -07:00
Chelsea Holland Komlo 796bae6f1b allow configurable cipher suites
disallow 3DES and RC4 ciphers

add documentation for tls_cipher_suites
2018-05-09 17:15:31 -04:00
Alex Dadgar 0e79e1a46e Keep stream and logs in sync for detecting closed pipe 2018-05-09 11:22:52 -07:00
Preetha e7ae6e98d9
Merge pull request #4259 from hashicorp/f-deployment-improvements 2018-05-08 16:37:10 -05:00
Nick Ethier 3598925ca4
client/driver: use correct repo address when using docker-credential helper 2018-05-08 15:17:28 -04:00
Nick Ethier 54c86a0292
client/driver/env: interpolate empty optional meta params as empty strings 2018-05-07 20:19:51 -04:00
Nick Ethier 016ab7a105
client/driver: remove unused const 'dockerPullProgressEmitInterval' 2018-05-07 16:24:48 -04:00
Michael Schurter f1d13683e6
consul: remove services with/without canary tags
Guard against Canary being set to false at the same time as an
allocation is being stopped: this could cause RemoveTask to be called
with the wrong Canary value and leaking a service.

Deleting both Canary values is the safest route.
2018-05-07 14:55:01 -05:00
Michael Schurter 50e04c976e
consul: support canary tags for services
Also refactor Consul ServiceClient to take a struct instead of a massive
set of arguments. Meant updating a lot of code but it should be far
easier to extend in the future as you will only need to update a single
struct instead of every single call site.

Adds an e2e test for canary tags.
2018-05-07 14:55:01 -05:00
Alex Dadgar df8fce4347
Ensure canary tags are interpolated 2018-05-07 14:50:01 -05:00
Alex Dadgar 552604451c
rework where time gets set 2018-05-07 14:50:01 -05:00
Alex Dadgar ee50789c22
Initial implementation 2018-05-07 14:50:01 -05:00
Nick Ethier d8de354dbf
client/driver: add waiting layer status count to pull progress status msg 2018-05-07 12:18:20 -04:00
Nick Ethier 77af17efbc
client/driver: add separate handler for emitting pull progress 2018-05-07 12:17:34 -04:00
Nick Ethier 0bdd976b7d
client/driver: remove pull timeout due to race condition that can lead to unexpected timeouts
If two jobs are pulling the same image simultaneously, whichever starts the pull first will set the pull timeout.
This can lead to a poor UX where the first job requested a short timeout while the second job requested a longer timeout,
causing the pull to potentially time out much sooner than expected by the second job.
2018-05-07 12:18:11 -04:00
Nick Ethier 7c5821d7c6
client/driver: do accounting on layer pull progress 2018-05-07 12:17:53 -04:00
Nick Ethier 8efda7dc6c
client/driver: emit progress to all allocs pulling same image 2018-05-07 12:17:34 -04:00
Nick Ethier e35948ab91
client/driver: add image pull progress monitoring 2018-05-07 12:17:38 -04:00
Michael Schurter 0d534d30d6
Merge pull request #4251 from hashicorp/f-grpc-checks
Support Consul gRPC Health Checks
2018-05-04 14:55:16 -07:00
Michael Schurter f6a4713141 consul: make grpc checks more like http checks 2018-05-04 11:08:11 -07:00
Michael Schurter 382caec1e1 consul: initial grpc implementation
Needs to be more like http.
2018-05-04 11:08:11 -07:00
Jesus Vazquez 08a390448b Update counter driver.docker.oom labels 2018-05-04 14:02:34 +08:00
Jesus Vazquez 4f6db56283 Initialize dockerhandle with jobname, taskgroupname, taskname and allocid 2018-05-04 14:02:19 +08:00
Jesus Vazquez 127b764dfb Add Job, taskgroupname, taskname, and allocid to the DockerHandle struct 2018-05-04 14:01:26 +08:00
Jesus Vazquez fd1ff1a0cf Run goimports 2018-05-04 13:46:36 +08:00
Jesus Vazquez 5dd4059527 Add driver.docker counter metric for OOM Killer events 2018-05-04 13:46:36 +08:00
Michael Schurter 526af6a246 framer: fix early exit/truncation in framer 2018-05-02 10:46:16 -07:00
Michael Schurter f1a6aa103a framer: fix race and remove unused error var
In the old code `sending` in the `send()` method shared the Data slice's
underlying backing array with its caller. Clearing StreamFrame.Data
didn't break the reference from the sent frame to the StreamFramer's
data slice.
2018-05-02 10:46:16 -07:00
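The aliasing problem described in the commit above is a general property of Go slices and can be reproduced in isolation. The sketch below is illustrative only: the `frame` type and both helpers are hypothetical and are not taken from the StreamFramer code.

```go
package main

import "fmt"

// frame is a hypothetical stand-in for a stream frame carrying data.
type frame struct{ Data []byte }

// snapshotShared returns a frame whose Data aliases buf's backing array.
func snapshotShared(buf []byte) frame { return frame{Data: buf} }

// snapshotCopied returns a frame with its own copy of the bytes, which
// breaks the reference to the caller's backing array.
func snapshotCopied(buf []byte) frame {
	data := make([]byte, len(buf))
	copy(data, buf)
	return frame{Data: data}
}

func main() {
	buf := []byte("hello")
	shared := snapshotShared(buf)
	copied := snapshotCopied(buf)

	// Truncating the slice keeps the same backing array, so reusing it
	// overwrites bytes the shared frame still points at.
	buf = append(buf[:0], "HEY"...)
	_ = buf

	fmt.Printf("shared=%q copied=%q\n", shared.Data, copied.Data)
	// prints: shared="HEYlo" copied="hello"
}
```

With concurrent readers and writers the same sharing becomes a data race, which is why copying before sending is the safer choice.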
Michael Schurter 7360fe3a6d client: squelch errors on cleanly closed pipes 2018-05-02 10:46:16 -07:00
Michael Schurter ffff97e25f client: don't spin on read errors 2018-05-02 10:46:16 -07:00
Michael Schurter 5ef0a82e6e client: reset encoders between uses
According to go/codec's docs, Reset(...) should be called on
Decoders/Encoders before reuse:

https://godoc.org/github.com/ugorji/go/codec

I could find no evidence that *not* calling Reset() caused bugs, but
might as well do what the docs say?
2018-05-02 10:46:16 -07:00
Alex Dadgar de4af37249 version bump and remove generated 2018-04-27 11:10:00 -07:00
Alex Dadgar 845a43864a generated files 2018-04-27 10:45:40 -07:00
Alex Dadgar 35e06ddb31 Remove generated and version bump 2018-04-26 16:49:19 -07:00
Alex Dadgar 43192cefae generated files 2018-04-26 16:28:58 -07:00
Michael Schurter 0e602d4779
Merge pull request #4188 from hashicorp/f-rkt-stats
rkt: create parent cgroup to enable stats
2018-04-24 14:54:36 -07:00
Michael Schurter d687761ebf rkt: test Stats() and always run tests
Remove the NOMAD_TEST_RKT flag as a guard for rkt tests. Still require
Linux, root, and rkt to be installed. Only check for rkt installation
once in hopes of speeding up rkt tests a bit.
2018-04-24 11:05:42 -07:00
Javier Palomo Almena 3e6c01ffa1 docker tests: Fix usage of NewDriverContext 2018-04-23 22:51:06 +02:00
Javier Palomo Almena 74d3c5df07 DriverContext: Add the TaskGroup and the Job name
Adding these fields to the DriverContext object will allow us to pass
them to the drivers.

A use case for this is to emit tagged metrics in the drivers,
which contain all relevant information:
- Job
- TaskGroup
- Task
- ...

Ref: https://github.com/hashicorp/nomad/pull/4185
2018-04-23 00:15:29 +02:00
Michael Schurter 4cee6cca6c rkt: create parent cgroup to enable stats
Having the Nomad executor create parent cgroups that rkt is launched
within allows the stats collection code used for the exec driver to Just
Work. The only downside is that now the Nomad executor's resource
utilization counts against the cgroups resource limits just as it does
for the exec driver.
2018-04-19 15:14:56 -07:00
Michael Schurter 1a85d0c990 run goimports 2018-04-19 11:16:28 -07:00
Michael Schurter d77c265d1f
Merge pull request #4168 from ninoles/b-2117-windows-group-process
B 2117 windows group process
2018-04-19 11:10:51 -07:00
Michael Schurter fdbcbd4e5b
Merge pull request #4058 from hashicorp/f-mock-by-default
[Post-0.8] test: build with mock_driver by default
2018-04-18 15:57:00 -07:00
Michael Schurter d3650fb2cd test: build with mock_driver by default
`make release` and `make prerelease` set a `release` tag to disable
enabling the `mock_driver`
2018-04-18 14:45:33 -07:00
Michael Schurter a991923389 tests: fix race in alloc_runner_test.go
I could not reproduce the failure locally even with `stress -cpu ...`
eating all the cpu it could on my machine.

But I think the race was in one of two places:
* The task could restart which could create new events
* I think there could be a race between the updater's version of events
  and alloc runners as updates are async

I fixed both. Here's hoping that fixes this flaky test.
2018-04-17 17:14:59 -07:00
Fabien Ninoles c81bec48c9 Merge branch 'master' into b-2117-windows-group-process 2018-04-17 13:47:25 -04:00
Fabien Ninoles 35cf641416 Update based on PR request. 2018-04-17 13:43:04 -04:00
Alex Dadgar c4ad76091d
Merge pull request #4166 from hashicorp/b-panic-fix-update
Fixes races accessing node and updating it during fingerprinting
2018-04-17 10:02:19 -07:00
Chelsea Holland Komlo 9b8a079558 fix up comments 2018-04-17 11:53:08 -04:00
Alex Dadgar 9d612c8cb0 Cleanup 2018-04-16 15:48:34 -07:00
Alex Dadgar 32adaf9dfc Copy the config given to the alloc runner 2018-04-16 15:45:52 -07:00
Alex Dadgar 3ff2d4d795 fix race node access 2018-04-16 15:45:51 -07:00
Alex Dadgar 4f2a7b6949 Fix copying drivers 2018-04-16 15:45:51 -07:00
Alex Dadgar 0b799822ff Operate on copy 2018-04-16 15:45:49 -07:00
Fabien Ninoles 27cf4995ce - Clean up for windows compilation.
- Set CREATE_NEW_PROCESS_GROUP for Windows subprocess.
- Ensure we only kill the processes that actually need to be killed.
2018-04-14 13:58:42 -04:00
Michael Schurter 3836b8a335
Merge pull request #3572 from emate/master
Create new process group on process startup.
2018-04-13 11:56:38 -07:00
Alex Dadgar adaf4fa7e0 Remove generated structs 2018-04-12 16:35:31 -07:00
Alex Dadgar 663c4d0433 Version bump and generated files 2018-04-12 16:21:50 -07:00
Alex Dadgar ff1a1a63e8 Move where attribute for driver detection is set 2018-04-12 15:50:25 -07:00
Chelsea Holland Komlo 5291788b40 delete driver name from only health check attributes 2018-04-12 18:24:41 -04:00
Alex Dadgar 3d53d380f7 Fix tests 2018-04-12 14:29:30 -07:00
Alex Dadgar f24ce2c50c Driver health detection cleanups
This PR does:

1. Health message based on detection has format "Driver XXX detected"
and "Driver XXX not detected"
2. Set initial health description based on detection status and don't
wait for the first health check.
3. Combine updating attributes on the node, fingerprint and health
checking update for drivers into a single call back.
4. Condensed driver info in `node status` only shows detected drivers
and make the output less wide by removing spaces.
2018-04-12 12:46:40 -07:00
Charlie Voiselle ba88f00ccb Changed "til" to "until"
Should be "till" or "until"; chose "until" because it is unambiguous as to meaning.
2018-04-11 12:36:28 -05:00
Chelsea Komlo eb5aac16e6
Merge pull request #4111 from hashicorp/b-undetected-set-health-to-false
Immediately set driver health status to false when driver moves to undetected
2018-04-10 18:30:31 -04:00
Chelsea Holland Komlo d58b3e473c update comment for when the fingerprinter sets health status 2018-04-10 16:53:00 -04:00
Chelsea Holland Komlo f7ef13cc64 fingerprinter should set health check status if health check is not periodic 2018-04-10 15:29:51 -04:00
Chelsea Holland Komlo ede4f518bd add setters for access to the fingerprint manager's node
refactor extracting driver info
2018-04-10 15:29:51 -04:00
Chelsea Holland Komlo f479da19f5 guard against overwriting health status 2018-04-10 15:29:51 -04:00
Chelsea Holland Komlo ece1618815 immediately set healthy to false when driver moves to undetected 2018-04-10 15:29:51 -04:00
Alex Dadgar 3d367d6fd7 Fix client uptime metric missing client prefix 2018-04-10 10:39:36 -07:00
Seth Vargo df4fe7e76c
Set user-agent when talking to GCE metadata 2018-04-10 10:36:46 -04:00
Chelsea Komlo d3bd8fb96e
Merge pull request #4109 from hashicorp/f-shorten-docker-health-timeout
Shorten docker health timeout
2018-04-09 15:38:39 -04:00
Chelsea Holland Komlo ea4b65dd41 only initialize docker clients if they are nil 2018-04-09 14:13:07 -04:00
Chelsea Holland Komlo 288c7a33a1 refactoring simplification from code review 2018-04-09 10:34:17 -04:00
Chelsea Holland Komlo 6e3b056c37 only run health check if driver moves from undetected to detected 2018-04-09 10:10:43 -04:00
Alex Dadgar ae1f76477e Start rebalance after discovering new servers 2018-04-05 15:41:59 -07:00
Alex Dadgar 929b6823a3
Merge pull request #4106 from hashicorp/b-servers
Improved Client handling of failed RPCs
2018-04-05 13:48:50 -07:00
Alex Dadgar be2513e0f9 more jitter 2018-04-05 13:48:33 -07:00
Chelsea Holland Komlo d3637825ef group similar functions; update comments
health check timeout should be 1 minute
2018-04-05 16:19:02 -04:00