Commit graph

1545 commits

Author SHA1 Message Date
Alex Dadgar 7e6dd498c9 Remove debug logging 2018-05-31 15:52:42 -07:00
Alex Dadgar b1b908527f spelling 2018-05-31 15:29:55 -07:00
Alex Dadgar a3b29553a5 Force close stdout/stderr after grace
This commit changes the force closing of the stdout/stderr file
descriptor from closing immediately to being closed after a grace
period. This allows the created process to close its own file and allows
copying of the data.
2018-05-31 15:21:36 -07:00
Alex Dadgar 5e787e2d72 test build 2018-05-31 12:22:31 -07:00
Alex Dadgar ead1b7f423 Log more info for TestExecutor_IsolationAndConstraints 2018-05-31 11:57:44 -07:00
Alex Dadgar b05740ad13
Merge pull request #4341 from hashicorp/f-docker-pids
Support Docker Pids Limit
2018-05-31 17:59:29 +00:00
Alex Dadgar f4d4bbdc97 test pid limit 2018-05-30 12:55:24 -07:00
Chelsea Holland Komlo 94d510e969 Support Docker Pids Limit 2018-05-25 19:54:14 -04:00
Alex Dadgar 1685c8ebe4 cleanup 2018-05-24 16:25:20 -07:00
Alex Dadgar 2eacdb6bd6 Force closing of pipe to child process 2018-05-24 16:03:48 -07:00
Preetha 9084bb025e
Merge pull request #4303 from hashicorp/b-docker-client-nil-panic
Add nil check before setting timeout on docker client
2018-05-21 19:34:44 -07:00
Jesus Vazquez 23d959e42c Add job, task, taskgroup to open method 2018-05-21 20:37:18 +02:00
Jesus Vazquez 0a062a04c7 Remove allocID from dockerhandle struct 2018-05-21 20:33:01 +02:00
Jesus Vazquez e5a81815bb Rename labels job, task_group and task 2018-05-21 20:32:50 +02:00
Jesus Vazquez ffe1b1a1b6 Remove allocid label from driver.docker.oom counter metric 2018-05-21 20:30:56 +02:00
Alex Dadgar 38762d9bde
Merge pull request #4282 from hashicorp/f-rotator
Avoid splitting log line across two files
2018-05-21 17:52:13 +00:00
Alex Dadgar d95698e2c5
Merge pull request #4298 from justenwalker/docker-driver-digest-tags
driver/docker: pull image with digest
2018-05-21 17:46:14 +00:00
Justen Walker a8989f33bb driver/docker: add test for dockerImageRef 2018-05-14 14:24:03 -04:00
Justen Walker 194b2231d6 driver/docker: fix up TestParseDockerImage 2018-05-14 14:23:48 -04:00
Justen Walker 25b2807ce3 driver/docker: fix TestDockerDriver_ForcePull_RepoDigest 2018-05-14 14:23:02 -04:00
Nick Ethier c4d07a2200
client/driver: gaurd authHelper test from running on windows 2018-05-14 13:46:57 -04:00
Justen Walker b23ca7574c driver/docker: cleanup parseDockerImage 2018-05-14 11:11:51 -04:00
Justen Walker 60f7f1aa08 driver/docker: pull image with digest
GH #4290

Add digest support to the docker driver image config. This commit
factors out some common code to print the repo:tag (dockerImageRef) for
events/logs as well as parsing the image to retreive the repo,tag
(parseDockerImage) so that the results are consistent/sane for both
repo:tag and repo@sha256:... references.

When pulling an image with a digest, the tag is blank and the repo
contains the digest. See:
https://github.com/fsouza/go-dockerclient/blob/master/image_test.go#L471
2018-05-14 10:42:58 -04:00
Preetha Appan de66ec7394
Add nil check before setting timeout on docker client 2018-05-11 17:09:26 -05:00
Alex Dadgar 7ad5c76734 Add new line test 2018-05-11 10:52:09 -07:00
Alex Dadgar 3671ed139d Avoid splitting log line across two files
We attempt to avoid splitting a log line between two files by detecting
if we are near the file size limit and scanning for new lines and only
flushing those.

BenchmarkRotator/1KB-8            300000              5613 ns/op
BenchmarkRotator/2KB-8            200000              8384 ns/op
BenchmarkRotator/4KB-8            100000             14604 ns/op
BenchmarkRotator/8KB-8             50000             25002 ns/op
BenchmarkRotator/16KB-8            30000             47572 ns/op
BenchmarkRotator/32KB-8            20000             92080 ns/op
BenchmarkRotator/64KB-8            10000            165883 ns/op
BenchmarkRotator/128KB-8            5000            294405 ns/op
BenchmarkRotator/256KB-8            2000            572374 ns/op
2018-05-10 15:11:01 -07:00
Alex Dadgar f5d91b5338 Benchmark for rotator
BenchmarkRotator/1KB-8            200000              5572 ns/op
BenchmarkRotator/2KB-8            200000              8338 ns/op
BenchmarkRotator/4KB-8            100000             14246 ns/op
BenchmarkRotator/8KB-8             50000             25279 ns/op
BenchmarkRotator/16KB-8            30000             48602 ns/op
BenchmarkRotator/32KB-8            20000             92159 ns/op
BenchmarkRotator/64KB-8            10000            154766 ns/op
BenchmarkRotator/128KB-8            5000            296872 ns/op
BenchmarkRotator/256KB-8            3000            551793 ns/op
2018-05-10 14:15:15 -07:00
Nick Ethier 91603a377e
client/driver: parse repo instead of attempting to pull repo info 2018-05-09 22:34:25 -04:00
Nick Ethier 38a33f9c75
client/driver: add test for docker auth helper 2018-05-09 22:33:56 -04:00
Alex Dadgar e067a9ae06 naming of constants 2018-05-09 16:46:52 -07:00
Nick Ethier 3598925ca4
client/driver: use correct repo address when using docker-credential helper 2018-05-08 15:17:28 -04:00
Nick Ethier 54c86a0292
client/driver/env: interpolate empty optional meta params as empty strings 2018-05-07 20:19:51 -04:00
Nick Ethier 016ab7a105
client/driver: remove unused const 'dockerPullProgressEmitInterval' 2018-05-07 16:24:48 -04:00
Nick Ethier d8de354dbf
client/driver: add waiting layer status count to pull progress status msg 2018-05-07 12:18:20 -04:00
Nick Ethier 77af17efbc
client/driver: add seperate handler for emitting pull progress 2018-05-07 12:17:34 -04:00
Nick Ethier 0bdd976b7d
client/driver: remove pull timeout due to race condition that can lead to unexpected timeouts
If two jobs are pulling the same image simultaneously, which ever starts the pull first will set the pull timeout.
This can lead to a poor UX where the first job requested a short timeout while the second job requested a longer timeout
causing the pull to potentially timeout much sooner than expected by the second job.
2018-05-07 12:18:11 -04:00
Nick Ethier 7c5821d7c6
client/driver: do accounting on layer pull progress 2018-05-07 12:17:53 -04:00
Nick Ethier 8efda7dc6c
client/driver: emit progress to all allocs pulling same image 2018-05-07 12:17:34 -04:00
Nick Ethier e35948ab91
client/driver: add image pull progress monitoring 2018-05-07 12:17:38 -04:00
Jesus Vazquez 08a390448b Update counter driver.docker.oom labels 2018-05-04 14:02:34 +08:00
Jesus Vazquez 4f6db56283 Initialize dockerhandle with jobname, taskgroupname, taskname and allocid 2018-05-04 14:02:19 +08:00
Jesus Vazquez 127b764dfb Add Job, taskgroupname, taskname, and allocid to the DockerHandle struct 2018-05-04 14:01:26 +08:00
Jesus Vazquez fd1ff1a0cf Run goimports 2018-05-04 13:46:36 +08:00
Jesus Vazquez 5dd4059527 Add driver.docker counter metric for OOM Killer events 2018-05-04 13:46:36 +08:00
Michael Schurter 0e602d4779
Merge pull request #4188 from hashicorp/f-rkt-stats
rkt: create parent cgroup to enable stats
2018-04-24 14:54:36 -07:00
Michael Schurter d687761ebf rkt: test Stats() and always run tests
Remove the NOMAD_TEST_RKT flag as a guard for rkt tests. Still require
Linux, root, and rkt to be installed. Only check for rkt installation
once in hopes of speeding up rkt tests a bit.
2018-04-24 11:05:42 -07:00
Javier Palomo Almena 3e6c01ffa1 docker tests: Fix usage of NewDriverContext 2018-04-23 22:51:06 +02:00
Javier Palomo Almena 74d3c5df07 DriverContext: Add the TaskGroup and the Job name
Adding this fields to the DriverContext object, will allow us to pass
them to the drivers.

An use case for this, will be to emit tagged metrics in the drivers,
which contain all relevant information:
- Job
- TaskGroup
- Task
- ...

Ref: https://github.com/hashicorp/nomad/pull/4185
2018-04-23 00:15:29 +02:00
Michael Schurter 4cee6cca6c rkt: create parent cgroup to enable stats
Having the Nomad executor create parent cgroups that rkt is launched
within allows the stats collection code used for the exec driver to Just
Work. The only downside is that now the Nomad executor's resource
utilization counts against the cgroups resource limits just as it does
for the exec driver.
2018-04-19 15:14:56 -07:00
Michael Schurter 1a85d0c990 run goimports 2018-04-19 11:16:28 -07:00
Michael Schurter d77c265d1f
Merge pull request #4168 from ninoles/b-2117-windows-group-process
B 2117 windows group process
2018-04-19 11:10:51 -07:00
Michael Schurter d3650fb2cd test: build with mock_driver by default
`make release` and `make prerelease` set a `release` tag to disable
enabling the `mock_driver`
2018-04-18 14:45:33 -07:00
Fabien Ninoles 35cf641416 Update based on PR request. 2018-04-17 13:43:04 -04:00
Fabien Ninoles 27cf4995ce - Clean up for windows compilation.
- Set CREATE_NEW_PROCESS_GROUP for Windows subprocess.
- Ensure we only kill actual process that need to.
2018-04-14 13:58:42 -04:00
Michael Schurter 3836b8a335
Merge pull request #3572 from emate/master
Create new process group on process startup.
2018-04-13 11:56:38 -07:00
Alex Dadgar f24ce2c50c Driver health detection cleanups
This PR does:

1. Health message based on detection has format "Driver XXX detected"
and "Driver XXX not detected"
2. Set initial health description based on detection status and don't
wait for the first health check.
3. Combine updating attributes on the node, fingerprint and health
checking update for drivers into a single call back.
4. Condensed driver info in `node status` only shows detected drivers
and make the output less wide by removing spaces.
2018-04-12 12:46:40 -07:00
Chelsea Holland Komlo ea4b65dd41 only initialize docker clients if they are nil 2018-04-09 14:13:07 -04:00
Chelsea Holland Komlo 288c7a33a1 refacotoring simplification from code review 2018-04-09 10:34:17 -04:00
Chelsea Holland Komlo d3637825ef group similar functions; update comments
health check timeout should be 1 minute
2018-04-05 16:19:02 -04:00
Chelsea Holland Komlo e8743f1f7b remove do once block when creating a new docker client
only set cached connections upon no error
2018-04-05 16:19:02 -04:00
Chelsea Holland Komlo d0d793fc23 use client with shorter timeouts for health checks 2018-04-05 16:19:02 -04:00
Chelsea Holland Komlo 5d1b2b77cb refactor docker clients method to be able to extend to creating new clients 2018-04-05 16:19:02 -04:00
Charlie Voiselle ea10588227 rkt: logging enhancements (#4044)
* Added extra debug logging; extended timeout; added jitter.

* small log changes

* increase timeout

* remove unneccessary uuid
2018-03-27 17:30:06 -07:00
Alex Dadgar da27fc3880 Driver Info output 2018-03-22 17:18:32 -07:00
Michael Schurter a318684738
Merge pull request #4022 from hashicorp/f-more-executor-logging
executor: increase level for helpful log lines
2018-03-22 15:21:20 -07:00
Alex Dadgar db4a634072 RPC, FSM, State Store for marking DesiredTransistion
fix build tag
2018-03-21 16:49:48 -07:00
Michael Schurter bb0ff44fb4 mock_driver: improve Kill() logging 2018-03-21 16:49:48 -07:00
Alex Dadgar 5df4b3728d Docker driver doesn't return errors but injects into the DriverInfo 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 60f12d206f improve comments; update watchDriver 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo d8f68e5ef8 fix up codereview feedback 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 0425be8f48 updating comments; locking concurrent node access 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo c50d02ae93 go style; update comments 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 3aa726baab fix scheduler driver name; create node structs file 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 0bde357731 add concept of health checks to fingerprinters and nodes
fix up feedback from code review

add driver info for all drivers to node
2018-03-21 15:15:25 -04:00
Michael Schurter 1022170bf3 executor: increase level for helpful log lines
Should help with debugging issues like #3971
2018-03-21 11:53:58 -07:00
Marcin Matlaszek 6019a88824
Make raw_exec processes cleanup function more precise. 2018-03-20 13:40:21 +01:00
Marcin Matlaszek bb36c122e2
Fix errors when trying to kill whole process group. 2018-03-20 13:40:21 +01:00
Marcin Matlaszek 86d650d7b0
Make starting & cleaning process group Windows compatible. 2018-03-20 13:40:21 +01:00
Marcin Matlaszek 79c139f2ef
Create new process group on process startup.
Clean up by sending SIGKILL to the whole process group.
2018-03-20 13:40:21 +01:00
Michael Schurter 32ee5e0d53
Merge pull request #3990 from hashicorp/f-rkt-groups
rkt: allow specifying --group
2018-03-16 11:19:53 -07:00
Michael Schurter bd78cfb039 rkt: allow specifying --group 2018-03-16 11:08:22 -07:00
Michael Schurter fb10ec9c01 docker: make volume errors recoverable
The interface+mock just to test this one little error handling may seem
like overkill but there was just no other way to write an automated test
around this logic as there's no way to simluate this error with stock
Docker.
2018-03-15 17:52:43 -07:00
Michael Schurter 79df90acb0
Merge pull request #3958 from simplesurance/swappiness
fix: disable swap for executor_linux allocations
2018-03-13 10:10:22 -07:00
Fabian Holler e6af051c93 fix: disable swap for executor_linux allocations
A comment in the nomad source code states that swapping for
executor_linux allocations is disabled but it wasn't.

Nomad wrote -1 to the memsw.limit_in_bytes cgroup file to disable
swapping.
This has the following problems:

1.) Writing -1 to the file does not disable swapping. It sets
    the limit for memory and swap to unlimited.
2.) On common Linux distributions like Ubuntu 16.04 LTS the
    memsw.limit_in_bytes cgroup file does not exist by default.
    The memsw.limit_in_bytes file only exist if the Linux kernel is
    build with CONFIG_MEMCG_SWAP=yes and either
    CONFIG_MEMCG_SWAP_ENABLED=yes or when the kernel parameter
    swapaccount=1 is passed during boot.
    Most Linux distributions disable swap accounting by default because
    of higher memory usage.
    Nomad silently ignores if writing to the memsw.limit_in_bytes file
    fails. The allocation succeeds, no message is logged to notify the
    user.

To ensure that disabling swap works on common Linux kernels, disable
swapping by writing 0 to the memory.swappiness file.
Using the memory.swappiness file only requires that the kernel is
compiled with CONFIG_MEMCG=yes. This is the default in common Linux
kernels.
2018-03-13 10:52:50 +01:00
Michael Schurter 7dd7fbcda2 non-Existent -> nonexistent
Reverting from #3963

https://www.merriam-webster.com/dictionary/existent
2018-03-12 11:59:33 -07:00
Josh Soref 1359fd2c3d spelling: unexpected 2018-03-11 19:08:07 +00:00
Josh Soref 8978caea28 spelling: shutdown 2018-03-11 18:55:49 +00:00
Josh Soref 8d191c9273 spelling: severity 2018-03-11 18:53:52 +00:00
Josh Soref 3787d8141e spelling: serialize 2018-03-11 18:53:39 +00:00
Josh Soref e4639ac62f spelling: secrets 2018-03-11 18:53:26 +00:00
Josh Soref cec45c6bc8 spelling: safety 2018-03-11 18:52:54 +00:00
Josh Soref de9d0c7180 spelling: retrieved 2018-03-11 18:51:40 +00:00
Josh Soref e949d23e1b spelling: resource 2018-03-11 18:51:03 +00:00
Josh Soref b47ab9ab8c spelling: removes 2018-03-11 18:41:43 +00:00
Josh Soref db166c6cf6 spelling: remnants 2018-03-11 18:41:26 +00:00
Josh Soref b6ec60fb5f spelling: isolation 2018-03-11 18:19:02 +00:00
Josh Soref c1a0ae3161 spelling: inspect 2018-03-11 18:15:27 +00:00
Josh Soref 2a1cf2f216 spelling: initialization 2018-03-11 18:18:37 +00:00
Josh Soref 7f6e4012a0 spelling: existent 2018-03-11 18:30:37 +00:00
Josh Soref 7cd95f6eb3 spelling: executor 2018-03-11 18:05:31 +00:00