Commit graph

1560 commits

Author SHA1 Message Date
Nick Ethier 0bdd976b7d
client/driver: remove pull timeout due to race condition that can lead to unexpected timeouts
If two jobs are pulling the same image simultaneously, whichever starts the pull first will set the pull timeout.
This can lead to a poor UX where the first job requested a short timeout while the second job requested a longer timeout,
causing the pull to potentially time out much sooner than expected by the second job.
2018-05-07 12:18:11 -04:00
Nick Ethier 7c5821d7c6
client/driver: do accounting on layer pull progress 2018-05-07 12:17:53 -04:00
Nick Ethier 8efda7dc6c
client/driver: emit progress to all allocs pulling same image 2018-05-07 12:17:34 -04:00
Nick Ethier e35948ab91
client/driver: add image pull progress monitoring 2018-05-07 12:17:38 -04:00
Jesus Vazquez 08a390448b Update counter driver.docker.oom labels 2018-05-04 14:02:34 +08:00
Jesus Vazquez 4f6db56283 Initialize dockerhandle with jobname, taskgroupname, taskname and allocid 2018-05-04 14:02:19 +08:00
Jesus Vazquez 127b764dfb Add Job, taskgroupname, taskname, and allocid to the DockerHandle struct 2018-05-04 14:01:26 +08:00
Jesus Vazquez fd1ff1a0cf Run goimports 2018-05-04 13:46:36 +08:00
Jesus Vazquez 5dd4059527 Add driver.docker counter metric for OOM Killer events 2018-05-04 13:46:36 +08:00
Michael Schurter 0e602d4779
Merge pull request #4188 from hashicorp/f-rkt-stats
rkt: create parent cgroup to enable stats
2018-04-24 14:54:36 -07:00
Michael Schurter d687761ebf rkt: test Stats() and always run tests
Remove the NOMAD_TEST_RKT flag as a guard for rkt tests. Still require
Linux, root, and rkt to be installed. Only check for rkt installation
once in hopes of speeding up rkt tests a bit.
2018-04-24 11:05:42 -07:00
Javier Palomo Almena 3e6c01ffa1 docker tests: Fix usage of NewDriverContext 2018-04-23 22:51:06 +02:00
Javier Palomo Almena 74d3c5df07 DriverContext: Add the TaskGroup and the Job name
Adding these fields to the DriverContext object will allow us to pass
them to the drivers.

A use case for this will be to emit tagged metrics in the drivers,
which contain all relevant information:
- Job
- TaskGroup
- Task
- ...

Ref: https://github.com/hashicorp/nomad/pull/4185
2018-04-23 00:15:29 +02:00
Michael Schurter 4cee6cca6c rkt: create parent cgroup to enable stats
Having the Nomad executor create parent cgroups that rkt is launched
within allows the stats collection code used for the exec driver to Just
Work. The only downside is that now the Nomad executor's resource
utilization counts against the cgroups resource limits just as it does
for the exec driver.
2018-04-19 15:14:56 -07:00
Michael Schurter 1a85d0c990 run goimports 2018-04-19 11:16:28 -07:00
Michael Schurter d77c265d1f
Merge pull request #4168 from ninoles/b-2117-windows-group-process
B 2117 windows group process
2018-04-19 11:10:51 -07:00
Michael Schurter d3650fb2cd test: build with mock_driver by default
`make release` and `make prerelease` set a `release` tag that keeps the
`mock_driver` from being enabled
2018-04-18 14:45:33 -07:00
Fabien Ninoles 35cf641416 Update based on PR request. 2018-04-17 13:43:04 -04:00
Fabien Ninoles 27cf4995ce - Clean up for windows compilation.
- Set CREATE_NEW_PROCESS_GROUP for Windows subprocess.
- Ensure we only kill the processes that actually need to be killed.
2018-04-14 13:58:42 -04:00
Michael Schurter 3836b8a335
Merge pull request #3572 from emate/master
Create new process group on process startup.
2018-04-13 11:56:38 -07:00
Alex Dadgar f24ce2c50c Driver health detection cleanups
This PR does:

1. Health message based on detection has format "Driver XXX detected"
and "Driver XXX not detected"
2. Set initial health description based on detection status and don't
wait for the first health check.
3. Combine updating attributes on the node, fingerprint and health
checking update for drivers into a single call back.
4. Condensed driver info in `node status` only shows detected drivers
and makes the output less wide by removing spaces.
2018-04-12 12:46:40 -07:00
Chelsea Holland Komlo ea4b65dd41 only initialize docker clients if they are nil 2018-04-09 14:13:07 -04:00
Chelsea Holland Komlo 288c7a33a1 refactoring simplification from code review 2018-04-09 10:34:17 -04:00
Chelsea Holland Komlo d3637825ef group similar functions; update comments
health check timeout should be 1 minute
2018-04-05 16:19:02 -04:00
Chelsea Holland Komlo e8743f1f7b remove do once block when creating a new docker client
only set cached connections upon no error
2018-04-05 16:19:02 -04:00
Chelsea Holland Komlo d0d793fc23 use client with shorter timeouts for health checks 2018-04-05 16:19:02 -04:00
Chelsea Holland Komlo 5d1b2b77cb refactor docker clients method to be able to extend to creating new clients 2018-04-05 16:19:02 -04:00
Charlie Voiselle ea10588227 rkt: logging enhancements (#4044)
* Added extra debug logging; extended timeout; added jitter.

* small log changes

* increase timeout

* remove unnecessary uuid
2018-03-27 17:30:06 -07:00
Alex Dadgar da27fc3880 Driver Info output 2018-03-22 17:18:32 -07:00
Michael Schurter a318684738
Merge pull request #4022 from hashicorp/f-more-executor-logging
executor: increase level for helpful log lines
2018-03-22 15:21:20 -07:00
Alex Dadgar db4a634072 RPC, FSM, State Store for marking DesiredTransition
fix build tag
2018-03-21 16:49:48 -07:00
Michael Schurter bb0ff44fb4 mock_driver: improve Kill() logging 2018-03-21 16:49:48 -07:00
Alex Dadgar 5df4b3728d Docker driver doesn't return errors but injects into the DriverInfo 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 60f12d206f improve comments; update watchDriver 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo d8f68e5ef8 fix up codereview feedback 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 0425be8f48 updating comments; locking concurrent node access 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo c50d02ae93 go style; update comments 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 3aa726baab fix scheduler driver name; create node structs file 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 0bde357731 add concept of health checks to fingerprinters and nodes
fix up feedback from code review

add driver info for all drivers to node
2018-03-21 15:15:25 -04:00
Michael Schurter 1022170bf3 executor: increase level for helpful log lines
Should help with debugging issues like #3971
2018-03-21 11:53:58 -07:00
Marcin Matlaszek 6019a88824
Make raw_exec processes cleanup function more precise. 2018-03-20 13:40:21 +01:00
Marcin Matlaszek bb36c122e2
Fix errors when trying to kill whole process group. 2018-03-20 13:40:21 +01:00
Marcin Matlaszek 86d650d7b0
Make starting & cleaning process group Windows compatible. 2018-03-20 13:40:21 +01:00
Marcin Matlaszek 79c139f2ef
Create new process group on process startup.
Clean up by sending SIGKILL to the whole process group.
2018-03-20 13:40:21 +01:00
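The process-group approach described in this commit can be sketched roughly as follows. This is a minimal Unix-only illustration, not the actual Nomad executor code: the helper names `startInNewGroup` and `killGroup` are hypothetical, and the Windows path (CREATE_NEW_PROCESS_GROUP) is not covered.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// startInNewGroup (hypothetical helper) starts a command as the leader of a
// new process group, so that its children share a group ID equal to its PID.
func startInNewGroup(name string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(name, args...)
	// Setpgid with the default Pgid of 0 puts the child in a new group
	// whose ID is the child's own PID. Unix-only field.
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd, nil
}

// killGroup (hypothetical helper) sends SIGKILL to every process in the
// group led by cmd. A negative PID addresses the whole process group.
func killGroup(cmd *exec.Cmd) error {
	return syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
}

func main() {
	// The shell spawns a background child; both end up in the same group.
	cmd, err := startInNewGroup("sh", "-c", "sleep 30 & sleep 30")
	if err != nil {
		fmt.Println("start failed:", err)
		return
	}
	if err := killGroup(cmd); err != nil {
		fmt.Println("kill failed:", err)
		return
	}
	_ = cmd.Wait() // reap the leader; it reports the kill as an error
	fmt.Println("process group killed")
}
```

Signalling the group rather than the single PID is what lets the cleanup reach child processes that the task spawned itself.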
Michael Schurter 32ee5e0d53
Merge pull request #3990 from hashicorp/f-rkt-groups
rkt: allow specifying --group
2018-03-16 11:19:53 -07:00
Michael Schurter bd78cfb039 rkt: allow specifying --group 2018-03-16 11:08:22 -07:00
Michael Schurter fb10ec9c01 docker: make volume errors recoverable
The interface+mock just to test this one little error handling may seem
like overkill but there was just no other way to write an automated test
around this logic as there's no way to simulate this error with stock
Docker.
2018-03-15 17:52:43 -07:00
Michael Schurter 79df90acb0
Merge pull request #3958 from simplesurance/swappiness
fix: disable swap for executor_linux allocations
2018-03-13 10:10:22 -07:00
Fabian Holler e6af051c93 fix: disable swap for executor_linux allocations
A comment in the nomad source code states that swapping for
executor_linux allocations is disabled but it wasn't.

Nomad wrote -1 to the memsw.limit_in_bytes cgroup file to disable
swapping.
This has the following problems:

1.) Writing -1 to the file does not disable swapping. It sets
    the limit for memory and swap to unlimited.
2.) On common Linux distributions like Ubuntu 16.04 LTS the
    memsw.limit_in_bytes cgroup file does not exist by default.
    The memsw.limit_in_bytes file only exists if the Linux kernel is
    built with CONFIG_MEMCG_SWAP=yes and either
    CONFIG_MEMCG_SWAP_ENABLED=yes or when the kernel parameter
    swapaccount=1 is passed during boot.
    Most Linux distributions disable swap accounting by default because
    of higher memory usage.
    Nomad silently ignores failures when writing to the
    memsw.limit_in_bytes file. The allocation succeeds and no message is
    logged to notify the user.

To ensure that disabling swap works on common Linux kernels, disable
swapping by writing 0 to the memory.swappiness file.
Using the memory.swappiness file only requires that the kernel is
compiled with CONFIG_MEMCG=yes. This is the default in common Linux
kernels.
2018-03-13 10:52:50 +01:00
Michael Schurter 7dd7fbcda2 non-Existent -> nonexistent
Reverting from #3963

https://www.merriam-webster.com/dictionary/existent
2018-03-12 11:59:33 -07:00
Josh Soref 1359fd2c3d spelling: unexpected 2018-03-11 19:08:07 +00:00
Josh Soref 8978caea28 spelling: shutdown 2018-03-11 18:55:49 +00:00
Josh Soref 8d191c9273 spelling: severity 2018-03-11 18:53:52 +00:00
Josh Soref 3787d8141e spelling: serialize 2018-03-11 18:53:39 +00:00
Josh Soref e4639ac62f spelling: secrets 2018-03-11 18:53:26 +00:00
Josh Soref cec45c6bc8 spelling: safety 2018-03-11 18:52:54 +00:00
Josh Soref de9d0c7180 spelling: retrieved 2018-03-11 18:51:40 +00:00
Josh Soref e949d23e1b spelling: resource 2018-03-11 18:51:03 +00:00
Josh Soref b47ab9ab8c spelling: removes 2018-03-11 18:41:43 +00:00
Josh Soref db166c6cf6 spelling: remnants 2018-03-11 18:41:26 +00:00
Josh Soref b6ec60fb5f spelling: isolation 2018-03-11 18:19:02 +00:00
Josh Soref c1a0ae3161 spelling: inspect 2018-03-11 18:15:27 +00:00
Josh Soref 2a1cf2f216 spelling: initialization 2018-03-11 18:18:37 +00:00
Josh Soref 7f6e4012a0 spelling: existent 2018-03-11 18:30:37 +00:00
Josh Soref 7cd95f6eb3 spelling: executor 2018-03-11 18:05:31 +00:00
Josh Soref e8478c4065 spelling: documentation 2018-03-11 17:55:21 +00:00
Josh Soref 4241ffc5ab spelling: disable 2018-03-11 17:55:12 +00:00
Josh Soref 2f135f0ed7 spelling: destroy 2018-03-11 17:54:13 +00:00
Josh Soref f2a7c95379 spelling: constraints 2018-03-11 17:50:28 +00:00
Josh Soref cb1303e47a spelling: conjunction 2018-03-11 17:48:37 +00:00
Josh Soref 42fa13bbc6 spelling: cancelled 2018-03-11 17:45:47 +00:00
Josh Soref 7077386916 spelling: cancelable 2018-03-11 17:45:34 +00:00
Josh Soref a70fe97556 spelling: assert 2018-03-11 17:41:33 +00:00
Michael Schurter 557a70f78d
Merge pull request #3917 from jaininshah9/master
changing the formula to correctly pass the CPUQuota to docker
2018-02-28 20:00:37 -08:00
Jainin Shah 39e1fc06e5 adding comments to the change 2018-02-28 16:19:51 -08:00
Preetha Appan eaedffc7f7
Fix go vet errors 2018-02-28 12:21:27 -06:00
Jainin Shah 6eb7da002f changing the formula to correctly pass the CPUQuota to docker 2018-02-27 12:32:23 -08:00
Alex Dadgar f9cf642436 Client tls 2018-02-15 15:22:57 -08:00
Alex Dadgar 69def2ff22 Server tests of logs 2018-02-15 13:59:02 -08:00
Chelsea Komlo 0c0b56a1a4
Merge pull request #3807 from hashicorp/f-client-add-fingerprint-manager
Add fingerprint manager to manage fingerprinting node
2018-02-13 11:22:50 -05:00
Chelsea Holland Komlo b321287712 extract test helper
lock concurrent accesses to node

comment exported method
2018-02-12 18:30:10 -05:00
Michael Schurter 101e85f078
Merge pull request #3819 from schmichael/qemu-graceful-shutdown-alpine
Test QEMU graceful shutdown
2018-02-12 12:32:14 -08:00
Michael Schurter ed6bce2ccf Improve test logging 2018-02-12 11:25:52 -08:00
Michael Schurter 06397ba59d
Merge pull request #3825 from jaininshah9/master
add a flag for cpu_hard_limit
2018-02-08 20:40:38 -08:00
Michael Schurter 6e6915e7f5 Merge branch 'master' into f-cpu_hard_limit 2018-02-08 20:14:29 -08:00
Alan Scherger eee7144643 drivers: use ctx.TaskEnv for mount points 2018-02-08 12:59:20 -06:00
Jainin Shah a4516aa71a removing underscore in variable name 2018-02-07 16:28:43 -08:00
Jainin Shah 8149587abe clearing the confusion between microsecond, nanosecond and millisecond 2018-02-06 19:11:39 -08:00
Jainin Shah d3087d6069 using d.node.Resources.CPU as suggested 2018-02-06 14:52:15 -08:00
Michael Schurter 279a3b3f28
Merge pull request #3790 from 42wim/dockerv6
Service registration for IPv6 docker addresses (Fixes #3785)
2018-02-05 17:07:53 -08:00
Michael Schurter 25f0ad050f docker: Skip IPv6 test if IPv6 disabled 2018-02-05 16:24:30 -08:00
Chelsea Komlo 42d20234a3
Merge pull request #3781 from hashicorp/f-client-fingerprint-refactor
Refactor client fingerprinters to return a diff of node attributes
2018-02-01 20:13:44 -05:00
Chelsea Holland Komlo 6f9c0ab361 req/resp should be within config locks; rename for detected fingerprints
changelog
2018-02-01 19:00:39 -05:00
Wim a1a2ca8e33 Add AdvertiseIPv6Address test 2018-02-01 23:21:47 +01:00
Jainin Shah 94d0ce6006 wrapping the line to less than 80 characters 2018-02-01 14:16:38 -08:00
Jainin Shah 0d99f256de changes after running go fmt 2018-02-01 12:07:05 -08:00
Jainin Shah 04c14b3cb2 add a flag for cpu_hard_limit 2018-02-01 10:09:12 -08:00
Chelsea Holland Komlo b54203eddc add detected to more drivers where the driver is found but unusable 2018-02-01 11:28:17 -05:00
Michael Schurter 0ac43a7622 Skip QEMU graceful shutdown test except on Travis
Hopefully we can reuse the SkipSlow helper elsewhere.
2018-01-31 15:47:26 -08:00
Chelsea Holland Komlo b8e8064835 code review fixup 2018-01-31 18:34:03 -05:00
Michael Schurter 24d060bbb4 Test graceful shutdown
Uses an Alpine image which supports ACPI poweroff signal handling.
Handling is only enabled after the VM has booted, so this test blocks
until sshd starts before issuing the command.
2018-01-31 15:05:02 -08:00
Wim db3bdfe898 * Change use_ipv6_address to advertise_ipv6_address.
* Set autoadvertise to true.
* Update documentation.
2018-02-01 00:01:25 +01:00
Chelsea Holland Komlo 7b53474a6e add applicable boolean to fingerprint response
public fields and remove getter functions
2018-01-31 13:21:45 -05:00
Michael Schurter cc54e36f91
Merge pull request #3798 from simar7/qemu-graceful-shutdown-bug
[QEMU] Fixing an unintentional variable shadowing
2018-01-30 17:43:44 -08:00
Michael Schurter c662cc0172
Merge pull request #3773 from mikemccracken/2018-01-18/destroy-container-on-err
lxc: cleanup partially configured containers after errors in Start
2018-01-30 14:52:29 -08:00
Wim 76f09db067 Service registration for IPv6 docker addresses 2018-01-30 17:07:47 +01:00
Alex Dadgar 3ad5916f72
Merge pull request #3799 from mikemccracken/2018-01-25/lxc-log-outside-container
lxc: move lxc log file out of container-visible alloc dir
2018-01-29 14:32:22 -08:00
Chelsea Holland Komlo 14147c8496 remove attributes from periodic fingerprints when state changes
write test for client periodic fingerprinters
2018-01-29 13:48:54 -05:00
Alex Dadgar 3d28774f74
Merge pull request #3802 from filipochnik/docker-readonly-rootfs
Add ReadonlyRootfs option to the Docker driver
2018-01-29 09:47:27 -08:00
Indradhanush Gupta 7db4ee1122 rkt_test.go: Remove underscore from variable names 2018-01-29 11:39:50 +01:00
Filip Ochnik 80a17ee8dd Add ReadonlyRootfs option to the Docker driver 2018-01-27 14:38:29 +01:00
Chelsea Holland Komlo 7c19de797c create safe getters and setters for fingerprint response 2018-01-26 11:22:05 -05:00
Simarpreet Singh ac720b84f0
qemu: Make the driver debugging output more indicative
Signed-off-by: Simarpreet Singh <simar@linux.com>
2018-01-25 16:40:16 -08:00
Simarpreet Singh 8b058f7570
qemu: Fix unintentional shadowing of monitorPath variable
Signed-off-by: Simarpreet Singh <simar@linux.com>
2018-01-25 16:24:10 -08:00
Michael McCracken 09c9ca23f5 lxc: move lxc log file out of container-visible alloc dir
The LXC runtime's log file is currently written to TaskDir.LogDir,
which is mounted as alloc/logs inside the containers in the task
group.

This file is not intended to be visible to containers, and depending
on the log level, may have information about the host that a container
should not be allowed to see.

Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-25 14:41:37 -08:00
Michael McCracken 88e3063717 fix speling in log
Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-25 13:56:14 -08:00
Chelsea Holland Komlo 9a8344333b refactor Fingerprint to request/response construct 2018-01-24 11:54:02 -05:00
Michael McCracken f8fe2ea8cb review cleanup
don't export an internal function, and simplify some code

Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-23 15:03:09 -08:00
Alex Dadgar a43e0a7b08 Allow overriding an image's entrypoint in Docker
Fixes https://github.com/hashicorp/nomad/issues/2219
2018-01-23 14:05:00 -08:00
Alex Dadgar 98a03ad689
Merge pull request #3754 from filipochnik/docker-caps
Add an option to add and drop capabilities in the Docker driver
2018-01-23 12:02:50 -08:00
Filip Ochnik 4abd269a68
Merge branch 'master' into docker-caps 2018-01-21 12:18:22 +01:00
Filip Ochnik 558812350d Finish implementation of the capabilities whitelist 2018-01-21 12:14:24 +01:00
Michael McCracken 00dcfa6db9 lxc: cleanup partially configured containers after errors in Start
If there are any errors in container setup after c.Create() in
Start(), the container will be left around, with no way to clean it up
because the handle will not be created or returned from Start.

Added a wrapper that checks for errors and performs appropriate
cleanup. Returning a cleanup function from a wrapped function instead
of just doing the cleanup before returning the error helps to ensure
that future changes that might add or change error exits can't forget
to consider a cleanup function.

Adds a check to the invalid config test case to check that a container
created with an invalid config doesn't get left behind.

Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 16:03:03 -08:00
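The wrapper pattern this commit describes might look roughly like the sketch below. The `container` type and the `startTask` and `start` functions are hypothetical stand-ins, not the LXC driver's actual types. The inner function returns a cleanup function alongside its error, and the wrapper runs that cleanup only on failure.

```go
package main

import (
	"errors"
	"fmt"
)

// container is a hypothetical stand-in for a created LXC container.
type container struct {
	destroyed bool
}

// startTask performs setup after the container has been created. It always
// returns a cleanup function, so no error exit can forget the cleanup.
func startTask(c *container, failSetup bool) (cleanup func(), err error) {
	cleanup = func() { c.destroyed = true }
	if failSetup {
		return cleanup, errors.New("invalid config")
	}
	return cleanup, nil
}

// start wraps startTask and runs the cleanup only when setup failed.
func start(c *container, failSetup bool) error {
	cleanup, err := startTask(c, failSetup)
	if err != nil {
		if cleanup != nil {
			cleanup()
		}
		return fmt.Errorf("start failed: %w", err)
	}
	return nil
}

func main() {
	c := &container{}
	err := start(c, true)
	fmt.Println("error:", err, "destroyed:", c.destroyed)
}
```

Because the cleanup decision lives in one place, a new error return added to the inner function later is covered automatically.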
Michael Schurter 9d410c88a7 Improve driver network logging 2018-01-18 15:35:24 -08:00
Michael Schurter 583e17fad5 Always advertise driver IP when in driver mode
Fixes #3681

When in driver address mode Nomad should always advertise the driver's IP
in Consul even when no network exists. This matches the 0.6 behavior.

When in host address mode Nomad advertises the alloc's network's IP if
one exists. Otherwise it lets Consul determine the IP.

I also added some much needed logging around Docker's network discovery.
2018-01-18 15:35:24 -08:00
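The selection rule in this commit message can be summarized with a small sketch. The function `advertiseIP`, its parameters, and the use of an empty string to mean "let Consul determine the IP" are all assumptions for illustration and do not reflect Nomad's real API.

```go
package main

import "fmt"

// advertiseIP (hypothetical helper) returns the IP to register in Consul.
// An empty result means "let Consul determine the IP".
func advertiseIP(addressMode, driverIP, allocNetworkIP string) string {
	switch addressMode {
	case "driver":
		// Driver mode: always advertise the driver's IP, even when the
		// allocation has no network of its own.
		return driverIP
	case "host":
		// Host mode: use the alloc network's IP if one exists; an empty
		// value falls through to Consul's own choice.
		return allocNetworkIP
	default:
		// Other modes are outside the scope of this sketch.
		return ""
	}
}

func main() {
	fmt.Printf("driver mode, no alloc network: %q\n", advertiseIP("driver", "172.17.0.2", ""))
	fmt.Printf("host mode, alloc network:      %q\n", advertiseIP("host", "172.17.0.2", "10.0.0.5"))
	fmt.Printf("host mode, no alloc network:   %q\n", advertiseIP("host", "172.17.0.2", ""))
}
```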
Michael McCracken 70817f728c lxc_test: add test for contents of file in bind-mounted dir
Ensure that bind mounting via the volumes config really did work.

Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 05:36:45 -08:00
Michael McCracken fd44bdee37 Simplify with gofmt -s
Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 04:17:42 -08:00
Michael McCracken f176e02a64 lxc: add tests for volume support
Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 04:17:42 -08:00
Michael McCracken c78c00a2d2 lxc: Add config flag to disable volume support
Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 04:17:42 -08:00
Michael McCracken d694a8921f Add volumes config to LXC driver
Allow lxc driver to accept bind mount config similarly to the docker
driver.

Includes some static sanity checks in Validate step

Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 04:17:42 -08:00
Filip Ochnik 4eeb552a4f Add a sketch of capabilities whitelist logic for the Docker driver 2018-01-14 20:01:47 +01:00
Filip Ochnik 8ee3ce7a26 Add an option to add and drop capabilities in the Docker driver 2018-01-14 19:56:57 +01:00
Alex Dadgar bec9a72eec Remove networking from basic resources 2018-01-12 14:33:42 -08:00
Charlie Voiselle 867bb6f7f9 Found more priviledge.
priviledge -> privilege
2018-01-12 09:44:53 -05:00
Charlie Voiselle 1bb1ab5069 fix typo
Priviledge -> privilege
2018-01-08 15:56:07 -05:00
Michael Schurter 5032bf4f5a Skip tests that require root when not root
Also skip Chown on allocdir migration on Windows and when non-root.
Windows doesn't support it, and it will always fail as a non-root user.
2017-12-12 16:58:27 -08:00
Alex Dadgar f0b0697b57 Keyify struct 2017-12-11 17:23:14 -08:00
Michael Schurter c4d4ead199 Fix test broken by mock updates 2017-12-08 16:45:25 -08:00
Michael Schurter 4347026f83 Test Consul from TaskRunner thoroughly
Rely less on the mockConsulServiceClient because the real
consul.ServiceClient needs all the testing it can get!
2017-12-08 12:03:00 -08:00
Chelsea Holland Komlo 61fa8ad4ba code review fixes 2017-12-07 13:46:25 -05:00
Chelsea Holland Komlo 77ab41124b set default kill signal on executor shutdown 2017-12-07 11:40:15 -05:00
Chelsea Holland Komlo 6cae8fe6e6 extend configurable kill signal to java driver 2017-12-07 11:40:10 -05:00
Chelsea Holland Komlo 350319239c change location of default kill signal 2017-12-06 17:48:25 -05:00
Chelsea Holland Komlo 7dfb64f941 extract signal helper into utils 2017-12-06 14:36:44 -05:00
Chelsea Holland Komlo b08611cfac move kill_signal to task level, extend to docker 2017-12-06 14:36:39 -05:00
Chelsea Holland Komlo 80de7d5ebd allow controlling the stop signal in exec/raw_exec 2017-12-06 11:28:45 -05:00
Chelsea Komlo 9ae849e09c
Merge pull request #3612 from hashicorp/docker-rkt-user
Set user for rkt tasks
2017-12-05 17:45:08 -05:00
Chelsea Holland Komlo 4463dc607e fix up test 2017-12-05 10:12:40 -05:00
Chelsea Holland Komlo 7284f2385a remove unused user option 2017-12-04 18:01:31 -05:00
Michael Schurter 6ccc4219d3
Merge pull request #3615 from hashicorp/b-rkt-host-ports
rkt: Don't require port_map with host networking
2017-12-04 14:49:42 -08:00