open-nomad

Author	SHA1	Message	Date
Mahmood Ali	ff48dbb8a9	Merge pull request #5163 from hashicorp/r-minor-changes-20180108 Fix a panic on node.Deregister fail	2019-01-09 09:56:00 -05:00
Mahmood Ali	1f2473263e	fix more cases of logging arity errors	2019-01-09 09:22:47 -05:00
Mahmood Ali	4952f2a182	Merge pull request #5159 from hashicorp/r-macos-tests Fix Travis MacOS job	2019-01-09 08:22:30 -05:00
Mahmood Ali	c78ed7246f	tests: run docker tests in macOS out of box Use `/tmp` as temporary directory for docker driver tests, so tests can run out of the box without any intervention. macOS sets tempdir as `/var`, which Docker does not whitelist as a path that can be bind-mounted.	2019-01-08 14:35:40 -05:00
Mahmood Ali	8f20bc8ce2	Merge pull request #5154 from hashicorp/f-revert-exec-devs drivers/exec: restrict devices exposed to tasks	2019-01-08 12:43:06 -05:00
Mahmood Ali	d19b92edec	executor: add a comment detailing isolation	2019-01-08 12:10:26 -05:00
Mahmood Ali	62a7f951c0	remove lxc references	2019-01-08 09:28:20 -05:00
Mahmood Ali	426c981c34	Remove some dead code	2019-01-08 09:11:48 -05:00
Mahmood Ali	64f80343fc	drivers: re-export ResourceUsage structs Re-export the ResourceUsage structs in drivers package to avoid drivers directly depending on the internal client/structs package directly. I attempted moving the structs to drivers, but that caused some import cycles that was a bit hard to disentagle. Alternatively, I added an alias here that's sufficient for our purposes of avoiding external drivers depend on internal packages, while allowing us to restructure packages in future without breaking source compatibility.	2019-01-08 09:11:47 -05:00
Mahmood Ali	916a40bb9e	move cstructs.DeviceNetwork to drivers pkg	2019-01-08 09:11:47 -05:00
Mahmood Ali	9369b123de	use drivers.FSIsolation	2019-01-08 09:11:47 -05:00
Danielle Tomlinson	a9b9ad34dc	drivers: Implement InternalPluginDriver interface This implements the InternalPluginDriver interface in each driver, and calls the cancellation fn for their respective eventers. This fixes a per task goroutine leak during test suite execution.	2019-01-08 13:49:31 +01:00
Alex Dadgar	0106f23aaa	Review comments	2019-01-07 14:50:28 -08:00
Alex Dadgar	8a35d7b1dd	Test recovery	2019-01-07 14:49:41 -08:00
Alex Dadgar	f40f8ce02e	Mock driver has recovery, stats	2019-01-07 14:49:40 -08:00
Alex Dadgar	c9825a9c36	recover	2019-01-07 14:49:40 -08:00
Alex Dadgar	6c6e035dba	add docker logger to separate main	2019-01-07 14:49:40 -08:00
Alex Dadgar	39542b4cf0	rkt fingerprint logs once	2019-01-07 14:49:40 -08:00
Alex Dadgar	a6b36df4de	remove nil logger	2019-01-07 14:48:01 -08:00
Mahmood Ali	58fb6812db	tests: busybox only depends on arch Busybox is compiled for linux only. Making the file used in executor tests even for non-linux targets, as having the file present has no side-effects.	2019-01-07 08:36:32 -05:00
Mahmood Ali	0ba7b0c132	tests: helper function for checking docker presense	2019-01-07 08:27:06 -05:00
Mahmood Ali	796d625ab6	Skip tests requiring Docker deamon if not found.	2019-01-07 07:59:13 -05:00
Preetha Appan	2fb2de3cef	Standardize driver health description messages for all drivers	2019-01-06 22:06:38 -06:00
Preetha Appan	76c09c7cbf	remove unnecessary logging in rkt driver fingerprint method	2019-01-06 20:59:20 -06:00
Mahmood Ali	8797a4f0ea	drivers/exec: restrict devices exposed to tasks We ultimately decided to provide a limited set of devices in exec/java drivers instead of all of host ones. Pre-0.9, we made all host devices available to exec tasks accidentally, yet most applications only use a small subset, and this choice limits our ability to restrict/isolate GPU and other devices. Starting with 0.9, by default, we only provide the same subset of devices Docker provides, and allow users to provide more devices as needed on case-by-case basis. This reverts commit 5805c64a9f1c3b409693493dfa30e7136b9f547b. This reverts commit ff9a4a17e59388dcab067949e0664f645b2f5bcf.	2019-01-06 17:03:19 -05:00
Mahmood Ali	56e3171310	driver/exec: use dedicated /dev mount (#5147 ) Use a dedicated /dev mount so we can inject more devices if necessary, and avoid allowing a container to contaminate host /dev. Follow up to https://github.com/hashicorp/nomad/pull/5143 - and fixes master.	2019-01-04 10:36:05 -05:00
Mahmood Ali	5b0702c9eb	drivers/exec: bind mount /dev into rootfs Restores pre-0.9 behavior, where Nomad makes /dev available to exec task. Switching to libcontainer, we accidentally made only a small subset available. Here, we err on the side of preserving behavior of 0.8, instead of going for the sensible route, where only a reasonable subset of devices is mounted by default and user can opt to request more.	2019-01-03 14:29:18 -05:00
Mahmood Ali	d23f47736c	drivers/exec: run as `nobody` by default libcontainer based drivers (e.g. exec, java) should default to running processes as `nobody` [1]; but libcontainer treats empty user as `root` in our case (either because of default or due to `root` being current user). [1] `94c28a4c6c/website/source/docs/job-specification/task.html.md (task-parameters)`	2019-01-03 14:29:18 -05:00
Danielle Tomlinson	6c9b9dc9f1	rkt: Return consistent error when not root	2018-12-20 13:02:46 +01:00
Danielle Tomlinson	6709de199b	java: Return undetected when not running as root This is an unrecoverable error, so we should only do this check once, rather than returning unhealthy constantly.	2018-12-20 12:55:07 +01:00
Danielle Tomlinson	7b31027ea3	exec: Return undetected when not running as root This is an unrecoverable error, so we should only do this check once, rather than returning unhealthy constantly.	2018-12-20 12:54:19 +01:00
Nick Ethier	ce1a5cba0e	drivermanager: use allocID and task name to route task events	2018-12-18 23:01:51 -05:00
Alex Dadgar	bc55ec81b5	fix docker launching plugins	2018-12-18 16:48:01 -08:00
Alex Dadgar	730a6f5b9a	lint	2018-12-18 16:48:00 -08:00
Alex Dadgar	4c57d2ec4d	Add plugin API versioning to plugin loader and plugins	2018-12-18 16:48:00 -08:00
Alex Dadgar	b8268d9a46	Lint	2018-12-18 15:50:44 -08:00
Alex Dadgar	327b551b39	Drivers	2018-12-18 15:50:11 -08:00
Alex Dadgar	b9ee03b2c1	protos	2018-12-18 15:48:52 -08:00
Danielle Tomlinson	b61da13c20	docker: Delete Task on Destroy Currently the docker driver does not remove tasks from its state map when destroying the task, which leads to issues when restarting tasks in place, and leaks expired handles over time.	2018-12-18 15:53:31 +01:00
Mahmood Ali	56dfdd0874	tests: fix rkt command environment (#5011 ) The environment variables needed for envoking `rkt` command line should include host PATH (to access `iptables`). Given that the command runs outside the VM, untrusted task environment variables should NOT be honored here. We do this already with `rkt`, but the change is quite subtle to miss when refactoring.	2018-12-15 20:25:36 -05:00
Mahmood Ali	168749ffd1	Merge pull request #5008 from hashicorp/b-docker-test-20181214 Fix flakiness in docker tests	2018-12-15 16:03:00 -05:00
Mahmood Ali	e4f44b9be5	testes: remove TestDockerDriver_Kill We already have two other Kill tests (e.g. TestDockerDriver_Start_Kill_Wait and TestDockerDriver_Start_KillTimeout), so don't need yet another flaky test.	2018-12-15 15:03:56 -05:00
Mahmood Ali	990a7d6776	driver/docker: stopping a dead container not error	2018-12-15 15:03:56 -05:00
Mahmood Ali	eaaaaf5c69	tests: assert docker containers start	2018-12-15 15:03:56 -05:00
Mahmood Ali	6631d42bfa	tests: try deflake TestDockerDriver_OOMKilled Noticed an issue in Docker daemon failing to handle the OOM test case failure in build https://travis-ci.org/hashicorp/nomad/jobs/468027848 , and I suspect it's related to the process dying so quickly, and potentially the way we are starting the task, so added a start up delay and made it more consistent with other tests that don't seem as flaky. The following is the log line showing Docker returning 500 error condition; while we can probably handle it gracefully without retrying, the retry is very cheap in this case and it's more of an optimization that we can handle in follow up PR. ``` testlog.go:32: 2018-12-14T14:57:52.626Z [DEBUG] docker/driver.go:852: docker: setting container startup command: task_name=nc-demo command="/bin/nc -l 127.0.0.1 -p 0" testlog.go:32: 2018-12-14T14:57:52.626Z [DEBUG] docker/driver.go:866: docker: setting container name: task_name=nc-demo container_name=724a3e77-8b15-e657-f6aa-84c2d3243b18 testlog.go:32: 2018-12-14T14:57:52.694Z [INFO ] docker/driver.go:196: docker: created container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be testlog.go:32: 2018-12-14T14:57:53.523Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=1 error="API error (500): {"message":"cannot start a stopped process: unknown"} " testlog.go:32: 2018-12-14T14:57:55.394Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=2 error="API error (500): {"message":"cannot start a stopped process: unknown"} " testlog.go:32: 2018-12-14T14:57:57.243Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=3 error="API error (500): {"message":"cannot start a stopped process: unknown"} " ```	2018-12-15 15:03:56 -05:00
Mahmood Ali	6b216a6015	tests: pin busybox image to a specific point tag Using `:latest` tag is typically a cause of pain, as underlying image changes behavior. Here, I'm switching to using a point release, and re-updating the stored tarballs with it. Sadly, when saving/loading images, the repo digeset is not supported: https://github.com/moby/moby/issues/22011 ; but using point releases should mitigate the problem. The motivation here is that docker tests have some flakiness due to accidental importing of `busybox:latest` which has `/bin/nc` that no longer supports `-p 0`: ``` $ docker run -it --rm busybox /bin/nc -l 127.0.0.1 -p 0 Unable to find image 'busybox:latest' locally latest: Pulling from library/busybox Digest: sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812 Status: Downloaded newer image for busybox:latest nc: bad local port '0' ``` Looks like older busybox versions (e.g. `busybox:1.24` do honor `-p 0` as the test expect, but I would rather update busybox to fix.	2018-12-15 15:03:56 -05:00
Nick Ethier	0c50a51c19	executor: encode mounts and devices correctly when using grpc	2018-12-15 00:08:23 -05:00
Nick Ethier	a771ee59aa	rawexec: fix misleading log	2018-12-14 23:40:37 -05:00
Nick Ethier	49e03542cc	executor: use int when encoding signal in RPC	2018-12-14 22:20:01 -05:00
Nick Ethier	09dadf0a23	Merge branch 'master' into f-grpc-executor * master: (71 commits) Fix output of 'nomad deployment fail' with no arg Always create a running allocation when testing task state tests: ensure exec tests pass valid task resources (#4992) some changes for more idiomatic code fix iops related tests fixed bug in loop delay gofmt improved code for readability client: updateAlloc release lock after read fixup! device attributes in `nomad node status -verbose` drivers/exec: support device binds and mounts fix iops bug and increase test matrix coverage tests: tag image explicitly changelog ci: install lxc-templates explicitly tests: skip checking rdma cgroup ci: use Ubuntu 16.04 (Xenial) in TravisCI client: update driver info on new fingerprint drivers/docker: enforce volumes.enabled (#4983) client: Style: use fluent style for building loggers ...	2018-12-13 14:41:09 -05:00
Mahmood Ali	f0ec27da3c	tests: ensure exec tests pass valid task resources (#4992 ) Prior to 97f33bb1537d04905cb84199672bcdf46ebb4e65, executor cgroup validation errors were silently ignored. Enforcing them reveals test cases that missed them. This doesn't change customer facing contract, as resource struct is is either configured or we default to 100 (much higher than 2).	2018-12-12 20:40:38 -05:00
Mahmood Ali	74bd0be6ea	drivers/exec: support device binds and mounts	2018-12-11 18:35:21 -05:00
Mahmood Ali	8726ab3b9e	Merge pull request #4985 from hashicorp/test-with-xenial ci: Test with Ubuntu 16.04 in TravisCI	2018-12-11 18:00:39 -05:00
Mahmood Ali	69b2355274	Merge pull request #4975 from hashicorp/fix-master-20181209 Some test fixes and remedies	2018-12-11 18:00:21 -05:00
Mahmood Ali	979a65486d	tests: tag image explicitly	2018-12-11 17:59:45 -05:00
Alex Dadgar	1531b6d534	Merge pull request #4970 from hashicorp/f-no-iops Deprecate IOPS	2018-12-11 12:51:22 -08:00
Mahmood Ali	e6e71fb47a	tests: skip checking rdma cgroup rdma was added in most recent kernels and libcontainer/docker don't isolate them by default.	2018-12-11 15:49:11 -05:00
Mahmood Ali	84ded28c6d	drivers/docker: enforce volumes.enabled (#4983 ) When volumes.enable flag is off in Docker driver, disable all mounts of paths outside alloc dir.	2018-12-11 14:22:50 -05:00
Mahmood Ali	f6f39f1314	add a note about busybox license	2018-12-11 09:35:26 -05:00
Mahmood Ali	5a487ac884	tests: prevent indefinite blocking in some tests Noticed few places where tests seem to block indefinitely and panic after the test run reaches the test package timeout. I intend to follow up with the proper fix later, but timing out is much better than indefinitely blocking.	2018-12-11 09:35:26 -05:00
Mahmood Ali	23c07b9afe	tests: update stop/kill tests with new pattern Update rawexec and rkt stop/kill tests with the patterns introduced in 7a49e9b68e519050a0c2ef0b67c33503bfbc51be. This implementation should be more resilient to discrepancy between task stopping and task being marked as exited.	2018-12-11 09:35:26 -05:00
Mahmood Ali	8453ce7d56	tests: setup libcontainer rootfs Using statically linked busybox binary to setup a basic rootfs for testing, by symlinking it to provide the basic commands used in tests. I considered using a proper rootfs tarball, but the overhead of managing tarfile and expanding it seems significant enough that I went with this implementation.	2018-12-11 09:35:26 -05:00
Mahmood Ali	97829a3f02	fix dtestutil.NewDriverHarness ref	2018-12-08 09:58:23 -05:00
Mahmood Ali	021d3720b5	Merge pull request #4950 from hashicorp/b-exc-libcontainer-kill executor: kill all container processes	2018-12-08 09:52:42 -05:00
Nick Ethier	35268fdb54	executor: misspell	2018-12-08 01:52:06 -05:00
Nick Ethier	86e9c11ec2	executor: don't drop errors when configuring libcontainer cfg, add nil check on resources	2018-12-07 14:03:42 -05:00
Mahmood Ali	7d5b5bb5f9	Merge pull request #4933 from hashicorp/f-mount-device Mount Devices in container based drivers	2018-12-07 10:32:03 -05:00
Nick Ethier	47df1dde10	Merge branch 'master' into f-grpc-executor	2018-12-06 21:42:38 -05:00
Nick Ethier	19a695308f	executor: fix tests	2018-12-06 21:39:53 -05:00
Nick Ethier	913efed9f5	executor: fix broken non-linux build	2018-12-06 21:33:20 -05:00
Nick Ethier	2283cb2c39	executor: use drivers.Resources as resource model	2018-12-06 21:22:02 -05:00
Nick Ethier	29ef54c0ee	executor: merge plugin shim with executor package	2018-12-06 21:13:45 -05:00
Nick Ethier	71353a88d4	executor: remove structs package	2018-12-06 20:54:14 -05:00
Alex Dadgar	1e3c3cb287	Deprecate IOPS IOPS have been modelled as a resource since Nomad 0.1 but has never actually been detected and there is no plan in the short term to add detection. This is because IOPS is a bit simplistic of a unit to define the performance requirements from the underlying storage system. In its current state it adds unnecessary confusion and can be removed without impacting any users. This PR leaves IOPS defined at the jobspec parsing level and in the api/ resources since these are the two public uses of the field. These should be considered deprecated and only exist to allow users to stop using them during the Nomad 0.9.x release. In the future, there should be no expectation that the field will exist.	2018-12-06 15:09:26 -08:00
Mahmood Ali	a7b205daf2	Merge pull request #4955 from hashicorp/fix-docker-tests-20181203 Fix docker driver tests	2018-12-06 16:41:33 -05:00
Mahmood Ali	bdc53b1d8e	driver/rkt: mount plugin devices	2018-12-06 15:46:35 -05:00
Mahmood Ali	2c0fd2a902	driver/lxc: mount plugin devices Also, LXC requires target paths to be relative. Container paths in LXC binds should never be absolute paths, so we strip any preceeding `/`, even if a user sets one.	2018-12-06 15:46:35 -05:00
Mahmood Ali	699875eb1c	fixup: add missed docker utils test	2018-12-06 15:46:35 -05:00
Mahmood Ali	e9557ae596	tests: ensure image is loaded as test setup	2018-12-06 15:36:43 -05:00
Nick Ethier	57ffece7f8	executor: update test references	2018-12-05 11:07:48 -05:00
Nick Ethier	02f4b0fac5	executor: update driver references	2018-12-05 11:04:18 -05:00
Nick Ethier	8b20de4801	executor: use grpc instead of netrpc as plugin protocol * Added protobuf spec for executor * Seperated executor structs into their own package	2018-12-05 11:03:56 -05:00
Mahmood Ali	b55fb642f1	driver/docker: honor plugin devices	2018-12-04 21:31:28 -05:00
Mahmood Ali	a580cef986	refactor device manipulation	2018-12-04 20:55:59 -05:00
Mahmood Ali	3a18105d06	drivers/exec: refactor stop/kill tests Simplify the tests to do all assertions within the main goroutine and account for status propagation delay.	2018-12-04 20:34:43 -05:00
Mahmood Ali	428d35a5a9	executor: Keep 0.8.6 exit code for wait() failures 0.8.6 uses exit code 1 when `proc.Wait()` fails: https://github.com/hashicorp/nomad/blob/v0.8.6/client/driver/executor/executor.go#L442	2018-12-04 19:38:25 -05:00
Mahmood Ali	8df9de6fd5	driver/rkt: use rkt environment The rkt command itself needs an environment with PATH set to find iptables.	2018-12-04 14:00:45 -05:00
Mahmood Ali	06a5cadf35	drivers/rkt: use image isolation for rkt	2018-12-04 11:40:10 -05:00
Mahmood Ali	178365848e	tests: don't assert in WaitForResult WaitForResult expects body to fail and retries few times before giving up. Assertions inside the testfn body causes it to terminate abruptly without retrying.	2018-12-04 11:40:10 -05:00
Mahmood Ali	f8ceeebf11	no t.Parallel() in excutor table driven tests (#4948 ) When `t.Parallel()` is used inside a `t.Run()` sub-set, the closure doesn't behave as expected, and some cases effectively get skipped. More details can be found in https://gist.github.com/posener/92a55c4cd441fc5e5e85f27bca008721	2018-12-04 09:04:04 -05:00
Mahmood Ali	216a2566c7	Update LXC with drivers/testutils changes (#4951 )	2018-12-04 08:57:54 -05:00
Mahmood Ali	c88e3723eb	Fix docker tests Some tests have containers that die almost immediately, and may die and cleaned up before `driver.WaitUntilStarted` runs. The causes for container dying seems special for each test: * TestDockerDriver_Cleanup: `hello-world` image just emits a message and exits immediately * TestDockerDriver_ForcePull_RepoDigest: the busybox image in `TestDockerDriver_ForcePull_RepoDigest` test didn't support `-p 0` argument * TestDockerDriver_Entrypoint: with the entrypoint being `/bin/sh -c`, the command needs to be the entire string; otherwise, it ignores the comments	2018-12-03 23:08:52 -05:00
Mahmood Ali	2516cb16b9	Kill all container processes on shutdown Currently, libcontainer-based executor, upon shutdown, kills the container initial process. The children of the killed process remain running, and the executor is never marked as terminated until they do. Also, fix a case where we treat processes as successful, when `proc.Wait()` fails. In some attempts, I was getting "waitid no child processes" errors and such error shouldn't get process to be considered successful.	2018-12-03 20:40:49 -05:00
Mahmood Ali	bd8e4f1c15	Test Stopping a multi-process exec Ensure that exec children processes get killed as well.	2018-12-03 20:40:19 -05:00
Danielle Tomlinson	10b3e68a6d	Merge pull request #4925 from hashicorp/f-driver-plugins-dani Third Party Driver Plugins Support	2018-12-03 20:48:19 +01:00
Mahmood Ali	88622b97bd	libcontainer to manage /dev and /proc (#4945 ) libcontainer already manages `/dev`, overriding task_dir - so let's use it for `/proc` as well and remove deadcode.	2018-12-03 10:41:01 -05:00
Danielle Tomlinson	393b76ed7f	plugins: Move driver testing support to subpackage this allows us to drop a cyclical import, but is subobptimal as it requires BaseDriver tests to move. This falls firmly into the realm of being a hack. Alternatives welcome.	2018-12-01 17:29:39 +01:00
Danielle Tomlinson	66c521ca17	client: Move fingerprint structs to pkg This removes a cyclical dependency when importing client/structs from dependencies of the plugin_loader, specifically, drivers. Due to client/config also depending on the plugin_loader. It also better reflects the ownership of fingerprint structs, as they are fairly internal to the fingerprint manager.	2018-12-01 17:10:39 +01:00
Danielle Tomlinson	51a9f7369e	Merge pull request #4936 from hashicorp/f-legacy-refactor Refactor and repackage client/driver	2018-11-30 13:38:06 +01:00
Mahmood Ali	84e04cfa40	Merge pull request #4926 from hashicorp/f-docker-image-ref Use user provided image name to launch container	2018-11-30 07:27:39 -05:00

1 2 3 4 5 ...

283 commits