open-nomad

Author	SHA1	Message	Date
Mahmood Ali	6631d42bfa	tests: try deflake TestDockerDriver_OOMKilled Noticed an issue in Docker daemon failing to handle the OOM test case failure in build https://travis-ci.org/hashicorp/nomad/jobs/468027848 , and I suspect it's related to the process dying so quickly, and potentially the way we are starting the task, so added a start up delay and made it more consistent with other tests that don't seem as flaky. The following is the log line showing Docker returning 500 error condition; while we can probably handle it gracefully without retrying, the retry is very cheap in this case and it's more of an optimization that we can handle in follow up PR. ``` testlog.go:32: 2018-12-14T14:57:52.626Z [DEBUG] docker/driver.go:852: docker: setting container startup command: task_name=nc-demo command="/bin/nc -l 127.0.0.1 -p 0" testlog.go:32: 2018-12-14T14:57:52.626Z [DEBUG] docker/driver.go:866: docker: setting container name: task_name=nc-demo container_name=724a3e77-8b15-e657-f6aa-84c2d3243b18 testlog.go:32: 2018-12-14T14:57:52.694Z [INFO ] docker/driver.go:196: docker: created container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be testlog.go:32: 2018-12-14T14:57:53.523Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=1 error="API error (500): {"message":"cannot start a stopped process: unknown"} " testlog.go:32: 2018-12-14T14:57:55.394Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=2 error="API error (500): {"message":"cannot start a stopped process: unknown"} " testlog.go:32: 2018-12-14T14:57:57.243Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=3 error="API error (500): {"message":"cannot start a stopped process: unknown"} " 
```	2018-12-15 15:03:56 -05:00
Mahmood Ali	6b216a6015	tests: pin busybox image to a specific point tag Using `:latest` tag is typically a cause of pain, as underlying image changes behavior. Here, I'm switching to using a point release, and re-updating the stored tarballs with it. Sadly, when saving/loading images, the repo digest is not supported: https://github.com/moby/moby/issues/22011 ; but using point releases should mitigate the problem. The motivation here is that docker tests have some flakiness due to accidental importing of `busybox:latest` which has `/bin/nc` that no longer supports `-p 0`: ``` $ docker run -it --rm busybox /bin/nc -l 127.0.0.1 -p 0 Unable to find image 'busybox:latest' locally latest: Pulling from library/busybox Digest: sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812 Status: Downloaded newer image for busybox:latest nc: bad local port '0' ``` Looks like older busybox versions (e.g. `busybox:1.24`) do honor `-p 0` as the tests expect, but I would rather update busybox to fix.	2018-12-15 15:03:56 -05:00
Mahmood Ali	d55ed53b83	Merge pull request #5005 from hashicorp/dev-update-golang-1.11.3 Upgrade to Golang 1.11.3	2018-12-14 11:11:55 -05:00
Mahmood Ali	6e32113e33	dev: expand ... in go get workaround a regression in 1.11.3 > We are aware of a functionality regression in "go get" when executed in GOPATH mode on an import path pattern containing "..." (e.g., "go get github.com/golang/pkg/..."), when downloading packages not already present in the GOPATH workspace. This is issue golang.org/issue/29241. It will be resolved in the next minor patch releases, Go 1.11.4 and Go 1.10.7, which we plan to release soon. We apologize for any disruption.	2018-12-14 09:42:23 -05:00
Mahmood Ali	c999400208	dev: upgrade go to 1.11.3	2018-12-14 09:42:23 -05:00
Mahmood Ali	e9f829c6d8	Update changelog (#4993 )	2018-12-14 09:20:17 -05:00
Preetha	782a709b9f	Merge pull request #4999 from hashicorp/blalor-patch-1 Fix output of 'nomad deployment fail' with no arg	2018-12-13 12:30:43 -06:00
Brian Lalor	31ef34838e	Fix output of 'nomad deployment fail' with no arg	2018-12-13 13:22:17 -05:00
Michael Lange	48adb393d8	Merge pull request #4998 from hashicorp/b-ui-test-failure UI: Fix intermittent test failure "cannot read property name of undefined"	2018-12-13 07:52:30 -08:00
Michael Lange	3b044816ac	Always create a running allocation when testing task state	2018-12-13 07:39:16 -08:00
Danielle Tomlinson	8b06e8d297	Merge pull request #4990 from hashicorp/dani/b-alloc-lock client: updateAlloc release lock after read	2018-12-13 12:43:59 +01:00
Mahmood Ali	f0ec27da3c	tests: ensure exec tests pass valid task resources (#4992) Prior to 97f33bb1537d04905cb84199672bcdf46ebb4e65, executor cgroup validation errors were silently ignored. Enforcing them reveals test cases that missed them. This doesn't change the customer-facing contract, as the resource struct is either configured or we default to 100 (much higher than 2).	2018-12-12 20:40:38 -05:00
Chris Baker	af593c401c	Merge pull request #4974 from hashicorp/b-1173-log-spam rpc accept loop: added backoff on logging	2018-12-12 16:54:42 -08:00
Mahmood Ali	d497729826	Merge pull request #4978 from hashicorp/f-device-tweaks Display device attributes in `nomad node status -verbose`	2018-12-12 19:45:07 -05:00
Chris Baker	121a9eb8cb	some changes for more idiomatic code	2018-12-12 23:11:17 +00:00
Alex Dadgar	20c59df8b9	Merge pull request #4969 from hashicorp/f-alloc-hooks Make alloc health watcher a postrun hook rather than shutdown hook	2018-12-12 14:34:36 -08:00
Alex Dadgar	fbe4d67d1b	fix iops related tests	2018-12-12 14:32:22 -08:00
Chris Baker	34600f8b75	fixed bug in loop delay	2018-12-12 19:16:41 +00:00
Chris Baker	89c64932c1	gofmt	2018-12-12 19:09:06 +00:00
Chris Baker	22c11d8799	improved code for readability	2018-12-12 18:52:06 +00:00
Danielle Tomlinson	4184eadaf4	client: updateAlloc release lock after read The allocLock is used to synchronize access to the alloc runner map, not to ensure internal consistency of the alloc runners themselves. This updates the updateAlloc process to avoid hanging on to an exclusive lock of the map while applying changes to allocrunners themselves, as they should be internally consistent. This fixes a bug where any client allocation api will block during the shutdown or updating of an allocrunner and its child taskrunners.	2018-12-12 16:30:01 +01:00
Mahmood Ali	00c9385a2b	fixup! device attributes in `nomad node status -verbose`	2018-12-12 09:17:31 -05:00
Mahmood Ali	567f1930fe	Merge pull request #4962 from hashicorp/f-exec-device-mounts drivers/exec: Support devices mounts	2018-12-11 20:20:01 -05:00
Preetha	f406e66ab8	Merge pull request #4881 from hashicorp/f-device-preemption Device preemption	2018-12-11 18:34:19 -06:00
Alex Dadgar	fc14a04612	Merge pull request #4986 from hashicorp/b-vault-e2e fix iops bug and increase test matrix coverage	2018-12-11 15:52:29 -08:00
Mahmood Ali	74bd0be6ea	drivers/exec: support device binds and mounts	2018-12-11 18:35:21 -05:00
Alex Dadgar	86d9ad4397	fix iops bug and increase test matrix coverage	2018-12-11 15:28:21 -08:00
Mahmood Ali	3d166e6e9c	Merge pull request #4984 from hashicorp/b-client-update-driver client: update driver info on new driver fingerprint	2018-12-11 18:01:03 -05:00
Mahmood Ali	8726ab3b9e	Merge pull request #4985 from hashicorp/test-with-xenial ci: Test with Ubuntu 16.04 in TravisCI	2018-12-11 18:00:39 -05:00
Mahmood Ali	69b2355274	Merge pull request #4975 from hashicorp/fix-master-20181209 Some test fixes and remedies	2018-12-11 18:00:21 -05:00
Mahmood Ali	979a65486d	tests: tag image explicitly	2018-12-11 17:59:45 -05:00
Alex Dadgar	2df8a56b76	changelog	2018-12-11 12:52:45 -08:00
Alex Dadgar	1531b6d534	Merge pull request #4970 from hashicorp/f-no-iops Deprecate IOPS	2018-12-11 12:51:22 -08:00
Mahmood Ali	76bc2a3fd0	ci: install lxc-templates explicitly The LXC package on Ubuntu 16.04 doesn't depend on lxc-templates, but we require it in our tests.	2018-12-11 15:49:11 -05:00
Mahmood Ali	e6e71fb47a	tests: skip checking rdma cgroup rdma was added in most recent kernels and libcontainer/docker don't isolate them by default.	2018-12-11 15:49:11 -05:00
Mahmood Ali	095dba72ac	ci: use Ubuntu 16.04 (Xenial) in TravisCI	2018-12-11 15:49:11 -05:00
Mahmood Ali	ba515947c2	client: update driver info on new fingerprint Fixes a bug where a driver's health and attributes are never updated from their initial status. If a driver started unhealthy, it may never go into a healthy status.	2018-12-11 14:25:10 -05:00
Mahmood Ali	84ded28c6d	drivers/docker: enforce volumes.enabled (#4983) When the volumes.enabled flag is off in the Docker driver, disable all mounts of paths outside the alloc dir.	2018-12-11 14:22:50 -05:00
Danielle Tomlinson	d11c62fa3a	Merge pull request #4963 from hashicorp/dani/f-preempt-alloc-wait client: Wait for preemptions to terminate	2018-12-11 18:06:34 +01:00
Danielle Tomlinson	ed1791f4bf	client: Style: use fluent style for building loggers	2018-12-11 18:03:45 +01:00
Preetha Appan	977a4a540d	Early continue after meeting needed count Also adds another optimization that filters out un-needed allocations as a final filtering step	2018-12-11 10:12:18 -06:00
Danielle Tomlinson	805669ead4	client: Correctly pass a noop PrevAllocMigrator when restoring	2018-12-11 15:46:58 +01:00
Mahmood Ali	f6f39f1314	add a note about busybox license	2018-12-11 09:35:26 -05:00
Mahmood Ali	3babda5d45	tests: no need for buffer channel	2018-12-11 09:35:26 -05:00
Mahmood Ali	5a487ac884	tests: prevent indefinite blocking in some tests Noticed a few places where tests seem to block indefinitely and panic after the test run reaches the test package timeout. I intend to follow up with the proper fix later, but timing out is much better than indefinitely blocking.	2018-12-11 09:35:26 -05:00
Mahmood Ali	23c07b9afe	tests: update stop/kill tests with new pattern Update rawexec and rkt stop/kill tests with the patterns introduced in 7a49e9b68e519050a0c2ef0b67c33503bfbc51be. This implementation should be more resilient to discrepancy between task stopping and task being marked as exited.	2018-12-11 09:35:26 -05:00
Mahmood Ali	4635168f20	test: fix TestFingerprintManager_Run_Combination Let's use a fingerprinter that doesn't have values prepopulated in test fixtures.	2018-12-11 09:35:26 -05:00
Mahmood Ali	8453ce7d56	tests: setup libcontainer rootfs Using statically linked busybox binary to setup a basic rootfs for testing, by symlinking it to provide the basic commands used in tests. I considered using a proper rootfs tarball, but the overhead of managing tarfile and expanding it seems significant enough that I went with this implementation.	2018-12-11 09:35:26 -05:00
Mahmood Ali	994b9d967c	tests: Lower package runtime Lowering the runtime here to pre 7ca535aa90748caff1522468cc0c4ab672a74abb expectations. The longest package at the time `client/driver` shrunk significantly, and now the longest packages take less than 5 minutes. We do have some long running timed out projects due to a stuck shutdown, but in completed jobs (though they failed), the longest packages took less than 5 minutes. The longest running packages in https://travis-ci.org/hashicorp/nomad/jobs/464640776 were: ``` FAIL github.com/hashicorp/nomad/nomad 268.089s ok github.com/hashicorp/nomad/drivers/docker 203.903s coverage: 68.8% of statements ok github.com/hashicorp/nomad/drivers/rkt 132.104s coverage: 65.0% of statements ok github.com/hashicorp/nomad/api 123.193s coverage: 62.9% of statements ok github.com/hashicorp/nomad/command/agent 74.657s coverage: 72.3% of statements ok github.com/hashicorp/nomad/command 63.592s coverage: 42.7% of statements ```	2018-12-11 09:35:26 -05:00
Danielle Tomlinson	6fb5ca6ad5	allocrunner: Test alloc runners should include a noop migrator	2018-12-11 13:12:35 +01:00
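
Commit 4184eadaf4 above describes holding the alloc runner map lock only for the lookup and releasing it before changes are applied to the runner itself. The following is a minimal sketch of that general pattern, not the Nomad implementation: the `registry` and `runner` types, their fields, and the `update` method are hypothetical names, and the use of a read lock for the lookup is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// runner is a hypothetical stand-in for an alloc runner. It guards its own
// state, so callers do not need the map lock to keep it consistent.
type runner struct {
	mu      sync.Mutex
	version int
}

// Update applies a (potentially slow) change under the runner's own lock.
func (r *runner) Update(v int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.version = v
}

// Version returns the last applied version.
func (r *runner) Version() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.version
}

// registry is a hypothetical stand-in for the client's alloc runner map.
// lock synchronizes access to the map only, not to the runners in it.
type registry struct {
	lock    sync.RWMutex
	runners map[string]*runner
}

// update reads the runner out of the map, releases the map lock, and only
// then applies the change, so other map users are not blocked meanwhile.
func (g *registry) update(id string, v int) bool {
	g.lock.RLock()
	r, ok := g.runners[id]
	g.lock.RUnlock() // released right after the read

	if !ok {
		return false
	}
	r.Update(v)
	return true
}

func main() {
	g := &registry{runners: map[string]*runner{"a1": {}}}
	fmt.Println(g.update("a1", 2), g.runners["a1"].Version())
}
```

The design choice here is that the map lock covers only the map, so a slow `Update` on one runner cannot stall lookups for the others.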

13518 commits