open-nomad

Commit Graph

Author	SHA1	Message	Date
Michael Lange	e1e5aa21ff	Merge pull request #4994 from hashicorp/b-ui-dots-in-tasks UI: Bugs around dots in task/task-group/driver names	2018-12-17 15:50:31 -08:00
Danielle Tomlinson	d9174d8dcf	Merge pull request #4989 from hashicorp/dani/b-client-update-race-condition client: Give a copy of clientconfig to allocrunner	2018-12-17 10:49:46 +01:00
Danielle Tomlinson	53aa1bc198	Merge pull request #5004 from hashicorp/dani/f-hook-errors client: Emit TaskEvents when task hooks fail	2018-12-17 10:42:57 +01:00
Danielle Tomlinson	a50ea29da4	taskrunner: Use hook errors for artifacts	2018-12-17 10:39:38 +01:00
Mahmood Ali	2d2c562e18	Remove implicit check. I intended to remove this line in 29ef7ecf2372f980d12a9900e1b2a351568dd415; see my notes there for details.	2018-12-16 09:14:26 -05:00
Mahmood Ali	56dfdd0874	tests: fix rkt command environment (#5011). The environment variables needed for invoking the `rkt` command line should include the host PATH (to access `iptables`). Given that the command runs outside the VM, untrusted task environment variables should NOT be honored here. We do this already with `rkt`, but the change is subtle and easy to miss when refactoring.	2018-12-15 20:25:36 -05:00
Mahmood Ali	168749ffd1	Merge pull request #5008 from hashicorp/b-docker-test-20181214 Fix flakiness in docker tests	2018-12-15 16:03:00 -05:00
Mahmood Ali	d58e38e912	tests: avoid implicitly asserting clean shutdown. The assertion here is causing many spurious failures that aren't actually relevant to the test itself. We are tracking the cause of this failure independently, and it would make more sense to have a dedicated test for clean shutdown.	2018-12-15 15:30:09 -05:00
Mahmood Ali	e4f44b9be5	tests: remove TestDockerDriver_Kill. We already have two other Kill tests (TestDockerDriver_Start_Kill_Wait and TestDockerDriver_Start_KillTimeout), so we don't need yet another flaky test.	2018-12-15 15:03:56 -05:00
Mahmood Ali	990a7d6776	driver/docker: stopping a dead container is not an error	2018-12-15 15:03:56 -05:00
Mahmood Ali	eaaaaf5c69	tests: assert docker containers start	2018-12-15 15:03:56 -05:00
Mahmood Ali	6631d42bfa	tests: try to deflake TestDockerDriver_OOMKilled. I noticed the Docker daemon failing to handle the OOM test case in build https://travis-ci.org/hashicorp/nomad/jobs/468027848. I suspect this is related to the process dying so quickly, and potentially to the way we start the task, so I added a startup delay and made the test more consistent with other tests that don't seem as flaky. The following log lines show Docker returning a 500 error. We could probably handle it gracefully without retrying, but the retry is very cheap in this case, and avoiding it is more of an optimization that we can make in a follow-up PR.
```
testlog.go:32: 2018-12-14T14:57:52.626Z [DEBUG] docker/driver.go:852: docker: setting container startup command: task_name=nc-demo command="/bin/nc -l 127.0.0.1 -p 0"
testlog.go:32: 2018-12-14T14:57:52.626Z [DEBUG] docker/driver.go:866: docker: setting container name: task_name=nc-demo container_name=724a3e77-8b15-e657-f6aa-84c2d3243b18
testlog.go:32: 2018-12-14T14:57:52.694Z [INFO ] docker/driver.go:196: docker: created container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be
testlog.go:32: 2018-12-14T14:57:53.523Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=1 error="API error (500): {"message":"cannot start a stopped process: unknown"} "
testlog.go:32: 2018-12-14T14:57:55.394Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=2 error="API error (500): {"message":"cannot start a stopped process: unknown"} "
testlog.go:32: 2018-12-14T14:57:57.243Z [DEBUG] docker/driver.go:416: docker: failed to start container: container_id=362b6ea183f3c4ce472d7d7571ca47023cea1df0f5eb920827921716f17718be attempt=3 error="API error (500): {"message":"cannot start a stopped process: unknown"} "
```	2018-12-15 15:03:56 -05:00
Mahmood Ali	6b216a6015	tests: pin busybox image to a specific point tag. Using the `:latest` tag is typically a cause of pain, as the underlying image changes behavior. Here, I'm switching to a point release and updating the stored tarballs with it. Sadly, the repo digest is not supported when saving/loading images (https://github.com/moby/moby/issues/22011), but using point releases should mitigate the problem. The motivation is that the docker tests have some flakiness due to accidentally importing `busybox:latest`, whose `/bin/nc` no longer supports `-p 0`:
```
$ docker run -it --rm busybox /bin/nc -l 127.0.0.1 -p 0
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
Digest: sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812
Status: Downloaded newer image for busybox:latest
nc: bad local port '0'
```
It looks like older busybox versions (e.g. `busybox:1.24`) do honor `-p 0` as the test expects, but I would rather update busybox to fix this.	2018-12-15 15:03:56 -05:00
Nick Ethier	113d879e65	Merge pull request #4961 from hashicorp/f-grpc-executor GRPC Executor	2018-12-15 00:34:36 -05:00
Nick Ethier	0c50a51c19	executor: encode mounts and devices correctly when using grpc	2018-12-15 00:08:23 -05:00
Nick Ethier	a771ee59aa	rawexec: fix misleading log	2018-12-14 23:40:37 -05:00
Nick Ethier	49e03542cc	executor: use int when encoding signal in RPC	2018-12-14 22:20:01 -05:00
Mahmood Ali	d55ed53b83	Merge pull request #5005 from hashicorp/dev-update-golang-1.11.3 Upgrade to Golang 1.11.3	2018-12-14 11:11:55 -05:00
Mahmood Ali	6e32113e33	dev: expand ... in go get to work around a regression in 1.11.3. > We are aware of a functionality regression in "go get" when executed in GOPATH mode on an import path pattern containing "..." (e.g., "go get github.com/golang/pkg/..."), when downloading packages not already present in the GOPATH workspace. This is issue golang.org/issue/29241. It will be resolved in the next minor patch releases, Go 1.11.4 and Go 1.10.7, which we plan to release soon. We apologize for any disruption.	2018-12-14 09:42:23 -05:00
Mahmood Ali	c999400208	dev: upgrade go to 1.11.3	2018-12-14 09:42:23 -05:00
Mahmood Ali	e9f829c6d8	Update changelog (#4993)	2018-12-14 09:20:17 -05:00
Nick Ethier	09dadf0a23	Merge branch 'master' into f-grpc-executor * master: (71 commits) Fix output of 'nomad deployment fail' with no arg Always create a running allocation when testing task state tests: ensure exec tests pass valid task resources (#4992) some changes for more idiomatic code fix iops related tests fixed bug in loop delay gofmt improved code for readability client: updateAlloc release lock after read fixup! device attributes in `nomad node status -verbose` drivers/exec: support device binds and mounts fix iops bug and increase test matrix coverage tests: tag image explicitly changelog ci: install lxc-templates explicitly tests: skip checking rdma cgroup ci: use Ubuntu 16.04 (Xenial) in TravisCI client: update driver info on new fingerprint drivers/docker: enforce volumes.enabled (#4983) client: Style: use fluent style for building loggers ...	2018-12-13 14:41:09 -05:00
Preetha	782a709b9f	Merge pull request #4999 from hashicorp/blalor-patch-1 Fix output of 'nomad deployment fail' with no arg	2018-12-13 12:30:43 -06:00
Brian Lalor	31ef34838e	Fix output of 'nomad deployment fail' with no arg	2018-12-13 13:22:17 -05:00
Danielle Tomlinson	3647b701a6	taskrunner: Emit task events when a hook fails	2018-12-13 18:20:18 +01:00
Michael Lange	d83be97d78	Don't use Ember.get in conjunction with dynamic strings in the job-plan serializer	2018-12-13 07:53:37 -08:00
Michael Lange	5902842d6b	Don't use Ember.get in conjunction with dynamic strings in the allocation serializer	2018-12-13 07:53:37 -08:00
Michael Lange	7b466f9f60	Don't use Ember.get in conjunction with dynamic strings in the node serializer	2018-12-13 07:53:37 -08:00
Michael Lange	13b7434eca	Don't use Ember.get in conjunction with dynamic strings in the deployment serializer	2018-12-13 07:53:37 -08:00
Michael Lange	a00544e302	Don't use Ember.get in conjunction with dynamic strings in the job-summary serializer	2018-12-13 07:53:37 -08:00
Michael Lange	b5c11b4e43	Don't use Ember.get in conjunction with dynamic strings in the evaluation serializer	2018-12-13 07:53:37 -08:00
Michael Lange	48adb393d8	Merge pull request #4998 from hashicorp/b-ui-test-failure UI: Fix intermittent test failure "cannot read property name of undefined"	2018-12-13 07:52:30 -08:00
Michael Lange	3b044816ac	Always create a running allocation when testing task state	2018-12-13 07:39:16 -08:00
Danielle Tomlinson	8b06e8d297	Merge pull request #4990 from hashicorp/dani/b-alloc-lock client: updateAlloc release lock after read	2018-12-13 12:43:59 +01:00
Danielle Tomlinson	3823599da9	client: Give a copy of clientconfig to allocrunner. Currently, there is a race condition between creating a taskrunner and updating node attributes via fingerprinting. The taskenv builder iterates over the clientconfig.Node.Attributes map, which the fingerprinting process can update concurrently, causing a panic. This fixes that by giving the allocrunner a copy of the clientconfig, made while holding the read lock during config creation.	2018-12-13 12:42:15 +01:00
Mahmood Ali	f0ec27da3c	tests: ensure exec tests pass valid task resources (#4992). Prior to 97f33bb1537d04905cb84199672bcdf46ebb4e65, executor cgroup validation errors were silently ignored. Enforcing them revealed test cases that did not pass valid resources. This doesn't change the customer-facing contract, as the resource struct is either configured or defaults to 100 (much higher than 2).	2018-12-12 20:40:38 -05:00
Chris Baker	af593c401c	Merge pull request #4974 from hashicorp/b-1173-log-spam rpc accept loop: added backoff on logging	2018-12-12 16:54:42 -08:00
Mahmood Ali	d497729826	Merge pull request #4978 from hashicorp/f-device-tweaks Display device attributes in `nomad node status -verbose`	2018-12-12 19:45:07 -05:00
Chris Baker	121a9eb8cb	some changes for more idiomatic code	2018-12-12 23:11:17 +00:00
Alex Dadgar	20c59df8b9	Merge pull request #4969 from hashicorp/f-alloc-hooks Make alloc health watcher a postrun hook rather than shutdown hook	2018-12-12 14:34:36 -08:00
Alex Dadgar	fbe4d67d1b	fix iops related tests	2018-12-12 14:32:22 -08:00
Chris Baker	34600f8b75	fixed bug in loop delay	2018-12-12 19:16:41 +00:00
Chris Baker	89c64932c1	gofmt	2018-12-12 19:09:06 +00:00
Chris Baker	22c11d8799	improved code for readability	2018-12-12 18:52:06 +00:00
Danielle Tomlinson	4184eadaf4	client: updateAlloc release lock after read. The allocLock is used to synchronize access to the alloc runner map, not to ensure the internal consistency of the alloc runners themselves. This changes updateAlloc to avoid holding an exclusive lock on the map while applying changes to the allocrunners, since they should be internally consistent. This fixes a bug where any client allocation API would block while an allocrunner and its child taskrunners were shutting down or updating.	2018-12-12 16:30:01 +01:00
Mahmood Ali	00c9385a2b	fixup! device attributes in `nomad node status -verbose`	2018-12-12 09:17:31 -05:00
Mahmood Ali	567f1930fe	Merge pull request #4962 from hashicorp/f-exec-device-mounts drivers/exec: Support device mounts	2018-12-11 20:20:01 -05:00
Preetha	f406e66ab8	Merge pull request #4881 from hashicorp/f-device-preemption Device preemption	2018-12-11 18:34:19 -06:00
Alex Dadgar	fc14a04612	Merge pull request #4986 from hashicorp/b-vault-e2e fix iops bug and increase test matrix coverage	2018-12-11 15:52:29 -08:00
Mahmood Ali	74bd0be6ea	drivers/exec: support device binds and mounts	2018-12-11 18:35:21 -05:00
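
Two client commits above follow a common Go locking pattern. In 3823599da9, a runner receives a private copy of shared config taken under a read lock. In 4184eadaf4, the map lock is held only for the lookup, and each runner synchronizes its own state. The sketch below is a minimal illustration under stated assumptions. The types `Node`, `Config`, `allocRunner`, `client` and the functions `newClient`, `addAlloc`, `setAttribute`, `updateAlloc` are hypothetical stand-ins, not Nomad's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical stand-ins for the client configuration (not Nomad's real types).
type Node struct{ Attributes map[string]string }
type Config struct{ Node *Node }

// Copy deep-copies the attribute map so a holder can iterate it without
// racing against a writer such as a fingerprinting loop.
func (c *Config) Copy() *Config {
	attrs := make(map[string]string, len(c.Node.Attributes))
	for k, v := range c.Node.Attributes {
		attrs[k] = v
	}
	return &Config{Node: &Node{Attributes: attrs}}
}

// allocRunner is a hypothetical runner that guards its own state.
type allocRunner struct {
	mu     sync.Mutex
	config *Config
	status string
}

func (ar *allocRunner) Update(status string) {
	ar.mu.Lock()
	defer ar.mu.Unlock()
	ar.status = status
}

func (ar *allocRunner) Status() string {
	ar.mu.Lock()
	defer ar.mu.Unlock()
	return ar.status
}

type client struct {
	configLock sync.RWMutex
	config     *Config

	allocLock sync.RWMutex // guards only the allocs map itself
	allocs    map[string]*allocRunner
}

func newClient(attrs map[string]string) *client {
	return &client{
		config: &Config{Node: &Node{Attributes: attrs}},
		allocs: make(map[string]*allocRunner),
	}
}

// setAttribute simulates fingerprinting mutating node attributes.
func (c *client) setAttribute(k, v string) {
	c.configLock.Lock()
	defer c.configLock.Unlock()
	c.config.Node.Attributes[k] = v
}

// addAlloc gives the new runner a config copy taken under the read lock.
func (c *client) addAlloc(id string) {
	c.configLock.RLock()
	cfg := c.config.Copy()
	c.configLock.RUnlock()

	c.allocLock.Lock()
	defer c.allocLock.Unlock()
	c.allocs[id] = &allocRunner{config: cfg}
}

// updateAlloc holds allocLock only for the map lookup and releases it
// before calling into the runner, which synchronizes itself.
func (c *client) updateAlloc(id, status string) bool {
	c.allocLock.RLock()
	ar, ok := c.allocs[id]
	c.allocLock.RUnlock()
	if !ok {
		return false
	}
	ar.Update(status)
	return true
}

func main() {
	c := newClient(map[string]string{"kernel.name": "linux"})
	c.addAlloc("a1")
	c.setAttribute("kernel.name", "darwin") // does not touch a1's copy
	c.updateAlloc("a1", "running")
	fmt.Println(c.allocs["a1"].config.Node.Attributes["kernel.name"], c.allocs["a1"].Status()) // prints: linux running
}
```

Copying the map costs one allocation per runner. That is cheap compared with the failure mode it avoids: without the copy, the Go runtime may abort with "concurrent map iteration and map write".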

13553 Commits