Commit Graph

13775 Commits

Author SHA1 Message Date
Michael Lange 3b044816ac Always create a running allocation when testing task state 2018-12-13 07:39:16 -08:00
Danielle Tomlinson 8b06e8d297
Merge pull request #4990 from hashicorp/dani/b-alloc-lock
client: updateAlloc release lock after read
2018-12-13 12:43:59 +01:00
Danielle Tomlinson 3823599da9 client: Give a copy of clientconfig to allocrunner
Currently, there is a race condition between creating a taskrunner, and
updating node attributes via fingerprinting.

This is because the taskenv builder will try to iterate over the
clientconfig.Node.Attributes map, which can be concurrently updated by
the fingerprinting process, thus causing a panic.

This fixes that by providing a copy of the clientconfg to the
allocrunner inside the Read lock during config creation.
2018-12-13 12:42:15 +01:00
Mahmood Ali f0ec27da3c
tests: ensure exec tests pass valid task resources (#4992)
Prior to 97f33bb1537d04905cb84199672bcdf46ebb4e65, executor cgroup validation errors were
silently ignored.  Enforcing them reveals test cases that missed them.

This doesn't change customer facing contract, as resource struct is
is either configured or we default to 100 (much higher than 2).
2018-12-12 20:40:38 -05:00
Chris Baker af593c401c
Merge pull request #4974 from hashicorp/b-1173-log-spam
rpc accept loop: added backoff on logging
2018-12-12 16:54:42 -08:00
Mahmood Ali d497729826
Merge pull request #4978 from hashicorp/f-device-tweaks
Display device attributes in `nomad node status -verbose`
2018-12-12 19:45:07 -05:00
Chris Baker 121a9eb8cb some changes for more idiomatic code 2018-12-12 23:11:17 +00:00
Alex Dadgar 20c59df8b9
Merge pull request #4969 from hashicorp/f-alloc-hooks
Make alloc health watcher a postrun hook rather than shutdown hook
2018-12-12 14:34:36 -08:00
Alex Dadgar fbe4d67d1b fix iops related tests 2018-12-12 14:32:22 -08:00
Chris Baker 34600f8b75 fixed bug in loop delay 2018-12-12 19:16:41 +00:00
Chris Baker 89c64932c1 gofmt 2018-12-12 19:09:06 +00:00
Chris Baker 22c11d8799 improved code for readability 2018-12-12 18:52:06 +00:00
Danielle Tomlinson 4184eadaf4 client: updateAlloc release lock after read
The allocLock is used to synchronize access to the alloc runner map, not
to ensure internal consistency of the alloc runners themselves. This
updates the updateAlloc process to avoid hanging on to an exclusive lock
of the map while applying changes to allocrunners themselves, as they
should be internally consistent.

This fixes a bug where any client allocation api will block during the
shutdown or updating of an allocrunner and its child taskrunners.
2018-12-12 16:30:01 +01:00
Mahmood Ali 00c9385a2b fixup! device attributes in `nomad node status -verbose` 2018-12-12 09:17:31 -05:00
Danielle Tomlinson cbfa1388db fixup: Code Review 2018-12-12 12:43:16 +01:00
Mahmood Ali 567f1930fe
Merge pull request #4962 from hashicorp/f-exec-device-mounts
drivers/exec: Support devices mounts
2018-12-11 20:20:01 -05:00
Preetha f406e66ab8
Merge pull request #4881 from hashicorp/f-device-preemption
Device preemption
2018-12-11 18:34:19 -06:00
Alex Dadgar fc14a04612
Merge pull request #4986 from hashicorp/b-vault-e2e
fix iops bug and increase test matrix coverage
2018-12-11 15:52:29 -08:00
Mahmood Ali 74bd0be6ea drivers/exec: support device binds and mounts 2018-12-11 18:35:21 -05:00
Alex Dadgar 86d9ad4397 fix iops bug and increase test matrix coverage 2018-12-11 15:28:21 -08:00
Mahmood Ali 3d166e6e9c
Merge pull request #4984 from hashicorp/b-client-update-driver
client: update driver info on new driver fingerprint
2018-12-11 18:01:03 -05:00
Mahmood Ali 8726ab3b9e
Merge pull request #4985 from hashicorp/test-with-xenial
ci: Test with Ubuntu 16.04 in TravisCI
2018-12-11 18:00:39 -05:00
Mahmood Ali 69b2355274
Merge pull request #4975 from hashicorp/fix-master-20181209
Some test fixes and remedies
2018-12-11 18:00:21 -05:00
Mahmood Ali 979a65486d tests: tag image explicitly 2018-12-11 17:59:45 -05:00
Alex Dadgar 2df8a56b76 changelog 2018-12-11 12:52:45 -08:00
Alex Dadgar 1531b6d534
Merge pull request #4970 from hashicorp/f-no-iops
Deprecate IOPS
2018-12-11 12:51:22 -08:00
Mahmood Ali 76bc2a3fd0 ci: install lxc-templates explicitly
LXC package on Ubuntu 16.04 doesn't depend on lxc-template, but we
require it in our tests.
2018-12-11 15:49:11 -05:00
Mahmood Ali e6e71fb47a tests: skip checking rdma cgroup
rdma was added in most recent kernels and libcontainer/docker
don't isolate them by default.
2018-12-11 15:49:11 -05:00
Mahmood Ali 095dba72ac ci: use Ubuntu 16.04 (Xenial) in TravisCI 2018-12-11 15:49:11 -05:00
Mahmood Ali ba515947c2 client: update driver info on new fingerprint
Fixes a bug where a driver health and attributes are never updated from
their initial status.  If a driver started unhealthy, it may never go
into a healthy status.
2018-12-11 14:25:10 -05:00
Mahmood Ali 84ded28c6d
drivers/docker: enforce volumes.enabled (#4983)
When volumes.enable flag is off in Docker driver, disable all mounts of
paths outside alloc dir.
2018-12-11 14:22:50 -05:00
Danielle Tomlinson d11c62fa3a
Merge pull request #4963 from hashicorp/dani/f-preempt-alloc-wait
client: Wait for preemptions to terminate
2018-12-11 18:06:34 +01:00
Danielle Tomlinson ed1791f4bf client: Style: use fluent style for building loggers 2018-12-11 18:03:45 +01:00
Danielle Tomlinson 1ee0777521 fixup: Correctly sort based on distance, use iradix for ordering 2018-12-11 17:35:51 +01:00
Preetha Appan 977a4a540d
Early continue after meeting needed count
Also adds another optimization that filters out un-needed allocations
as a final filtering step
2018-12-11 10:12:18 -06:00
Danielle Tomlinson 8d76d9c24b acl: Add support for globbing namespaces
This commit adds basic support for globbing namespaces in acl
definitions.

For concrete definitions, we merge all of the defined policies at load time, and
perform a simple lookup later on. If an exact match of a concrete
definition is found, we do not attempt to resolve globs.

For glob definitions, we merge definitions of exact replicas of a glob.

When loading a policy for a glob defintion, we choose the glob that has
the closest match to the namespace we are resolving for. We define the
closest match as the one with the _smallest character difference_
between the glob and the namespace we are matching.
2018-12-11 16:33:19 +01:00
Danielle Tomlinson 805669ead4 client: Correctly pass a noop PrevAllocMigrator when restoring 2018-12-11 15:46:58 +01:00
Mahmood Ali f6f39f1314 add a note about busybox license 2018-12-11 09:35:26 -05:00
Mahmood Ali 3babda5d45 tests: no need for buffer channel 2018-12-11 09:35:26 -05:00
Mahmood Ali 5a487ac884 tests: prevent indefinite blocking in some tests
Noticed few places where tests seem to block indefinitely and panic
after the test run reaches the test package timeout.

I intend to follow up with the proper fix later, but timing out is much
better than indefinitely blocking.
2018-12-11 09:35:26 -05:00
Mahmood Ali 23c07b9afe tests: update stop/kill tests with new pattern
Update rawexec and rkt stop/kill tests with the patterns introduced in
7a49e9b68e519050a0c2ef0b67c33503bfbc51be.  This implementation should be
more resilient to discrepancy between task stopping and task being marked as exited.
2018-12-11 09:35:26 -05:00
Mahmood Ali 4635168f20 test: fix TestFingerprintManager_Run_Combination
Let's use a fingerprinter that doesn't have values prepopulated in test
fixtures.
2018-12-11 09:35:26 -05:00
Mahmood Ali 8453ce7d56 tests: setup libcontainer rootfs
Using statically linked busybox binary to setup a basic rootfs for
testing, by symlinking it to provide the basic commands used in tests.

I considered using a proper rootfs tarball, but the overhead of managing
tarfile and expanding it seems significant enough that I went with this
implementation.
2018-12-11 09:35:26 -05:00
Mahmood Ali 994b9d967c tests: Lower package runtime
Lowering the runtime here to pre 7ca535aa90748caff1522468cc0c4ab672a74abb expectations.

The longest package at the time `client/driver` shrunk significantly,
and now the longest packages take less than 5 minutes.

We do have some long running timed out projects due to a stuck shutdown,
but in completed jobs (though they failed), the longest packages took
less than 5 minutes.  The longest running packages in
https://travis-ci.org/hashicorp/nomad/jobs/464640776 were:

```
FAIL  github.com/hashicorp/nomad/nomad                                   268.089s
ok    github.com/hashicorp/nomad/drivers/docker                          203.903s  coverage:  68.8%   of  statements
ok    github.com/hashicorp/nomad/drivers/rkt                             132.104s  coverage:  65.0%   of  statements
ok    github.com/hashicorp/nomad/api                                     123.193s  coverage:  62.9%   of  statements
ok    github.com/hashicorp/nomad/command/agent                           74.657s   coverage:  72.3%   of  statements
ok    github.com/hashicorp/nomad/command                                 63.592s   coverage:  42.7%   of  statements
```
2018-12-11 09:35:26 -05:00
Danielle Tomlinson 6fb5ca6ad5 allocrunner: Test alloc runners should include a noop migrator 2018-12-11 13:12:35 +01:00
Danielle Tomlinson 4b4b85e3f4 allocwatcher: Cleanup new migrator/watcher interface 2018-12-11 13:12:35 +01:00
Danielle Tomlinson 83720575de client: Unify handling of previous and preempted allocs 2018-12-11 13:12:35 +01:00
Michael Schurter 8808ab9cea
Merge pull request #4953 from hashicorp/b-script-context-wrapper
consul: add ScriptExecutor context wrapper
2018-12-10 17:22:53 -08:00
Michael Schurter 4c5f3ae82c
Merge pull request #4952 from hashicorp/b-script-context
consul: fix script checks exiting after 1 run
2018-12-10 17:22:15 -08:00
Danielle Tomlinson dff7093243 client: Wait for preempted allocs to terminate
When starting an allocation that is preempting other allocs, we create a
new group allocation watcher, and then wait for the allocations to
terminate in the allocation PreRun hooks.

If there's no preempted allocations, then we simply provide a
NoopAllocWatcher.
2018-12-11 00:59:18 +01:00