Commit graph

13853 commits

Author SHA1 Message Date
Mahmood Ali 69b2355274
Merge pull request #4975 from hashicorp/fix-master-20181209
Some test fixes and remedies
2018-12-11 18:00:21 -05:00
Mahmood Ali 979a65486d tests: tag image explicitly 2018-12-11 17:59:45 -05:00
Alex Dadgar 2df8a56b76 changelog 2018-12-11 12:52:45 -08:00
Alex Dadgar 1531b6d534
Merge pull request #4970 from hashicorp/f-no-iops
Deprecate IOPS
2018-12-11 12:51:22 -08:00
Mahmood Ali 76bc2a3fd0 ci: install lxc-templates explicitly
LXC package on Ubuntu 16.04 doesn't depend on lxc-template, but we
require it in our tests.
2018-12-11 15:49:11 -05:00
Mahmood Ali e6e71fb47a tests: skip checking rdma cgroup
rdma was added in most recent kernels and libcontainer/docker
don't isolate them by default.
2018-12-11 15:49:11 -05:00
Mahmood Ali 095dba72ac ci: use Ubuntu 16.04 (Xenial) in TravisCI 2018-12-11 15:49:11 -05:00
Mahmood Ali ba515947c2 client: update driver info on new fingerprint
Fixes a bug where a driver health and attributes are never updated from
their initial status.  If a driver started unhealthy, it may never go
into a healthy status.
2018-12-11 14:25:10 -05:00
Mahmood Ali 84ded28c6d
drivers/docker: enforce volumes.enabled (#4983)
When volumes.enable flag is off in Docker driver, disable all mounts of
paths outside alloc dir.
2018-12-11 14:22:50 -05:00
Danielle Tomlinson d11c62fa3a
Merge pull request #4963 from hashicorp/dani/f-preempt-alloc-wait
client: Wait for preemptions to terminate
2018-12-11 18:06:34 +01:00
Danielle Tomlinson ed1791f4bf client: Style: use fluent style for building loggers 2018-12-11 18:03:45 +01:00
Danielle Tomlinson 1ee0777521 fixup: Correctly sort based on distance, use iradix for ordering 2018-12-11 17:35:51 +01:00
Preetha Appan 977a4a540d
Early continue after meeting needed count
Also adds another optimization that filters out un-needed allocations
as a final filtering step
2018-12-11 10:12:18 -06:00
Danielle Tomlinson 8d76d9c24b acl: Add support for globbing namespaces
This commit adds basic support for globbing namespaces in acl
definitions.

For concrete definitions, we merge all of the defined policies at load time, and
perform a simple lookup later on. If an exact match of a concrete
definition is found, we do not attempt to resolve globs.

For glob definitions, we merge definitions of exact replicas of a glob.

When loading a policy for a glob defintion, we choose the glob that has
the closest match to the namespace we are resolving for. We define the
closest match as the one with the _smallest character difference_
between the glob and the namespace we are matching.
2018-12-11 16:33:19 +01:00
Danielle Tomlinson 805669ead4 client: Correctly pass a noop PrevAllocMigrator when restoring 2018-12-11 15:46:58 +01:00
Mahmood Ali f6f39f1314 add a note about busybox license 2018-12-11 09:35:26 -05:00
Mahmood Ali 3babda5d45 tests: no need for buffer channel 2018-12-11 09:35:26 -05:00
Mahmood Ali 5a487ac884 tests: prevent indefinite blocking in some tests
Noticed few places where tests seem to block indefinitely and panic
after the test run reaches the test package timeout.

I intend to follow up with the proper fix later, but timing out is much
better than indefinitely blocking.
2018-12-11 09:35:26 -05:00
Mahmood Ali 23c07b9afe tests: update stop/kill tests with new pattern
Update rawexec and rkt stop/kill tests with the patterns introduced in
7a49e9b68e519050a0c2ef0b67c33503bfbc51be.  This implementation should be
more resilient to discrepancy between task stopping and task being marked as exited.
2018-12-11 09:35:26 -05:00
Mahmood Ali 4635168f20 test: fix TestFingerprintManager_Run_Combination
Let's use a fingerprinter that doesn't have values prepopulated in test
fixtures.
2018-12-11 09:35:26 -05:00
Mahmood Ali 8453ce7d56 tests: setup libcontainer rootfs
Using statically linked busybox binary to setup a basic rootfs for
testing, by symlinking it to provide the basic commands used in tests.

I considered using a proper rootfs tarball, but the overhead of managing
tarfile and expanding it seems significant enough that I went with this
implementation.
2018-12-11 09:35:26 -05:00
Mahmood Ali 994b9d967c tests: Lower package runtime
Lowering the runtime here to pre 7ca535aa90748caff1522468cc0c4ab672a74abb expectations.

The longest package at the time `client/driver` shrunk significantly,
and now the longest packages take less than 5 minutes.

We do have some long running timed out projects due to a stuck shutdown,
but in completed jobs (though they failed), the longest packages took
less than 5 minutes.  The longest running packages in
https://travis-ci.org/hashicorp/nomad/jobs/464640776 were:

```
FAIL  github.com/hashicorp/nomad/nomad                                   268.089s
ok    github.com/hashicorp/nomad/drivers/docker                          203.903s  coverage:  68.8%   of  statements
ok    github.com/hashicorp/nomad/drivers/rkt                             132.104s  coverage:  65.0%   of  statements
ok    github.com/hashicorp/nomad/api                                     123.193s  coverage:  62.9%   of  statements
ok    github.com/hashicorp/nomad/command/agent                           74.657s   coverage:  72.3%   of  statements
ok    github.com/hashicorp/nomad/command                                 63.592s   coverage:  42.7%   of  statements
```
2018-12-11 09:35:26 -05:00
Danielle Tomlinson 6fb5ca6ad5 allocrunner: Test alloc runners should include a noop migrator 2018-12-11 13:12:35 +01:00
Danielle Tomlinson 4b4b85e3f4 allocwatcher: Cleanup new migrator/watcher interface 2018-12-11 13:12:35 +01:00
Danielle Tomlinson 83720575de client: Unify handling of previous and preempted allocs 2018-12-11 13:12:35 +01:00
Michael Schurter 8808ab9cea
Merge pull request #4953 from hashicorp/b-script-context-wrapper
consul: add ScriptExecutor context wrapper
2018-12-10 17:22:53 -08:00
Michael Schurter 4c5f3ae82c
Merge pull request #4952 from hashicorp/b-script-context
consul: fix script checks exiting after 1 run
2018-12-10 17:22:15 -08:00
Danielle Tomlinson dff7093243 client: Wait for preempted allocs to terminate
When starting an allocation that is preempting other allocs, we create a
new group allocation watcher, and then wait for the allocations to
terminate in the allocation PreRun hooks.

If there's no preempted allocations, then we simply provide a
NoopAllocWatcher.
2018-12-11 00:59:18 +01:00
Danielle Tomlinson 2cdef6a7b4 allocwatcher: Add Group AllocWatcher
The Group Alloc watcher is an implementation of a PrevAllocWatcher that
can wait for multiple previous allocs before terminating.

This is to be used when running an allocation that is preempting upstream
allocations, and thus only supports being ran with a local alloc watcher.

It also currently requires all of its child watchers to correctly handle
context cancellation. Should this be a problem, it should be fairly easy
to implement a replacement using channels rather than a waitgroup.

It obeys the PrevAllocWatcher interface for convenience, but it may be
better to extract Migration capabilities into a seperate interface for
greater clarity.
2018-12-11 00:58:27 +01:00
Alex Dadgar 457c6eb398 typo 2018-12-10 15:35:26 -08:00
Alex Dadgar 508a3dfa49 merge 087 and 090 changelog 2018-12-10 15:34:21 -08:00
Mahmood Ali fa9b9028a5 Use max 3 precision in displaying floats
When formating floats in `nomad node status`, use a maximum precision of
3.
2018-12-10 12:18:24 -05:00
Mahmood Ali 14668f48d1 device attributes in nomad node status -verbose
This reports device attributes like the following:

```
$ nomad node status -self -verbose
ID          = f7adb958-29e1-2a5a-2303-9d61ffaab33a
Name        = mars.local
Class       = <none>
DC          = dc1
Drain       = false
Eligibility = eligible
Status      = ready
Uptime      = 12h40m13s

Drivers
Driver       Detected  Healthy  Message                               Time
docker       true      true     healthy                               2018-12-10T11:47:19-05:00
...

Attributes
cpu.arch                      = amd64
cpu.frequency                 = 2200
cpu.modelname                 = Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
cpu.numcores                  = 12
...

Device Group Attributes
Device Group = nomad/file/mock
block_device = sda1
filesystem   = ext4
size         = 63.2 GB

Meta
```
2018-12-10 12:18:24 -05:00
Mahmood Ali 9f69b8bfec Rename helper_stats -> helper_devices 2018-12-10 12:18:24 -05:00
Mahmood Ali ced76978bb devices/nvidia: memory state as the summary stat 2018-12-10 12:18:24 -05:00
Mahmood Ali 97829a3f02 fix dtestutil.NewDriverHarness ref 2018-12-08 09:58:23 -05:00
Mahmood Ali 021d3720b5
Merge pull request #4950 from hashicorp/b-exc-libcontainer-kill
executor: kill all container processes
2018-12-08 09:52:42 -05:00
Nick Ethier 35268fdb54
executor: misspell 2018-12-08 01:52:06 -05:00
Nick Ethier 32057b6f7f
Merge pull request #4973 from emate/recover-filerotator-from-io-errors
Recover from any possible io error when invoking Write on FileRotator
2018-12-08 00:05:42 -05:00
Preetha Appan f60c52c8ba
Score combinations of allocs from multiple devices for preemption 2018-12-07 18:35:47 -06:00
Chris Baker 4bbb8106c1 updated memberlist dependency to latest, which is missing NMD-1173 error 2018-12-07 22:15:05 +00:00
Chris Baker 59beae35df nomad/rpc listener: modified to throttle logging on "permanent" Accept() errors as well (with a higher delay cap) 2018-12-07 22:14:15 +00:00
Alex Dadgar 695fa416a6
Merge pull request #4965 from hashicorp/b-gc-running
Don't GC running but desired stop allocations
2018-12-07 13:36:33 -08:00
Chris Baker 707bac0a7b rpc accept loop: added backoff on logging for failed connections, in case there is a fast fail loop (NMD-1173) 2018-12-07 20:12:55 +00:00
Nick Ethier 86e9c11ec2
executor: don't drop errors when configuring libcontainer cfg, add nil check on resources 2018-12-07 14:03:42 -05:00
Marcin Matlaszek 39eec70f31
Recover from any possible io error when invoking Write on FileRotator
As of now, FileRotator uses bufio.Write under the hood to write data to
configured output file. Due to the way how bufio handles any occurred io
error - saves it into `err` variable never resetting it automatically -
any operation like `Write`, `Flush` etc will become a no-op, returning the very same,
saved error (eg. Out of disk space) even when the problem is fixed (eg. disk
space is available again).

That automatically means that FileRotator will stop writing any logs,
reporting the same error over and over again, even if it's no longer
valid.

This PR fixes it by resetting the bufio Writer, which resets any errors
and tries to write requested data.
2018-12-07 18:22:29 +01:00
Mahmood Ali 7d5b5bb5f9
Merge pull request #4933 from hashicorp/f-mount-device
Mount Devices in container based drivers
2018-12-07 10:32:03 -05:00
Mahmood Ali 91a67f347d Vendor libcontainer/devices 2018-12-07 09:13:27 -05:00
Nick Ethier 47df1dde10
Merge branch 'master' into f-grpc-executor 2018-12-06 21:42:38 -05:00
Nick Ethier 19a695308f
executor: fix tests 2018-12-06 21:39:53 -05:00