Commit graph

13686 commits

Author SHA1 Message Date
Mahmood Ali 3babda5d45 tests: no need for buffer channel 2018-12-11 09:35:26 -05:00
Mahmood Ali 5a487ac884 tests: prevent indefinite blocking in some tests
Noticed a few places where tests seem to block indefinitely and panic
after the test run reaches the test package timeout.

I intend to follow up with the proper fix later, but timing out is much
better than indefinitely blocking.
2018-12-11 09:35:26 -05:00
Mahmood Ali 23c07b9afe tests: update stop/kill tests with new pattern
Update rawexec and rkt stop/kill tests with the patterns introduced in
7a49e9b68e519050a0c2ef0b67c33503bfbc51be.  This implementation should be
more resilient to discrepancies between a task stopping and the task being marked as exited.
2018-12-11 09:35:26 -05:00
Mahmood Ali 4635168f20 test: fix TestFingerprintManager_Run_Combination
Let's use a fingerprinter that doesn't have values prepopulated in test
fixtures.
2018-12-11 09:35:26 -05:00
Mahmood Ali 8453ce7d56 tests: setup libcontainer rootfs
Use a statically linked busybox binary to set up a basic rootfs for
testing, symlinking it to provide the basic commands used in tests.

I considered using a proper rootfs tarball, but the overhead of managing
a tarfile and expanding it seems significant enough that I went with this
implementation.
2018-12-11 09:35:26 -05:00
Mahmood Ali 994b9d967c tests: Lower package runtime
Lowering the runtime here to pre 7ca535aa90748caff1522468cc0c4ab672a74abb expectations.

The longest package at the time, `client/driver`, shrank significantly,
and now the longest packages take less than 5 minutes.

We do have some long running timed out projects due to a stuck shutdown,
but in completed jobs (though they failed), the longest packages took
less than 5 minutes.  The longest running packages in
https://travis-ci.org/hashicorp/nomad/jobs/464640776 were:

```
FAIL  github.com/hashicorp/nomad/nomad                                   268.089s
ok    github.com/hashicorp/nomad/drivers/docker                          203.903s  coverage:  68.8%   of  statements
ok    github.com/hashicorp/nomad/drivers/rkt                             132.104s  coverage:  65.0%   of  statements
ok    github.com/hashicorp/nomad/api                                     123.193s  coverage:  62.9%   of  statements
ok    github.com/hashicorp/nomad/command/agent                           74.657s   coverage:  72.3%   of  statements
ok    github.com/hashicorp/nomad/command                                 63.592s   coverage:  42.7%   of  statements
```
2018-12-11 09:35:26 -05:00
Danielle Tomlinson 6fb5ca6ad5 allocrunner: Test alloc runners should include a noop migrator 2018-12-11 13:12:35 +01:00
Danielle Tomlinson 4b4b85e3f4 allocwatcher: Cleanup new migrator/watcher interface 2018-12-11 13:12:35 +01:00
Danielle Tomlinson 83720575de client: Unify handling of previous and preempted allocs 2018-12-11 13:12:35 +01:00
Michael Schurter 8808ab9cea
Merge pull request #4953 from hashicorp/b-script-context-wrapper
consul: add ScriptExecutor context wrapper
2018-12-10 17:22:53 -08:00
Michael Schurter 4c5f3ae82c
Merge pull request #4952 from hashicorp/b-script-context
consul: fix script checks exiting after 1 run
2018-12-10 17:22:15 -08:00
Danielle Tomlinson dff7093243 client: Wait for preempted allocs to terminate
When starting an allocation that is preempting other allocs, we create a
new group allocation watcher, and then wait for the allocations to
terminate in the allocation PreRun hooks.

If there are no preempted allocations, then we simply provide a
NoopAllocWatcher.
2018-12-11 00:59:18 +01:00
Danielle Tomlinson 2cdef6a7b4 allocwatcher: Add Group AllocWatcher
The Group Alloc watcher is an implementation of a PrevAllocWatcher that
can wait for multiple previous allocs before terminating.

This is to be used when running an allocation that is preempting upstream
allocations, and thus only supports being run with a local alloc watcher.

It also currently requires all of its child watchers to correctly handle
context cancellation. Should this be a problem, it should be fairly easy
to implement a replacement using channels rather than a waitgroup.

It obeys the PrevAllocWatcher interface for convenience, but it may be
better to extract Migration capabilities into a separate interface for
greater clarity.
2018-12-11 00:58:27 +01:00
Alex Dadgar 457c6eb398 typo 2018-12-10 15:35:26 -08:00
Alex Dadgar 508a3dfa49 merge 087 and 090 changelog 2018-12-10 15:34:21 -08:00
Mahmood Ali fa9b9028a5 Use max 3 precision in displaying floats
When formatting floats in `nomad node status`, use a maximum precision of
3.
2018-12-10 12:18:24 -05:00
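One way to cap float output at three decimal places, as fa9b9028a5 describes, is sketched below. `formatFloat` is a hypothetical helper written for this illustration; the commit's actual implementation may differ.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// formatFloat is a hypothetical helper: it rounds f to at most maxPrec
// decimal places and drops trailing zeros.
func formatFloat(f float64, maxPrec int) string {
	scale := math.Pow(10, float64(maxPrec))
	rounded := math.Round(f*scale) / scale
	// Precision -1 asks strconv for the shortest representation
	// that parses back to the same float64.
	return strconv.FormatFloat(rounded, 'f', -1, 64)
}

func main() {
	fmt.Println(formatFloat(63.21987, 3))
	fmt.Println(formatFloat(2, 3))
}
```

Rounding first and then asking for the shortest representation keeps whole numbers free of a padded `.000` suffix.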
Mahmood Ali 14668f48d1 device attributes in nomad node status -verbose
This reports device attributes like the following:

```
$ nomad node status -self -verbose
ID          = f7adb958-29e1-2a5a-2303-9d61ffaab33a
Name        = mars.local
Class       = <none>
DC          = dc1
Drain       = false
Eligibility = eligible
Status      = ready
Uptime      = 12h40m13s

Drivers
Driver       Detected  Healthy  Message                               Time
docker       true      true     healthy                               2018-12-10T11:47:19-05:00
...

Attributes
cpu.arch                      = amd64
cpu.frequency                 = 2200
cpu.modelname                 = Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
cpu.numcores                  = 12
...

Device Group Attributes
Device Group = nomad/file/mock
block_device = sda1
filesystem   = ext4
size         = 63.2 GB

Meta
```
2018-12-10 12:18:24 -05:00
Mahmood Ali 9f69b8bfec Rename helper_stats -> helper_devices 2018-12-10 12:18:24 -05:00
Mahmood Ali ced76978bb devices/nvidia: memory state as the summary stat 2018-12-10 12:18:24 -05:00
Mahmood Ali 97829a3f02 fix dtestutil.NewDriverHarness ref 2018-12-08 09:58:23 -05:00
Mahmood Ali 021d3720b5
Merge pull request #4950 from hashicorp/b-exc-libcontainer-kill
executor: kill all container processes
2018-12-08 09:52:42 -05:00
Nick Ethier 35268fdb54
executor: misspell 2018-12-08 01:52:06 -05:00
Nick Ethier 32057b6f7f
Merge pull request #4973 from emate/recover-filerotator-from-io-errors
Recover from any possible io error when invoking Write on FileRotator
2018-12-08 00:05:42 -05:00
Preetha Appan f60c52c8ba
Score combinations of allocs from multiple devices for preemption 2018-12-07 18:35:47 -06:00
Chris Baker 4bbb8106c1 updated memberlist dependency to latest, which is missing NMD-1173 error 2018-12-07 22:15:05 +00:00
Chris Baker 59beae35df nomad/rpc listener: modified to throttle logging on "permanent" Accept() errors as well (with a higher delay cap) 2018-12-07 22:14:15 +00:00
Alex Dadgar 695fa416a6
Merge pull request #4965 from hashicorp/b-gc-running
Don't GC running but desired stop allocations
2018-12-07 13:36:33 -08:00
Chris Baker 707bac0a7b rpc accept loop: added backoff on logging for failed connections, in case there is a fast fail loop (NMD-1173) 2018-12-07 20:12:55 +00:00
Nick Ethier 86e9c11ec2
executor: don't drop errors when configuring libcontainer cfg, add nil check on resources 2018-12-07 14:03:42 -05:00
Marcin Matlaszek 39eec70f31
Recover from any possible io error when invoking Write on FileRotator
As of now, FileRotator uses a bufio.Writer under the hood to write data to
the configured output file. bufio saves any io error it encounters into its
`err` field and never resets it automatically, so every later operation
(`Write`, `Flush`, etc.) becomes a no-op that returns the very same
saved error (e.g. out of disk space), even after the problem is fixed (e.g. disk
space is available again).

That automatically means that FileRotator will stop writing any logs,
reporting the same error over and over again, even if it's no longer
valid.

This PR fixes it by resetting the bufio Writer, which resets any errors
and tries to write requested data.
2018-12-07 18:22:29 +01:00
Mahmood Ali 7d5b5bb5f9
Merge pull request #4933 from hashicorp/f-mount-device
Mount Devices in container based drivers
2018-12-07 10:32:03 -05:00
Mahmood Ali 91a67f347d Vendor libcontainer/devices 2018-12-07 09:13:27 -05:00
Nick Ethier 47df1dde10
Merge branch 'master' into f-grpc-executor 2018-12-06 21:42:38 -05:00
Nick Ethier 19a695308f
executor: fix tests 2018-12-06 21:39:53 -05:00
Nick Ethier 913efed9f5
executor: fix broken non-linux build 2018-12-06 21:33:20 -05:00
Nick Ethier 2283cb2c39
executor: use drivers.Resources as resource model 2018-12-06 21:22:02 -05:00
Nick Ethier 29ef54c0ee
executor: merge plugin shim with executor package 2018-12-06 21:13:45 -05:00
Nick Ethier 71353a88d4
executor: remove structs package 2018-12-06 20:54:14 -05:00
Alex Dadgar c918a96490 Warn if IOPS is being used 2018-12-06 16:17:09 -08:00
Alex Dadgar 1e3c3cb287 Deprecate IOPS
IOPS has been modelled as a resource since Nomad 0.1 but has never
actually been detected, and there is no plan in the short term to add
detection. This is because IOPS is too simplistic a unit to define
the performance requirements of the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. These should be considered deprecated and only exist to allow
users to stop using them during the Nomad 0.9.x release. In the future,
there should be no expectation that the field will exist.
2018-12-06 15:09:26 -08:00
Danielle Tomlinson 8100252116
Merge pull request #4960 from hashicorp/dani/b-gc-tests
Re-enable Client GC tests
2018-12-06 23:18:36 +01:00
Mahmood Ali a7b205daf2
Merge pull request #4955 from hashicorp/fix-docker-tests-20181203
Fix docker driver tests
2018-12-06 16:41:33 -05:00
Danielle Tomlinson e3621c55fa gc: Fix maxallocs integration test 2018-12-06 21:50:50 +01:00
Mahmood Ali 9e825f880c Use absolute path in example device plugin
deviceDir is used for specifying mount/device host paths, and those
should be absolute paths.
2018-12-06 15:46:35 -05:00
Mahmood Ali bdc53b1d8e driver/rkt: mount plugin devices 2018-12-06 15:46:35 -05:00
Mahmood Ali 2c0fd2a902 driver/lxc: mount plugin devices
Also, LXC requires target paths to be relative.  Container paths in LXC
binds should never be absolute paths, so we strip any leading `/`,
even if a user sets one.
2018-12-06 15:46:35 -05:00
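The path normalisation described in 2c0fd2a902 amounts to trimming leading slashes from the container path. A minimal sketch, assuming a hypothetical helper named `lxcTargetPath` and an example device path chosen for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// lxcTargetPath is a hypothetical helper that makes a container path
// relative by removing every leading "/".
func lxcTargetPath(p string) string {
	return strings.TrimLeft(p, "/")
}

func main() {
	fmt.Println(lxcTargetPath("/dev/nvidia0"))
	fmt.Println(lxcTargetPath("dev/nvidia0"))
}
```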
Mahmood Ali 699875eb1c fixup: add missed docker utils test 2018-12-06 15:46:35 -05:00
Mahmood Ali e9557ae596 tests: ensure image is loaded as test setup 2018-12-06 15:36:43 -05:00
Alex Dadgar c4b5f80918 Make alloc health watcher a postrun hook rather than shutdown hook 2018-12-06 12:30:31 -08:00
Michael Lange 81c2d8b4a2
Merge pull request #4967 from hashicorp/b-ui-stat-charts-can-escape-canvas
UI: Keep line charts in their canvases at all times
2018-12-06 10:56:37 -08:00