`make check` runs very intensive linters that slow and seem to behave
differently on different machines.
Linting is still a part of our CI and we shouldn't be cutting a release
when CI isn't green anyway.
Noticed that the protobuf files are out of sync with ones generated by 1.2.0 protoc go plugin.
The cause for these files seem to be related to release processes, e.g. [0.9.0-beta1 preperation](ecec3d38de (diff-da4da188ee496377d456025c2eab4e87)), and [0.9.0-beta3 preperation](b849d84f2f).
This restores the changes to that of the pinned protoc version and fails build if protobuf files are out of sync. Sample failing Travis job is that of the first commit change: https://travis-ci.org/hashicorp/nomad/jobs/506285085
`gotestsum` has user friendlier output that emits final summary, also it can emit junit xml file for
automated analysis instead of current format that should significantly
ease automated analysis of CI.
workaround a regression in 1.11.3
> We are aware of a functionality regression in "go get" when executed in GOPATH mode on an import path pattern containing "..." (e.g., "go get github.com/golang/pkg/..."), when downloading packages not already present in the GOPATH workspace. This is issue golang.org/issue/29241. It will be resolved in the next minor patch releases, Go 1.11.4 and Go 1.10.7, which we plan to release soon. We apologize for any disruption.
Lowering the runtime here to pre 7ca535aa90748caff1522468cc0c4ab672a74abb expectations.
The longest package at the time `client/driver` shrunk significantly,
and now the longest packages take less than 5 minutes.
We do have some long running timed out projects due to a stuck shutdown,
but in completed jobs (though they failed), the longest packages took
less than 5 minutes. The longest running packages in
https://travis-ci.org/hashicorp/nomad/jobs/464640776 were:
```
FAIL github.com/hashicorp/nomad/nomad 268.089s
ok github.com/hashicorp/nomad/drivers/docker 203.903s coverage: 68.8% of statements
ok github.com/hashicorp/nomad/drivers/rkt 132.104s coverage: 65.0% of statements
ok github.com/hashicorp/nomad/api 123.193s coverage: 62.9% of statements
ok github.com/hashicorp/nomad/command/agent 74.657s coverage: 72.3% of statements
ok github.com/hashicorp/nomad/command 63.592s coverage: 42.7% of statements
```
nomad/client take very long and exceed 15m sometimes:
In https://travis-ci.org/hashicorp/nomad/jobs/452990197 :
```
panic: test timed out after 15m0s
goroutine 4739 [running]:
testing.(*M).startAlarm.func1()
/home/travis/.gimme/versions/go1.11.2.linux.amd64/src/testing/testing.go:1296 +0xfd
....
goroutine 4665 [select]:
github.com/hashicorp/nomad/vendor/google.golang.org/grpc.newClientStream.func5(0xc0003dd500, 0xc000420120, 0x2b3f86295588, 0xc000496810)
/home/travis/gopath/src/github.com/hashicorp/nomad/vendor/google.golang.org/grpc/stream.go:287 +0xd7
created by github.com/hashicorp/nomad/vendor/google.golang.org/grpc.newClientStream
/home/travis/gopath/src/github.com/hashicorp/nomad/vendor/google.golang.org/grpc/stream.go:286 +0x842
FAIL github.com/hashicorp/nomad/client/driver 900.036s
```
Introduce a device manager that manages the lifecycle of device plugins
on the client. It fingerprints, collects stats, and forwards Reserve
requests to the correct plugin. The manager, also handles device plugins
failing and validates their output.
Remove the NOMAD_TEST_RKT flag as a guard for rkt tests. Still require
Linux, root, and rkt to be installed. Only check for rkt installation
once in hopes of speeding up rkt tests a bit.
We make a HashiCorp hard fork of the jteeuwen/go-bindata hard fork that
was replaced and diffed the code against a Dec 1, 2015 copy of the
original repository we had as a cross-check of that hard fork.
This replaces references to jteeuwen/go-bindata to point to the
HashiCorp fork.
This commit reworks the Vagrantfile for Nomad in order to support
straightforward testing on more than one operating system, whilst
retaining the ability to stand up a test cluster running Ubuntu.
The following changes are made:
- Scripts have been extracted from the Vagrantfile into their own shell
script files, in order that editors lint them.
- All scripts have been edited to lint with no warnings or errors for
their respective shells.
- Scripts are named according to the operating system and privilege
level which they run. We prefer to run a whole shell script as root
versus prefixing (essentially) every command with `sudo` or an
equivalent.
- The Linux development box has been separated from the test cluster,
removing some of the more gnarly (and less portable) logic. The Linux
development box is still primary and autostarts.
- A FreeBSD target has been added. The base box works for both
Virtualbox and VMWare Fusion.
- A target is added to the GNUmakefile to stand up a test cluster, using
the default provider, or overriding the provider by setting the PROVIDER
variable in make:
- `make testcluster`
- `make testcluster PROVIDER=vmware_fusion`
- Machines in the test cluster have Avahi configured for zeroconf
discovery. Each machine can ping each other machine at `hostname.local`
- for example `nomad-server02.local`, `nomad-client03.local`.
This commit replaces the shell script-driven build process for Nomad
with one based around GNU Make (note we _do_ use GNU-specific
constructs), requiring no additional scripts for common cases of
development. The following targets are implemented:
Per-OS/arch combinations:
Binaries (Host - Mac OS X):
pkg/darwin_amd64/nomad
Binaries (Host - Linux):
pkg/linux_386/nomad
pkg/linux_amd64/nomad
pkg/linux_amd64-lxc/nomad
pkg/linux_arm/nomad
pkg/linux_arm64/nomad
pkg/windows_386/nomad
pkg/windows_amd64/nomad
Packages (Host - Mac OS X):
pkg/darwin_amd64.zip
Packages (Host - Linux):
pkg/linux_386.zip
pkg/linux_amd64.zip
pkg/linux_amd64-lxc.zip
pkg/linux_arm.zip
pkg/linux_arm64.zip
pkg/windows_386.zip
pkg/windows_amd64.zip
Phony targets:
dev - Builds for the current host GOOS/GOARCH (unless overriden
in the environment)
release - Builds all appropriate release packages for the
current host GOOS/GOARCH (i.e. Windows and Linux
packages on a Linux host, Darwin packages on an OSX
host)
generate - Generate code for the current host architecture using
`go generate`.
test - Runs the Nomad test suite
clean - Removes build artifacts
travis - Runs `make test` with the wrapper script to prevent
Travis CI from timing out.
help - Displays usage information about commonly used targets.
Note that there are some semantic differences from the previous version.
1. `generate` is no longer a dependency of `dev` builds. This is because
it causes a rebuild every time, even when no code has changed, since
`go generate` does not appear to leave file timestamps alone.
Regardless, it is insufficient to generate on one host OS - it needs
to be run on each target to ensure everything is generated correctly.
2. `gofmt` is no longer checked. This should be enabled as a linter once
the `gofmt -s` refactoring will pass on the whole code base, in order
to avoid special cased checks versus using go-metalinter.
Example Usages:
Make a development build for the current GOOS/GOARCH:
make dev
Make release build packages appropriate for the host OS:
make release
Update generated code for the host OS:
make generate
Run linting checks:
make check
Build a specific alternative GOOS/GOARCH/tags combination:
make pkg/linux_amd64-pkg/nomad
make pkg/linux_amd64-pkg.zip
Vault is required for the fingerprinting tests but is not currently
installed by the build process. This commit adds a new category of
external tools for test dependencies and `go get`'s them during the
bootstrap.
We also fix the syntax of the Makefile to use tabs throughout.
This PR fixes our vet script and fixes all the missed vet changes.
It also fixes pointers being printed in `nomad stop <job>` and `nomad
node-status <node>`.
The dev build is far simpler than the release build, so move it to its
own shell script. This simplifies the release build script slightly as
well at the cost of duplicating the version/tag logic.
Also don't even try to check for LXC if not running on Linux. I don't
think we want to try to support cross-compiling LXC from non-Linux
hosts.
We recently ran into an issue on a small percentage of nomad-clients
where the nomad-client was running successfully, but due to a race
condition, could not correctly bind to the docker socket. This caused
all of our nomad jobs to be allocated to a single nomad-client instead
of being spread evenly across our clients. The only way to discover this
was to run `nomad node-status <node>` and count each job allocation per
node.
This can lead to a fairly long debugging process if there are several
nomad-clients. Including the number of allocations for each node in the
`node-status` command would save a large amount of debug time.
```
jake@biscuits [12:08:41] [~]
-> % nomad node-status
ID Datacenter Name Class Drain Status Allocations
2b0aabc5 dc1 biscuits <none> false ready 0
```
```
jake@biscuits [12:08:55] [~]
-> % nomad node-status
ID Datacenter Name Class Drain Status Allocations
2b0aabc5 dc1 biscuits <none> false ready 1
```