In multiregion deployments when ACLs are enabled, the deploymentwatcher needs
an appropriately scoped ACL token with the same `submit-job` rights as the
user who submitted it. The token will already be replicated, so store the
accessor ID so that it can be retrieved by the leader.
Integration points for multiregion jobs to be registered in the enterprise
version of Nomad:
* hook in `Job.Register` for enterprise to send job to peer regions
* remove monitoring from `nomad job run` and `nomad job stop` for multiregion jobs
This PR switches the Nomad repository from using govendor to Go modules
for managing dependencies. Aspects of the Nomad workflow remain pretty
much the same. The usual Makefile targets should continue to work as
they always did. The API submodule simply defers to the parent Nomad
version on the repository, keeping the semantics of API versioning that
currently exists.
Fixes#7854
Nomad requires a version of go-getter that is currently in PR (https://github.com/hashicorp/go-getter/pull/256)
We also require some recent bug fix to go-getter around the handling of URL redirects.
Update our vendor'd copy of go-getter to the newly rebased umask changes so that we can incorporate
the latest changes for go-getter.
* changes necessary to support oss licesning shims
revert nomad fmt changes
update test to work with enterprise changes
update tests to work with new ent enforcements
make check
update cas test to use scheduler algorithm
back out preemption changes
add comments
* remove unused method
Also, since github.com/docker/docker is the canonical package names and
is transparently forwarded to github.com/moby/moby, I removed the
moby/moby references in origin.
Pick up a panic fix of f1bd0923b8
Some CirleCI builds fail somewhat mysteriously, e.g. https://circleci.com/gh/hashicorp/nomad/52676 , due to this bug. The test json file emit the following:
```
{"Time":"2020-03-28T12:03:08.602544522Z","Action":"output","Package":"github.com/hashicorp/nomad/client/allocrunner","Test":"TestGroupServiceHook_Update08Alloc","Output":"panic: send on closed channel\n"}
{"Time":"2020-03-28T12:03:08.602576075Z","Action":"output","Package":"github.com/hashicorp/nomad/client/allocrunner","Test":"TestGroupServiceHook_Update08Alloc","Output":"\n"}
{"Time":"2020-03-28T12:03:08.602584429Z","Action":"output","Package":"github.com/hashicorp/nomad/client/allocrunner","Test":"TestGroupServiceHook_Update08Alloc","Output":"goroutine 403 [running]:\n"}
{"Time":"2020-03-28T12:03:08.602590561Z","Action":"output","Package":"github.com/hashicorp/nomad/client/allocrunner","Test":"TestGroupServiceHook_Update08Alloc","Output":"github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert.Eventually.func1(0xc000a78000, 0xc0009cf160)\n"}
{"Time":"2020-03-28T12:03:08.602598464Z","Action":"output","Package":"github.com/hashicorp/nomad/client/allocrunner","Test":"TestGroupServiceHook_Update08Alloc","Output":"\t/home/circleci/go/src/github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert/assertions.go:1494 +0x47\n"}
{"Time":"2020-03-28T12:03:08.602604952Z","Action":"output","Package":"github.com/hashicorp/nomad/client/allocrunner","Test":"TestGroupServiceHook_Update08Alloc","Output":"created by github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert.Eventually\n"}
```
This change adds Darwin and FreeBSD C code of gopsutil library, that is
needed for these platforms. `shirou/gopsutil` uses some C code that
isn't in a go package, so don't get vendored automatically.
hashicorp/hcl library added some better validation for error and illegal
characters. The diff is primarily improved error reporting. The
parser.go change includes a case where illegal characters were silently
dropped, but now get reported as invalid characters.
Introduce limits to prevent unauthorized users from exhausting all
ephemeral ports on agents:
* `{https,rpc}_handshake_timeout`
* `{http,rpc}_max_conns_per_client`
The handshake timeout closes connections that have not completed the TLS
handshake by the deadline (5s by default). For RPC connections this
timeout also separately applies to first byte being read so RPC
connections with TLS enabled have `rpc_handshake_time * 2` as their
deadline.
The connection limit per client prevents a single remote TCP peer from
exhausting all ephemeral ports. The default is 100, but can be lowered
to a minimum of 26. Since streaming RPC connections create a new TCP
connection (until MultiplexV2 is used), 20 connections are reserved for
Raft and non-streaming RPCs to prevent connection exhaustion due to
streaming RPCs.
All limits are configurable and may be disabled by setting them to `0`.
This also includes a fix that closes connections that attempt to create
TLS RPC connections recursively. While only users with valid mTLS
certificates could perform such an operation, it was added as a
safeguard to prevent programming errors before they could cause resource
exhaustion.
Update github.com/aws/aws-sdk-go and github.com/hashicorp/go-discover to
pick up support for EC2 Metadata Instance Service v2 changes.
Follow up to https://github.com/hashicorp/go-discover/pull/128 .
Adds new package that can be used by client and server RPC endpoints to
facilitate monitoring based off of a logger
clean up old code
small comment about write
rm old comment about minsize
rename to Monitor
Removes connection logic from monitor command
Keep connection logic in endpoints, use a channel to send results from
monitoring
use new multisink logger and interfaces
small test for dropped messages
update go-hclogger and update sink/intercept logger interfaces
Adds nomad monitor command. Like consul monitor, this command allows you
to stream logs from a nomad agent in real time with a a specified log
level
add endpoint tests
Upgrade go-hclog to latest version
The current version of go-hclog pads log prefixes to equal lengths
so info becomes [INFO ] and debug becomes [DEBUG]. This breaks
hashicorp/logutils/level.go Check function. Upgrading to the latest
version removes this padding and fixes log filtering that uses logutils
Check
Update multierror to latest as of now. Our version is very old and
dates back to Sep 2015[1]. Here, we aim to pick up a panic fix found in
n https://github.com/hashicorp/go-multierror/pull/11 (Dec 2016).
This is a purely hygiene maintenance change. I am unaware of any causes
of the panic in our current dependencies. Though, some private internal
libraries did rely on the "recent" behavior of go-multierror, and I
aimed to update here to ease our adoption of other libraries later.
[1] d30f09973e
Golang 1.13 is pickier with importpaths and aliasing and fails
compilation currently.
Here, for go-msgpack dependency, we use upstream ugorji/go with a single
change
23165f7bc3
.
For consistency and to ease noticing descripency, I made ugorji/go and
hashicorp/go-msgpack reference the same sha.
This is a dependency management update and has no functional change to
product.
* ar: refactor network bridge config to use go-cni lib
* ar: use eth as the iface prefix for bridged network namespaces
* vendor: update containerd/go-cni package
* ar: update network hook to use TODO contexts when calling configurator
* unnecessary conversion
This upgrades hcl2 library dependency to pick up
https://github.com/hashicorp/hcl2/pull/113 .
Prior to this change, parsing and decoding array attributes containing
invalid errors (e.g. references to unknown variables) are silently
dropped, with `cty.Unknown` being assigned to the bad element. Rather
than showing a type/meaningful error from hcl2, we get a very decrypted
error message from msgpack layer trying to handle `cty.unknown`.
This ensures that we propagate diagnostics correctly and report
meaningful errors to users.
Fixes https://github.com/hashicorp/nomad/issues/5694
Fixes https://github.com/hashicorp/nomad/issues/5680
Our testing so far indicates that ugorji/go/codec maintains backward
compatiblity with the version we are using now, for purposes of Nomad
serialization.
Using latest ugorji/go allows us to get back to using upstream library,
get get the optimizations benefits in RPC paths (including code
generation optimizations).
ugorji/go introduced two significant changes:
* time binary format in debb8e2d2e. Setting `h.BasicHandle.TimeNotBuiltin = true` restores old behavior
* ugorji/go started honoring `json` tag as well:
v1.1.4 is the latest but has a bug in handling RawString that's fixed in
d09a80c1e0
.
* client/metrics: modified metrics to use (updated) client copy of allocation instead of (unupdated) server copy
* updated armon/go-metrics to address race condition in DisplayMetrics
* Divest api/ package of deps elsewhere in the nomad repo.
This will allow making api/ a module without then pulling in the
external repo, leading to a package name conflict.
This required some migration of tests to an apitests/ folder (can be
moved anywhere as it has no deps on it). It also required some
duplication of code, notably some test helpers from api/ -> apitests/
and part (but not all) of testutil/ -> api/testutil/.
Once there's more separation and an e.g. sdk/ folder those can be
removed in favor of a dep on the sdk/ folder, provided the sdk/ folder
doesn't depend on api/ or /.
* Also remove consul dep from api/ package
* Fix stupid linters
* Some restructuring
* CVE-2019-5736: Update libcontainer depedencies
Libcontainer is vulnerable to a runc container breakout, that was
reported as CVE-2019-5736[1]. Upgrading vendored libcontainer with the fix.
The runc changes are captured in 369b920277 .
[1] https://seclists.org/oss-sec/2019/q1/119
Previously used `github.com/shirou/gopsutil`[1], used some GPL code [2].
This was somewhat unintentional, and was addressed later [3].
Due to being late in the cycle of Nomad release when this is noticed,
and time elapsed since we updated the dependency, we want to be
conservative in our package updates.
As such, we opted to go with forking the repo to use the previously used
version with the GPL removal code commit, done in [4].
[1] 5776ff9c7c
[2] 5776ff9c7c/host/include/smc.c
[3] c95755e4bc
[4] 62d5761ddb
This fixes a bug related to shutting down of GRPC plugin interfaces
(more info: https://github.com/hashicorp/go-plugin/pull/88)
This does not yet fix all test cases for subprocess leaking, but is a
useful independant change.
Introduce a device manager that manages the lifecycle of device plugins
on the client. It fingerprints, collects stats, and forwards Reserve
requests to the correct plugin. The manager, also handles device plugins
failing and validates their output.
* client/executor: refactor client to remove interpolation
* executor: POC libcontainer based executor
* vendor: use hashicorp libcontainer fork
* vendor: add libcontainer/nsenter dep
* executor: updated executor interface to simplify operations
* executor: implement logging pipe
* logmon: new logmon plugin to manage task logs
* driver/executor: use logmon for log management
* executor: fix tests and windows build
* executor: fix logging key names
* executor: fix test failures
* executor: add config field to toggle between using libcontainer and standard executors
* logmon: use discover utility to discover nomad executable
* executor: only call libcontainer-shim on main in linux
* logmon: use seperate path configs for stdout/stderr fifos
* executor: windows fixes
* executor: created reusable pid stats collection utility that can be used in an executor
* executor: update fifo.Open calls
* executor: fix build
* remove executor from docker driver
* executor: Shutdown func to kill and cleanup executor and its children
* executor: move linux specific universal executor funcs to seperate file
* move logmon initialization to a task runner hook
* client: doc fixes and renaming from code review
* taskrunner: use shared config struct for logmon fifo fields
* taskrunner: logmon only needs to be started once per task