Fixes#2478#2474#1995#2294
The new client only handles agent and task service advertisement. Server
discovery is mostly unchanged.
The Nomad client agent now handles all Consul operations instead of the
executor handling task related operations. When upgrading from an
earlier version of Nomad existing executors will be told to deregister
from Consul so that the Nomad agent can re-register the task's services
and checks.
Drivers - other than qemu - now support an Exec method for executing
abritrary commands in a task's environment. This is used to implement
script checks.
Interfaces are used extensively to avoid interacting with Consul in
tests that don't assert any Consul related behavior.
This PR takes the host ID and runs it through a hash so that it is well
distributed. This makes it so that machines that report similar host IDs
are easily distinguished.
Instances of similar IDs occur on EC2 where the ID is prefixed and on
motherboards created in the same batch.
Fixes https://github.com/hashicorp/nomad/issues/2534
This PR adds tracking to when a task starts and finishes and the logs
API takes advantage of this and returns better errors when asking for
logs that do not exist.
This PR fixes two issues:
* Folder permissions in -dev mode were incorrect and not suitable for
running as a particular user.
* Was not setting the group membership properly for the launched
process.
Fixes https://github.com/hashicorp/nomad/issues/2160
This PR introduces a parallelism limit during garbage collection. This
is used to avoid large resource usage spikes if garbage collecting many
allocations at once.
This PR adds the following metrics to the client:
client.allocations.migrating
client.allocations.blocked
client.allocations.pending
client.allocations.running
client.allocations.terminal
Also adds some missing fields to the API version of the evaluation.
This PR fixes our vet script and fixes all the missed vet changes.
It also fixes pointers being printed in `nomad stop <job>` and `nomad
node-status <node>`.
This PR makes GetAllocs use a blocking query as well as adding a sanity
check to the clients watchAllocation code to ensure it gets the correct
allocations.
This PR fixes https://github.com/hashicorp/nomad/issues/2119 and
https://github.com/hashicorp/nomad/issues/2153.
The issue was that the client was talking to two different servers, one
to check which allocations to pull and the other to pull those
allocations. However the latter call was not with a blocking query and
thus the client would not retreive the allocations it requested.
The logging has been improved to make the problem more clear as well.
Client.Shutdown holds the allocLock when destroying alloc runners in dev
mode.
Client.updateAllocStatus can be called during AllocRunner shutdown and
calls getAllocRunners which tries to acquire allocLock.RLock. This
deadlocks since Client.Shutdown already has the write lock.
Switching Client.Shutdown to use getAllocRunners and not hold a lock
during AllocRunner shutdown is the solution.
The Driver.Prestart method currently does very little but lays the
foundation for where lifecycle plugins can interleave execution _after_
task environment setup but _before_ the task starts.
Currently Prestart does two things:
* Any driver specific task environment building
* Download Docker images
This change also attaches a TaskEvent emitter to Drivers, so they can
emit events during task initialization.
This PR disallows stale queries when deriving a Vault token. Allowing
stale queries could result in the allocation not existing on the server
that is servicing the request.