Fixes#5395
Alternative to #5957
Make task restarting asynchronous when handling check-based restarts.
This matches the pre-0.9 behavior where TaskRunner.Restart was an
asynchronous signal. The check-based restarting code was not designed
to handle blocking in TaskRunner.Restart. 0.9 made it reentrant and
could easily overwhelm the buffered update chan and deadlock.
Many thanks to @byronwolfman for his excellent debugging, PR, and
reproducer!
I created this alternative as changing the functionality of
TaskRunner.Restart has a much larger impact. This approach reverts to
old known-good behavior and minimizes the number of places changes are
made.
When a nomad client restarts/upgraded, nomad restores state from running
task and starts the sync loop. If sync loop runs early, it may
deregister services from Consul prematurely even when Consul has the
running service as healthy.
This is not ideal, as re-registering the service means potentially
waiting a whole service health check interval before declaring the
service healthy.
We attempt to mitigate this by introducing an initialization probation
period. During this time, we only deregister services and checks that
were explicitly deregistered, and leave unrecognized ones alone. This
serves as a grace period for restoring to complete, or for operators to
restore should they recognize they restored with the wrong nomad data
directory.
Currently, nomad "plugin" processes (e.g. executor, logmon, docker_logger) are started as CLI
commands to be handled by command CLI framework. Plugin launchers use
`discover.NomadBinary()` to identify the binary and start it.
This has few downsides: The trivial one is that when running tests, one
must re-compile the nomad binary as the tests need to invoke the nomad
executable to start plugin. This is frequently overlooked, resulting in
puzzlement.
The more significant issue with `executor` in particular is in relation
to external driver:
* Plugin must identify the path of invoking nomad binary, which is not
trivial; `discvoer.NomadBinary()` now returns the path to the plugin
rather than to nomad, preventing external drivers from launching
executors.
* The external driver may get a different version of executor than it
expects (specially if we make a binary incompatible change in future).
This commit addresses both downside by having the plugin invocation
handling through an `init()` call, similar to how libcontainer init
handler is done in [1] and recommened by libcontainer [2]. `init()`
will be invoked and handled properly in tests and external drivers.
For external drivers, this change will cause external drivers to launch
the executor that's compiled against.
There a are a couple of downsides to this approach:
* These specific packages (i.e executor, logmon, and dockerlog) need to
be careful in use of `init()`, package initializers. Must avoid having
command execution rely on any other init in the package. I prefixed
files with `z_` (golang processes files in lexical order), but ensured
we don't depend on order.
* The command handling is spread in multiple packages making it a bit
less obvious how plugin starts are handled.
[1] drivers/shared/executor/libcontainer_nsenter_linux.go
[2] eb4aeed24f/libcontainer (using-libcontainer)
- updated region in job metadata that gets persisted to nomad datastore
- fixed many unrelated unit tests that used an invalid region value
(they previously passed because hcl wasn't getting picked up and
the job would default to global region)
It is possible to provide multiple identically named services with
different port assignments in a Nomad configuration.
We introduced a regression when migrating to stable service identifiers where
multiple services with the same name would conflict, and the last definition
would take precedence.
This commit includes the port label in the stable service identifier to
allow the previous behaviour where this was supported, for example
providing:
```hcl
service {
name = "redis-cache"
tags = ["global", "cache"]
port = "db"
check {
name = "alive"
type = "tcp"
interval = "10s"
timeout = "2s"
}
}
service {
name = "redis-cache"
tags = ["global", "foo"]
port = "foo"
check {
name = "alive"
type = "tcp"
port = "db"
interval = "10s"
timeout = "2s"
}
}
service {
name = "redis-cache"
tags = ["global", "bar"]
port = "bar"
check {
name = "alive"
type = "tcp"
port = "db"
interval = "10s"
timeout = "2s"
}
}
```
in a nomad task definition is now completely valid. Each service
definition with the same name must still have a unique port label however.
Currently when you submit a manual request to the alloc lifecycle API
with a version of Curl that will submit empty bodies, the alloc restart
api will fail with an EOF error.
This behaviour is undesired, as it is reasonable to not submit a body at
all when restarting an entire allocation rather than an individual task.
This fixes it by ignoring EOF (not unexpected EOF) errors and treating
them as entire task restarts.
Enterprise only.
Disable preemption for service and batch jobs by default.
Maintain backward compatibility in a x.y.Z release. Consider switching
the default for new clusters in the future.
This exposes a client flag to disable nomad remote exec support in
environments where access to tasks ought to be restricted.
I used `disable_remote_exec` client flag that defaults to allowing
remote exec. Opted for a client config that can be used to disable
remote exec globally, or to a subset of the cluster if necessary.
* master: (912 commits)
Update redirects.txt
Added redirect for Spark guide link
client: log when server list changes
docs: mention regression in task config validation
fix update to changelog
update CHANGELOG with datacenter config validation https://github.com/hashicorp/nomad/pull/5665
typo: "atleast" -> "at least"
implement nomad exec for rkt
docs: fixed typo
use pty/tty terminology similar to github.com/kr/pty
vendor github.com/kr/pty
drivers: implement streaming exec for executor based drivers
executors: implement streaming exec
executor: scaffolding for executor grpc handling
client: expose allocated memory per task
client improve a comment in updateNetworks
stalebot: Add 'thinking' as an exempt label (#5684)
Added Sparrow link
update links to use new canonical location
Add redirects for restructing done in GH-5667
...
This adds a websocket endpoint for handling `nomad exec`.
The endpoint is a websocket interface, as we require a bi-directional
streaming (to handle both input and output), which is not very appropriate for
plain HTTP 1.0. Using websocket makes implementing the web ui a bit simpler. I
considered using golang http hijack capability to treat http request as a plain
connection, but the web interface would be too complicated potentially.
Furthermore, the API endpoint operates against the raw core nomad exec streaming
datastructures, defined in protobuf, with json serializer. Our APIs use json
interfaces in general, and protobuf generates json friendly golang structs.
Reusing the structs here simplify interface and reduce conversion overhead.
This commit causes sync to skip deregistering checks that are not
managed by nomad, such as service maintenance mode checks. This is
handled in the same way as service registrations - by doing a Nomad
specific prefix match.
The current implementation of Service Registration uses a hash of the
nomad-internal state of a service to register it with Consul, this means that
any update to the service invalidates this name and we then deregister, and
recreate the service in Consul.
While this behaviour slightly simplifies reasoning about service registration,
this becomes problematic when we add consul health checks to a service. When
the service is re-registered, so are the checks, which default to failing for
at least one check period.
This commit migrates us to using a stable identifier based on the
allocation, task, and service identifiers, and uses the difference
between the remote and local state to decide when to push updates.
It uses the existing hashing mechanic to decide when UpdateTask should
regenerate service registrations for providing to Sync, but this should
be removable as part of a future refactor.
It additionally introduces the _nomad-check- prefix for check
definitions, to allow for future allowing of consul features like
maintenance mode.
* client/metrics: modified metrics to use (updated) client copy of allocation instead of (unupdated) server copy
* updated armon/go-metrics to address race condition in DisplayMetrics
This command will be used to send a signal to either a single task within an
allocation, or all of the tasks if <task-name> is omitted. If the sent signal
terminates the allocation, it will be treated as if the allocation has crashed,
rather than as if it was operator-terminated.
Signal validation is currently handled by the driver itself and nomad
does not attempt to restrict or validate them.
Fixes https://github.com/hashicorp/nomad/issues/5593
Executor seems to die unexpectedly after nomad agent dies or is
restarted. The crash seems to occur at the first log message after
the nomad agent dies.
To ease debugging we forward executor log messages to executor.log as
well as to Stderr. `go-plugin` sets up plugins with Stderr pointing to
a pipe being read by plugin client, the nomad agent in our case[1].
When the nomad agent dies, the pipe is closed, and any subsequent
executor logs fail with ErrClosedPipe and SIGPIPE signal. SIGPIPE
results into executor process dying.
I considered adding a handler to ignore SIGPIPE, but hc-log library
currently panics when logging write operation fails[2]
This we opt to revert to v0.8 behavior of exclusively writing logs to
executor.log, while we investigate alternative options.
[1] https://github.com/hashicorp/nomad/blob/v0.9.0/vendor/github.com/hashicorp/go-plugin/client.go#L528-L535
[2] https://github.com/hashicorp/nomad/blob/v0.9.0/vendor/github.com/hashicorp/go-hclog/int.go#L320-L323
This adds a `nomad alloc stop` command that can be used to stop and
force migrate an allocation to a different node.
This is built on top of the AllocUpdateDesiredTransitionRequest and
explicitly limits the scope of access to that transition to expose it
under the alloc-lifecycle ACL.
The API returns the follow up eval that can be used as part of
monitoring in the CLI or parsed and used in an external tool.
This adds a `nomad alloc restart` command and api that allows a job operator
with the alloc-lifecycle acl to perform an in-place restart of a Nomad
allocation, or a given subtask.
A common issue when using nomad is needing to add in the object verb to
a command to include the `-verbose` flag.
This commit allows users to pass `-verbose` via the `nomad status` alias by
adding a placeholder boolean in the metacommand which allows subcommands
to parse the flag.
Currently when operators need to log onto a machine where an alloc
is running they will need to perform both an alloc/job status
call and then a call to discover the node name from the node list.
This updates both the job status and alloc status output to include
the node name within the information to make operator use easier.
Closes#2359
Cloess #1180
This fixes a bug with JSON agent configuration parsing where the AST
for the plugin stanza had unnecessary flattening originating from hcl parsing
library. The workaround fixes the AST by popping off the flattened element and wrapping
it in a list. The workaround comes from similar code in terraform.
There were no existing test cases for json parsing so I added a few.
This commit uses the go-colorable library to enable support for coloured
UI output on Windows. This acts as a compatibility layer that takes
standard unix-y terminal codes and translates them into the requisite
windows calls as required.
Track current memory usage, `memory.usage_in_bytes`, in addition to
`memory.max_memory_usage_in_bytes` and friends. This number is closer
what Docker reports.
Related to https://github.com/hashicorp/nomad/issues/5165 .
This PR fixes various instances of plugins being launched without using
the parent loggers. This meant that logs would not all go to the same
output, break formatting etc.
The whole approach to monitoring drains has ordering issues and lacks
state to output useful error messages.
AFAICT to get the tests passing reliably I needed to change the behavior
of monitoring.
Parts of these tests are skipped in CI, and they should be rewritten as
e2e tests.
The driver manager is modeled after the device manager and is started by the client.
It's responsible for handling driver lifecycle and reattachment state, as well as
processing the incomming fingerprint and task events from each driver. The mananger
exposes a method for registering event handlers for task events that is used by the
task runner to update the server when a task has been updated with an event.
Since driver fingerprinting has been implemented by the driver manager, it is no
longer needed in the fingerprint mananger and has been removed.
* master: (71 commits)
Fix output of 'nomad deployment fail' with no arg
Always create a running allocation when testing task state
tests: ensure exec tests pass valid task resources (#4992)
some changes for more idiomatic code
fix iops related tests
fixed bug in loop delay
gofmt
improved code for readability
client: updateAlloc release lock after read
fixup! device attributes in `nomad node status -verbose`
drivers/exec: support device binds and mounts
fix iops bug and increase test matrix coverage
tests: tag image explicitly
changelog
ci: install lxc-templates explicitly
tests: skip checking rdma cgroup
ci: use Ubuntu 16.04 (Xenial) in TravisCI
client: update driver info on new fingerprint
drivers/docker: enforce volumes.enabled (#4983)
client: Style: use fluent style for building loggers
...
Noticed few places where tests seem to block indefinitely and panic
after the test run reaches the test package timeout.
I intend to follow up with the proper fix later, but timing out is much
better than indefinitely blocking.
IOPS have been modelled as a resource since Nomad 0.1 but has never
actually been detected and there is no plan in the short term to add
detection. This is because IOPS is a bit simplistic of a unit to define
the performance requirements from the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. These should be considered deprecated and only exist to allow
users to stop using them during the Nomad 0.9.x release. In the future,
there should be no expectation that the field will exist.
Since d335a82859ca2177bc6deda0c2c85b559daf2db3 ScriptExecutors now take
a timeout duration instead of a context. This broke the script check
removal code which used context cancelation propagation to remove
script checks while they were executing.
This commit adds a wrapper around ScriptExecutors that obeys context
cancelation again. The only downside is that it leaks a goroutine until
the underlying Exec call completes or timeouts.
Since check removal is relatively rare, check timeouts usually low, and
scripts usually fast, the risk of leaking a goroutine seems very small.
Fixes a regression caused in d335a82859ca2177bc6deda0c2c85b559daf2db3
The removal of the inner context made the remaining cancels cancel the
outer context and cause script checks to exit prematurely.
This PR introduces a device hook that retrieves the device mount
information for an allocation. It also updates the computed node class
computation to take into account devices.
TODO Fix the task runner unit test. The environment variable is being
lost even though it is being properly set in the prestart hook.
The default job here contains some exec task config (for setting
command and args) that aren't used for mock driver. Now, the alloc
runner seems stricter about validating fields and errors on unexpected
fields.
Updating configs in tests so we can have an explicit task config
whenever driver is set explicitly.
Introduce a device manager that manages the lifecycle of device plugins
on the client. It fingerprints, collects stats, and forwards Reserve
requests to the correct plugin. The manager, also handles device plugins
failing and validates their output.
Fix an issue in which the deployment watcher would fail the deployment
based on the earliest progress deadline of the deployment regardless of
if the task group has finished.
Further fix an issue where the blocked eval optimization would make it
so no evals were created to progress the deployment. To reproduce this
issue, prior to this commit, you can create a job with two task groups.
The first group has count 1 and resources such that it can not be
placed. The second group has count 3, max_parallel=1, and can be placed.
Run this first and then update the second group to do a deployment. It
will place the first of three, but never progress since there exists a
blocked eval. However, that doesn't capture the fact that there are two
groups being deployed.
Although the really exciting change is making WaitForRunning return the
allocations that it started. This should cut down test boilerplate
significantly.