The dev mode flag for connect was binding to the default interface's
IP, but this makes for a bad user experience for the CLI which will
default to 127.0.0.1. If we bind to 0.0.0.0 instead the CLI will work
without further configuration by the user.
* adds meta object to service in job spec, sends it to consul
* adds tests for service meta
* fix tests
* adds docs
* better hashing for service meta, use helper for copying meta when registering service
* tried to be DRY, but looks like it would be more work to use the
helper function
Fixes#6041
Unlike all other Consul operations, boostrapping requires Consul be
available. This PR tries Consul 3 times with a backoff to account for
the group services being asynchronously registered with Consul.
Consul Connect must route traffic between network namespaces through a
public interface (i.e. not localhost). In order to support testing in
dev mode, users needed to manually set the interface which doesn't
make for a smooth experience.
This commit adds a facility for adding optional parameters to the
`nomad agent -dev` flag and uses it to add a `-dev=connect` flag that
binds to a public interface on the host.
When rendering a task template, the `plugin` function is no longer
permitted by default and will raise an error. An operator can opt-in
to permitting this function with the new `template.function_blacklist`
field in the client configuration.
When rendering a task template, path parameters for the `file`
function will be treated as relative to the task directory by
default. Relative paths or symlinks that point outside the task
directory will raise an error. An operator can opt-out of this
protection with the new `template.disable_file_sandbox` field in the
client configuration.
* jobspec: breakup parse.go into smaller files
* add sidecar_task parsing to jobspec and api
* jobspec: combine service parsing logic for task and group service stanzas
* api: use slice of ConsulUpstream values instead of pointers
The `/v1/client/fs/stream endpoint` supports tailing a file by writing
chunks out as they come in. But not all browsers support streams
(ex IE11) so we need to be able to tail a file without streaming.
The fs stream and logs endpoint use the same implementation for
filesystem streaming under the hood, but the fs stream always passes
the `follow` parameter set to true. This adds the same toggle to the
fs stream endpoint that we have for logs. It defaults to true for
backwards compatibility.
If server.enabled is false, we ought to ignore all other values in
the server stanza.
However, I opted to preserve current error when `--bootstrap-expect` is
passed to the CLI when server is not enabled, to maintain current
behavior.
Fixes#5395
Alternative to #5957
Make task restarting asynchronous when handling check-based restarts.
This matches the pre-0.9 behavior where TaskRunner.Restart was an
asynchronous signal. The check-based restarting code was not designed
to handle blocking in TaskRunner.Restart. 0.9 made it reentrant and
could easily overwhelm the buffered update chan and deadlock.
Many thanks to @byronwolfman for his excellent debugging, PR, and
reproducer!
I created this alternative as changing the functionality of
TaskRunner.Restart has a much larger impact. This approach reverts to
old known-good behavior and minimizes the number of places changes are
made.
When a nomad client restarts/upgraded, nomad restores state from running
task and starts the sync loop. If sync loop runs early, it may
deregister services from Consul prematurely even when Consul has the
running service as healthy.
This is not ideal, as re-registering the service means potentially
waiting a whole service health check interval before declaring the
service healthy.
We attempt to mitigate this by introducing an initialization probation
period. During this time, we only deregister services and checks that
were explicitly deregistered, and leave unrecognized ones alone. This
serves as a grace period for restoring to complete, or for operators to
restore should they recognize they restored with the wrong nomad data
directory.
Currently, nomad "plugin" processes (e.g. executor, logmon, docker_logger) are started as CLI
commands to be handled by command CLI framework. Plugin launchers use
`discover.NomadBinary()` to identify the binary and start it.
This has few downsides: The trivial one is that when running tests, one
must re-compile the nomad binary as the tests need to invoke the nomad
executable to start plugin. This is frequently overlooked, resulting in
puzzlement.
The more significant issue with `executor` in particular is in relation
to external driver:
* Plugin must identify the path of invoking nomad binary, which is not
trivial; `discvoer.NomadBinary()` now returns the path to the plugin
rather than to nomad, preventing external drivers from launching
executors.
* The external driver may get a different version of executor than it
expects (specially if we make a binary incompatible change in future).
This commit addresses both downside by having the plugin invocation
handling through an `init()` call, similar to how libcontainer init
handler is done in [1] and recommened by libcontainer [2]. `init()`
will be invoked and handled properly in tests and external drivers.
For external drivers, this change will cause external drivers to launch
the executor that's compiled against.
There a are a couple of downsides to this approach:
* These specific packages (i.e executor, logmon, and dockerlog) need to
be careful in use of `init()`, package initializers. Must avoid having
command execution rely on any other init in the package. I prefixed
files with `z_` (golang processes files in lexical order), but ensured
we don't depend on order.
* The command handling is spread in multiple packages making it a bit
less obvious how plugin starts are handled.
[1] drivers/shared/executor/libcontainer_nsenter_linux.go
[2] eb4aeed24f/libcontainer (using-libcontainer)
- updated region in job metadata that gets persisted to nomad datastore
- fixed many unrelated unit tests that used an invalid region value
(they previously passed because hcl wasn't getting picked up and
the job would default to global region)
It is possible to provide multiple identically named services with
different port assignments in a Nomad configuration.
We introduced a regression when migrating to stable service identifiers where
multiple services with the same name would conflict, and the last definition
would take precedence.
This commit includes the port label in the stable service identifier to
allow the previous behaviour where this was supported, for example
providing:
```hcl
service {
name = "redis-cache"
tags = ["global", "cache"]
port = "db"
check {
name = "alive"
type = "tcp"
interval = "10s"
timeout = "2s"
}
}
service {
name = "redis-cache"
tags = ["global", "foo"]
port = "foo"
check {
name = "alive"
type = "tcp"
port = "db"
interval = "10s"
timeout = "2s"
}
}
service {
name = "redis-cache"
tags = ["global", "bar"]
port = "bar"
check {
name = "alive"
type = "tcp"
port = "db"
interval = "10s"
timeout = "2s"
}
}
```
in a nomad task definition is now completely valid. Each service
definition with the same name must still have a unique port label however.
Currently when you submit a manual request to the alloc lifecycle API
with a version of Curl that will submit empty bodies, the alloc restart
api will fail with an EOF error.
This behaviour is undesired, as it is reasonable to not submit a body at
all when restarting an entire allocation rather than an individual task.
This fixes it by ignoring EOF (not unexpected EOF) errors and treating
them as entire task restarts.
Enterprise only.
Disable preemption for service and batch jobs by default.
Maintain backward compatibility in a x.y.Z release. Consider switching
the default for new clusters in the future.
This exposes a client flag to disable nomad remote exec support in
environments where access to tasks ought to be restricted.
I used `disable_remote_exec` client flag that defaults to allowing
remote exec. Opted for a client config that can be used to disable
remote exec globally, or to a subset of the cluster if necessary.
* master: (912 commits)
Update redirects.txt
Added redirect for Spark guide link
client: log when server list changes
docs: mention regression in task config validation
fix update to changelog
update CHANGELOG with datacenter config validation https://github.com/hashicorp/nomad/pull/5665
typo: "atleast" -> "at least"
implement nomad exec for rkt
docs: fixed typo
use pty/tty terminology similar to github.com/kr/pty
vendor github.com/kr/pty
drivers: implement streaming exec for executor based drivers
executors: implement streaming exec
executor: scaffolding for executor grpc handling
client: expose allocated memory per task
client improve a comment in updateNetworks
stalebot: Add 'thinking' as an exempt label (#5684)
Added Sparrow link
update links to use new canonical location
Add redirects for restructing done in GH-5667
...
This adds a websocket endpoint for handling `nomad exec`.
The endpoint is a websocket interface, as we require a bi-directional
streaming (to handle both input and output), which is not very appropriate for
plain HTTP 1.0. Using websocket makes implementing the web ui a bit simpler. I
considered using golang http hijack capability to treat http request as a plain
connection, but the web interface would be too complicated potentially.
Furthermore, the API endpoint operates against the raw core nomad exec streaming
datastructures, defined in protobuf, with json serializer. Our APIs use json
interfaces in general, and protobuf generates json friendly golang structs.
Reusing the structs here simplify interface and reduce conversion overhead.
This commit causes sync to skip deregistering checks that are not
managed by nomad, such as service maintenance mode checks. This is
handled in the same way as service registrations - by doing a Nomad
specific prefix match.
The current implementation of Service Registration uses a hash of the
nomad-internal state of a service to register it with Consul, this means that
any update to the service invalidates this name and we then deregister, and
recreate the service in Consul.
While this behaviour slightly simplifies reasoning about service registration,
this becomes problematic when we add consul health checks to a service. When
the service is re-registered, so are the checks, which default to failing for
at least one check period.
This commit migrates us to using a stable identifier based on the
allocation, task, and service identifiers, and uses the difference
between the remote and local state to decide when to push updates.
It uses the existing hashing mechanic to decide when UpdateTask should
regenerate service registrations for providing to Sync, but this should
be removable as part of a future refactor.
It additionally introduces the _nomad-check- prefix for check
definitions, to allow for future allowing of consul features like
maintenance mode.
* client/metrics: modified metrics to use (updated) client copy of allocation instead of (unupdated) server copy
* updated armon/go-metrics to address race condition in DisplayMetrics
This command will be used to send a signal to either a single task within an
allocation, or all of the tasks if <task-name> is omitted. If the sent signal
terminates the allocation, it will be treated as if the allocation has crashed,
rather than as if it was operator-terminated.
Signal validation is currently handled by the driver itself and nomad
does not attempt to restrict or validate them.
Fixes https://github.com/hashicorp/nomad/issues/5593
Executor seems to die unexpectedly after nomad agent dies or is
restarted. The crash seems to occur at the first log message after
the nomad agent dies.
To ease debugging we forward executor log messages to executor.log as
well as to Stderr. `go-plugin` sets up plugins with Stderr pointing to
a pipe being read by plugin client, the nomad agent in our case[1].
When the nomad agent dies, the pipe is closed, and any subsequent
executor logs fail with ErrClosedPipe and SIGPIPE signal. SIGPIPE
results into executor process dying.
I considered adding a handler to ignore SIGPIPE, but hc-log library
currently panics when logging write operation fails[2]
This we opt to revert to v0.8 behavior of exclusively writing logs to
executor.log, while we investigate alternative options.
[1] https://github.com/hashicorp/nomad/blob/v0.9.0/vendor/github.com/hashicorp/go-plugin/client.go#L528-L535
[2] https://github.com/hashicorp/nomad/blob/v0.9.0/vendor/github.com/hashicorp/go-hclog/int.go#L320-L323
This adds a `nomad alloc stop` command that can be used to stop and
force migrate an allocation to a different node.
This is built on top of the AllocUpdateDesiredTransitionRequest and
explicitly limits the scope of access to that transition to expose it
under the alloc-lifecycle ACL.
The API returns the follow up eval that can be used as part of
monitoring in the CLI or parsed and used in an external tool.
This adds a `nomad alloc restart` command and api that allows a job operator
with the alloc-lifecycle acl to perform an in-place restart of a Nomad
allocation, or a given subtask.
A common issue when using nomad is needing to add in the object verb to
a command to include the `-verbose` flag.
This commit allows users to pass `-verbose` via the `nomad status` alias by
adding a placeholder boolean in the metacommand which allows subcommands
to parse the flag.
Currently when operators need to log onto a machine where an alloc
is running they will need to perform both an alloc/job status
call and then a call to discover the node name from the node list.
This updates both the job status and alloc status output to include
the node name within the information to make operator use easier.
Closes#2359
Cloess #1180
This fixes a bug with JSON agent configuration parsing where the AST
for the plugin stanza had unnecessary flattening originating from hcl parsing
library. The workaround fixes the AST by popping off the flattened element and wrapping
it in a list. The workaround comes from similar code in terraform.
There were no existing test cases for json parsing so I added a few.
This commit uses the go-colorable library to enable support for coloured
UI output on Windows. This acts as a compatibility layer that takes
standard unix-y terminal codes and translates them into the requisite
windows calls as required.
Track current memory usage, `memory.usage_in_bytes`, in addition to
`memory.max_memory_usage_in_bytes` and friends. This number is closer
what Docker reports.
Related to https://github.com/hashicorp/nomad/issues/5165 .
This PR fixes various instances of plugins being launched without using
the parent loggers. This meant that logs would not all go to the same
output, break formatting etc.