Noticed that ACL endpoints return a 500 status code for user errors. This
is confusing and can lead to false monitoring alerts.
Here, I introduce a concept of RPCCoded errors to be returned by RPC
handlers that signal a code in addition to the error message. For now, the
codes match HTTP status codes to ease reasoning.
Before:
```
$ nomad acl bootstrap
Error bootstrapping: Unexpected response code: 500 (ACL bootstrap already done (reset index: 9))
```
After:
```
$ nomad acl bootstrap
Error bootstrapping: Unexpected response code: 400 (ACL bootstrap already done (reset index: 9))
```
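A minimal sketch of the idea, with illustrative names (CodedError, NewCodedError, writeError) rather than the exact Nomad types: an error value that carries an HTTP-like code alongside the message, which the HTTP layer can unwrap instead of defaulting to 500.
```go
package agent // illustrative placement

import (
	"errors"
	"net/http"
)

// CodedError carries an HTTP-like status code alongside the message.
type CodedError struct {
	Code    int
	Message string
}

func (e *CodedError) Error() string { return e.Message }

// NewCodedError lets an RPC handler attach the code it wants surfaced,
// e.g. NewCodedError(400, "ACL bootstrap already done").
func NewCodedError(code int, msg string) error {
	return &CodedError{Code: code, Message: msg}
}

// writeError shows how the HTTP layer could unwrap the code instead of
// answering 500 for every RPC failure.
func writeError(w http.ResponseWriter, err error) {
	var coded *CodedError
	if errors.As(err, &coded) {
		http.Error(w, coded.Message, coded.Code)
		return
	}
	http.Error(w, err.Error(), http.StatusInternalServerError)
}
```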
* client: improve group service stanza interpolation and check_restart support
Interpolation can now be done on group service stanzas. Note that some task-runtime-specific information
that was previously available when the service was registered post-start of a task is no longer available.
The check_restart stanza for checks defined on group services will now properly restart the allocation upon
check failures if configured.
handle the case where we request a server-id which is this current server
update docs, error on node and server id params
more accurate names for tests
use shared no leader err, formatting
rm bad comment
remove redundant variable
When inferring whether to use a TTY, check that both stdin and stdout are
terminals.
Otherwise, we get failures like the following:
```
$ nomad alloc exec --job example echo hi
hi
$ echo | nomad alloc exec --job example echo hi
hi
$ nomad alloc exec --job example echo hi | head -n1
failed to exec into task: not a terminal
```
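A minimal sketch of the check described above, assuming golang.org/x/term (the actual helper used in the codebase may differ):
```go
package main

import (
	"fmt"
	"os"

	"golang.org/x/term"
)

// isTTY reports whether both stdin and stdout are attached to a terminal.
// Checking only one side is what produced the failure above: a raw TTY
// session cannot be set up when either side is a pipe.
func isTTY() bool {
	return term.IsTerminal(int(os.Stdin.Fd())) &&
		term.IsTerminal(int(os.Stdout.Fd()))
}

func main() {
	fmt.Println("allocate a TTY for exec:", isTTY())
}
```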
This is the basic code to add the Windows Service Manager hooks to Nomad;
a rough sketch of the hook follows the list below.
Includes vendoring golang.org/x/sys/windows/svc and adds docs:
* a guide for installing Nomad as a Windows service.
* configuration for logging to a file, from PR #6429
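Hooking into the Service Control Manager via golang.org/x/sys/windows/svc looks roughly like this; agentMain and agentShutdown are hypothetical stand-ins for the real agent entry points, not Nomad's actual wiring:
```go
//go:build windows

package main

import "golang.org/x/sys/windows/svc"

// nomadService adapts the agent run loop to the Service Control Manager.
type nomadService struct{}

func (s *nomadService) Execute(args []string, req <-chan svc.ChangeRequest, status chan<- svc.Status) (bool, uint32) {
	status <- svc.Status{State: svc.StartPending}
	go agentMain() // hypothetical: start the Nomad agent in the background
	status <- svc.Status{State: svc.Running, Accepts: svc.AcceptStop | svc.AcceptShutdown}

	for c := range req {
		switch c.Cmd {
		case svc.Interrogate:
			status <- c.CurrentStatus
		case svc.Stop, svc.Shutdown:
			status <- svc.Status{State: svc.StopPending}
			agentShutdown() // hypothetical: gracefully stop the agent
			status <- svc.Status{State: svc.Stopped}
			return false, 0
		}
	}
	return false, 0
}

func agentMain()     {} // placeholder for the real agent run loop
func agentShutdown() {} // placeholder for graceful shutdown

func main() {
	// When launched by the Service Control Manager, svc.Run hands control
	// to Execute above; a real agent would fall back to normal CLI startup
	// when running interactively.
	_ = svc.Run("nomad", &nomadService{})
}
```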
The Nomad web UI currently fails when querying client nodes for allocation
state endpoints, due to CORS policy.
The issue is that CORS requests marked `withCredentials` need
the http server to include an `Access-Control-Allow-Credentials` header [1].
But Nomad task logs and filesystem requests include authenticating
information and are thus marked with `credentials=true` [2][3].
It's worth noting that the browser currently sends credentials and the
authentication token to servers anyway; it's just that the response is
not made available to the calling nomad ui javascript. For task logs
specifically, the nomad ui retries by querying the web ui address
(typically pointing to a nomad server), which forwards the request
to the nomad client agent appropriately.
[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Access-Control-Allow-Credentials
[2] 101d0373ee/ui/app/components/task-log.js (L50)
[3] 101d0373ee/ui/app/services/token.js (L25-L39)
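Satisfying [1] amounts to something like the following middleware sketch (plain net/http; not the actual Nomad handler wiring):
```go
package agent // illustrative

import "net/http"

// allowCredentials marks cross-origin responses as usable by credentialed
// requests. The wildcard "*" origin is not permitted together with
// Access-Control-Allow-Credentials, so the request's Origin is echoed back.
func allowCredentials(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if origin := r.Header.Get("Origin"); origin != "" {
			w.Header().Set("Access-Control-Allow-Origin", origin)
			w.Header().Set("Access-Control-Allow-Credentials", "true")
			w.Header().Add("Vary", "Origin")
		}
		next.ServeHTTP(w, r)
	})
}
```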
Addresses feedback around monitor implementation
subselect on stopCh to prevent blocking forever.
Set up a separate goroutine to check every 3 seconds for dropped
messages.
rename returned ch to avoid confusion
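The dropped-message checker mentioned above could look roughly like the following sketch; streamCh, stopCh, and the dropped counter are assumed names, not the actual ones:
```go
package monitor // illustrative

import (
	"fmt"
	"sync/atomic"
	"time"
)

// reportDropped wakes every 3 seconds and, if log lines were dropped
// because the stream consumer fell behind, emits a notice on the stream.
// Every send sub-selects on stopCh so the goroutine can never block forever.
func reportDropped(dropped *int64, streamCh chan<- []byte, stopCh <-chan struct{}) {
	ticker := time.NewTicker(3 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-stopCh:
			return
		case <-ticker.C:
			if n := atomic.SwapInt64(dropped, 0); n > 0 {
				msg := []byte(fmt.Sprintf("monitor dropped %d log lines", n))
				select {
				case streamCh <- msg:
				case <-stopCh:
					return
				}
			}
		}
	}
}
```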
Adds a new package that can be used by client and server RPC endpoints to
facilitate monitoring based on a logger
clean up old code
small comment about write
rm old comment about minsize
rename to Monitor
Removes connection logic from monitor command
Keep connection logic in endpoints, use a channel to send results from
monitoring
use new multisink logger and interfaces
small test for dropped messages
update go-hclogger and update sink/intercept logger interfaces
Adds nomad monitor command. Like consul monitor, this command allows you
to stream logs from a nomad agent in real time with a specified log
level
add endpoint tests
Upgrade go-hclog to latest version
The current version of go-hclog pads log prefixes to equal lengths,
so info becomes [INFO ] and debug becomes [DEBUG]. This breaks the
Check function in hashicorp/logutils/level.go. Upgrading to the latest
version removes this padding and fixes log filtering that uses logutils'
Check.
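To make the breakage concrete, here is a small illustration against hashicorp/logutils, assuming Check matches the exact `[LEVEL]` token between brackets:
```go
package main

import (
	"fmt"
	"os"

	"github.com/hashicorp/logutils"
)

func main() {
	filter := &logutils.LevelFilter{
		Levels:   []logutils.LogLevel{"DEBUG", "INFO", "WARN", "ERROR"},
		MinLevel: logutils.LogLevel("WARN"),
		Writer:   os.Stderr,
	}

	// The unpadded "[INFO]" token is recognized and filtered out as
	// being below MinLevel...
	fmt.Println(filter.Check([]byte("[INFO] unpadded prefix")))
	// ...while the padded "[INFO ]" token is not a known level and slips
	// through, defeating the configured MinLevel.
	fmt.Println(filter.Check([]byte("[INFO ] padded prefix")))
}
```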
AgentMonitor is an endpoint to stream logs for a given agent. It allows
callers to pass in a log level, which may be different from the agent's
configured level, allowing for temporary debugging with lower log levels.
Pass in logWriter when setting up Agent
makeAllocTaskServices did not do a nil check on AllocatedResources,
which causes a panic when upgrading directly from 0.8 to 0.10. While
skipping 0.9 is not supported, we intend to fix serious crashes caused
by such upgrades to prevent cluster outages.
I did a quick audit of the client package and everywhere else that
accesses AllocatedResources appears to be properly guarded by a nil
check.
This commit introduces support for configuring mount propagation when
mounting volumes with the `volume_mount` stanza on Linux targets.
Similar to Kubernetes, we expose 3 options for configuring mount
propagation:
- private, which is equivalent to `rprivate` on Linux, which does not allow the
container to see any new nested mounts after the chroot was created.
- host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts
that have been created _outside of the container_ to be visible
inside the container after the chroot is created.
- bidirectional, which is equivalent to `rshared` on Linux, which allows the
container to see new mounts created on the host, and importantly
_allows the container to create mounts that are visible in other containers and on the host_
private and host-to-task are safe, but bidirectional mounts can be
dangerous: if the code inside a container creates a mount and does not
clean it up before the container is torn down, it can cause bad things to
happen inside the kernel.
To add a layer of safety here, we require that the user has ReadWrite
permissions on the volume before allowing bidirectional mounts, as a
defense-in-depth / validation case, although creating mounts should also
require a privileged execution environment inside the container.
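An illustrative mapping (not the exact Nomad code) from the jobspec options above to the Linux propagation flags handed to the runtime:
```go
package mounts // illustrative

import "fmt"

// linuxPropagation translates the volume_mount propagation option into
// the Linux mount propagation flag described above.
func linuxPropagation(mode string) (string, error) {
	switch mode {
	case "", "private":
		return "rprivate", nil // default: no mounts cross the boundary
	case "host-to-task":
		return "rslave", nil // host-created mounts become visible to the task
	case "bidirectional":
		return "rshared", nil // task-created mounts also become visible on the host
	default:
		return "", fmt.Errorf("invalid mount propagation mode: %q", mode)
	}
}
```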
Currently this assumes that a short write will never happen. While short
writes are improbable, and rotation being off by a few bytes would rarely
matter, this now correctly tracks the number of bytes written.
Currently this logging implementation depends on the order of files
returned by filepath.Glob which, although its internal methods are
documented to be lexicographical, does not publicly document this. Here we
defensively re-sort.
Fix a bug where a malicious user can access or manipulate an alloc in a
namespace they don't have access to. The allocation endpoints perform
ACL checks against the request namespace, not the allocation namespace,
and perform the allocation lookup independently of namespaces.
Here, we check that the requester can access the alloc's namespace
regardless of the declared request namespace.
Ideally, we'd enforce that the declared request namespace matches
the actual allocation namespace. Unfortunately, we haven't documented
alloc endpoints as namespaced functions; we suspect starting to enforce
this would be very disruptive and inappropriate for a nomad point
release. As such, we maintain the current behavior that doesn't require
passing the proper namespace in the request. A future major release may
start enforcing the declared-namespace check.
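A sketch of the check, with hypothetical helpers (lookupAlloc, allowedInNamespace) standing in for the real state-store and ACL plumbing:
```go
package nomad // illustrative

import "errors"

type allocation struct{ Namespace string }

// Hypothetical helpers standing in for the state store and the ACL object.
func lookupAlloc(id string) (*allocation, error) { return &allocation{Namespace: "default"}, nil }
func allowedInNamespace(namespace string) bool   { return false }

// authorizeAllocAccess resolves the allocation first and then authorizes
// the caller against the namespace the alloc actually lives in, ignoring
// whatever namespace was declared on the request.
func authorizeAllocAccess(allocID string) error {
	alloc, err := lookupAlloc(allocID)
	if err != nil {
		return err
	}
	if !allowedInNamespace(alloc.Namespace) {
		return errors.New("Permission denied")
	}
	return nil
}
```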
This commit introduces a rotating file logger for Nomad Agent Logs. The
logger implementation itself is a lift and shift from Consul, with tests
updated to fit with the Nomad pattern of using require, and not having a
testutil for creating tempdirs cleanly.
Show the full ID on individual alloc or node status views. Shortening
the ID isn't very helpful in these cases, and makes looking up the full
ID slightly more complicated when the user needs to interact with the API.
List views are unmodified and show the short ID unless the `-verbose` flag is passed.
Before
```
$ nomad node status -self | head -n2
ID = 21fc51f9
Name = mars-2.local
$ nomad alloc status 15ae54cd | head -n3
ID = 15ae54cd-08dd-3681-03cf-4c23ace7e7c3
Eval ID = a6b15f86
Name = example.cache[0]
```
After:
```
$ nomad node status -self | head -n2
ID = 21fc51f9-fd39-0fa0-fb41-f34c7aa36101
Name = mars-2.local
$ nomad alloc status 15ae54cd | head -n3
ID = 15ae54cd-08dd-3681-03cf-4c23ace7e7c3
Eval ID = a6b15f86-ca8e-e536-b544-4bfb43137ff3
Name = example.cache[0]
```
This fixes two bugs:
First, the FS Logs API endpoint only propagated an error back to the user if it was
encoded with a code, which isn't common. Other errors were suppressed and
callers got an empty response with a 200 status code. Now, these endpoints
return a 500 status code along with the error message.
Before
```
$ curl -v "http://127.0.0.1:4646/v1/client/fs/logs/qwerqwera?follow=false&offset=0&origin=start&region=global&task=redis&type=stdout"; echo
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 4646 (#0)
> GET /v1/client/fs/logs/qwerqwera?follow=false&offset=0&origin=start&region=global&task=redis&type=stdout HTTP/1.1
> Host: 127.0.0.1:4646
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Vary: Accept-Encoding
< Vary: Origin
< Date: Fri, 04 Oct 2019 19:47:21 GMT
< Content-Length: 0
<
* Connection #0 to host 127.0.0.1 left intact
```
After
```
$ curl -v "http://127.0.0.1:4646/v1/client/fs/logs/qwerqwera?follow=false&offset=0&origin=start&region=global&task=redis&type=stdout"; echo
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 4646 (#0)
> GET /v1/client/fs/logs/qwerqwera?follow=false&offset=0&origin=start&region=global&task=redis&type=stdout HTTP/1.1
> Host: 127.0.0.1:4646
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Vary: Accept-Encoding
< Vary: Origin
< Date: Fri, 04 Oct 2019 19:48:12 GMT
< Content-Length: 60
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 127.0.0.1 left intact
alloc lookup failed: index error: UUID must be 36 characters
```
Second, we now return a 400 status code for request validation errors.
Before
```
$ curl -v "http://127.0.0.1:4646/v1/client/fs/logs/qwerqwera"; echo
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 4646 (#0)
> GET /v1/client/fs/logs/qwerqwera HTTP/1.1
> Host: 127.0.0.1:4646
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Vary: Accept-Encoding
< Vary: Origin
< Date: Fri, 04 Oct 2019 19:47:29 GMT
< Content-Length: 22
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 127.0.0.1 left intact
must provide task name
```
After
```
$ curl -v "http://127.0.0.1:4646/v1/client/fs/logs/qwerqwera"; echo
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 4646 (#0)
> GET /v1/client/fs/logs/qwerqwera HTTP/1.1
> Host: 127.0.0.1:4646
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 400 Bad Request
< Vary: Accept-Encoding
< Vary: Origin
< Date: Fri, 04 Oct 2019 19:49:18 GMT
< Content-Length: 22
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 127.0.0.1 left intact
must provide task name
```
Without a `LocalServicePort`, Connect services will try to use the
mapped port even when delivering traffic locally. A user can override
this behavior by pinning the port value in the `service` stanza but
this prevents us from using the Consul service name to reach the
service.
This commit configures the Consul proxy with its `LocalServicePort`
and `LocalServiceAddress` fields.
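With the Consul API client, the sidecar registration might be populated roughly as follows; the field names are Consul's, while the surrounding function and values are illustrative:
```go
package consul // illustrative

import capi "github.com/hashicorp/consul/api"

// connectSidecarRegistration shows where LocalServiceAddress and
// LocalServicePort live on the sidecar proxy registration.
func connectSidecarRegistration(name, taskIP string, mappedPort, localPort int) *capi.AgentServiceRegistration {
	return &capi.AgentServiceRegistration{
		Name: name,
		Port: mappedPort, // the port advertised to the rest of the mesh
		Connect: &capi.AgentServiceConnect{
			SidecarService: &capi.AgentServiceRegistration{
				Proxy: &capi.AgentServiceConnectProxyConfig{
					// Deliver local traffic to the task's own address and
					// port instead of hairpinning through the mapped port.
					LocalServiceAddress: taskIP,
					LocalServicePort:    localPort,
				},
			},
		},
	}
}
```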
Currently, hitting the /v1/agent/self API with ACL Replication
enabled results in the replication token being returned in the API response.
This commit redacts that information, as it should be treated as a shared secret.
Currently, using a Volume in a job uses the following configuration:
```hcl
volume "alias-name" {
  type      = "volume-type"
  read_only = true

  config {
    source = "host_volume_name"
  }
}
```
This commit migrates to the following:
```hcl
volume "alias-name" {
  type      = "volume-type"
  source    = "host_volume_name"
  read_only = true
}
```
The original design was based on uncertainty about the future of storage
plugins, and on allowing maximum flexibility.
However, this causes a few issues, namely:
- We frequently need to parse this configuration during submission,
scheduling, and mounting
- It complicates the configuration from an end user's perspective
- It complicates the ability to do validation
As we understand the problem space of CSI a little more, it has become
clear that we won't need the `source` to be in config, as it will be
used in the majority of cases:
- Host Volumes: Always need a source
- Preallocated CSI Volumes: Always needs a source from a volume or claim name
- Dynamic Persistent CSI Volumes*: Always needs a source to attach the volumes
to for managing upgrades and to avoid dangling.
- Dynamic Ephemeral CSI Volumes*: Less thought out, but `source` will probably point
to the plugin name, and a `config` block will
allow you to pass meta to the plugin. Or will
point to a pre-configured ephemeral config.
*If implemented
The new design simplifies this by merging the source into the volume
stanza to solve the above issues with usability, performance, and error
handling.
In Nomad prior to Consul Connect, all Consul checks work the same
except for Script checks. Because the Task being checked is running in
its own container namespaces, the check is executed by Nomad in the
Task's context. If the Script check passes, Nomad uses the TTL check
feature of Consul to update the check status. This means in order to
run a Script check, we need to know what Task to execute it in.
To support Consul Connect, we need Group Services, and these need to
be registered in Consul along with their checks. We could push the
Service down into the Task, but this doesn't work if someone wants to
associate a service with a task's ports, but do script checks in
another task in the allocation.
Because Nomad is handling the Script check and not Consul anyway,
this moves the script check handling into the task runner so that the
task runner can own the script check's configuration and
lifecycle. This will allow us to pass the group service check
configuration down into a task without associating the service itself
with the task.
When tasks are checked for script checks, we walk back through their
task group to see if there are script checks associated with the
task. If so, we'll spin off script check tasklets for them. The
group-level service and any restart behaviors it needs are entirely
encapsulated within the group service hook.
Consul registers Connect services automatically; however, Nomad thinks it
owns them due to the _nomad prefix. Since these services are managed by
Consul, Nomad needs to explicitly ignore them, or otherwise they will be
removed.
The dev mode flag for connect was binding to the default interface's
IP, but this makes for a bad user experience with the CLI, which defaults
to 127.0.0.1. If we bind to 0.0.0.0 instead, the CLI will work
without further configuration by the user.
* adds meta object to service in job spec, sends it to consul
* adds tests for service meta
* fix tests
* adds docs
* better hashing for service meta, use helper for copying meta when registering service
* tried to be DRY, but looks like it would be more work to use the
helper function
Fixes #6041
Unlike all other Consul operations, bootstrapping requires that Consul be
available. This PR retries the Consul operation 3 times with a backoff to
account for the group services being registered with Consul asynchronously.
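The retry could be as simple as the following sketch; the attempt count of 3 comes from the text above, while the backoff values are assumed:
```go
package consul // illustrative

import (
	"fmt"
	"time"
)

// withRetry runs op up to 3 times, backing off between attempts, to give
// asynchronously registered group services time to appear in Consul.
func withRetry(op func() error) error {
	var err error
	backoff := 250 * time.Millisecond
	for attempt := 0; attempt < 3; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		time.Sleep(backoff)
		backoff *= 2
	}
	return fmt.Errorf("consul operation failed after 3 attempts: %v", err)
}
```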
Consul Connect must route traffic between network namespaces through a
public interface (i.e. not localhost). In order to support testing in
dev mode, users needed to manually set the interface which doesn't
make for a smooth experience.
This commit adds a facility for adding optional parameters to the
`nomad agent -dev` flag and uses it to add a `-dev=connect` flag that
binds to a public interface on the host.
When rendering a task template, the `plugin` function is no longer
permitted by default and will raise an error. An operator can opt-in
to permitting this function with the new `template.function_blacklist`
field in the client configuration.
When rendering a task template, path parameters for the `file`
function will be treated as relative to the task directory by
default. Relative paths or symlinks that point outside the task
directory will raise an error. An operator can opt-out of this
protection with the new `template.disable_file_sandbox` field in the
client configuration.
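A sketch of what the file sandbox check amounts to (not the exact Nomad helper): resolve the path, including symlinks, and reject anything that escapes the task directory.
```go
package template // illustrative

import (
	"path/filepath"
	"strings"
)

// pathEscapesSandbox reports whether a template `file` path, resolved
// relative to the task directory and through symlinks, points outside it.
func pathEscapesSandbox(taskDir, path string) (bool, error) {
	resolved, err := filepath.EvalSymlinks(filepath.Join(taskDir, path))
	if err != nil {
		return true, err
	}
	rel, err := filepath.Rel(taskDir, resolved)
	if err != nil {
		return true, err
	}
	escapes := rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator))
	return escapes, nil
}
```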
* jobspec: breakup parse.go into smaller files
* add sidecar_task parsing to jobspec and api
* jobspec: combine service parsing logic for task and group service stanzas
* api: use slice of ConsulUpstream values instead of pointers
The `/v1/client/fs/stream` endpoint supports tailing a file by writing
chunks out as they come in. But not all browsers support streams
(e.g. IE11), so we need to be able to tail a file without streaming.
The fs stream and logs endpoint use the same implementation for
filesystem streaming under the hood, but the fs stream always passes
the `follow` parameter set to true. This adds the same toggle to the
fs stream endpoint that we have for logs. It defaults to true for
backwards compatibility.
If server.enabled is false, we ought to ignore all other values in
the server stanza.
However, I opted to preserve the current error when `--bootstrap-expect` is
passed to the CLI while the server is not enabled, to maintain current
behavior.
Fixes #5395
Alternative to #5957
Make task restarting asynchronous when handling check-based restarts.
This matches the pre-0.9 behavior where TaskRunner.Restart was an
asynchronous signal. The check-based restarting code was not designed
to handle blocking in TaskRunner.Restart. 0.9 made it reentrant and
could easily overwhelm the buffered update chan and deadlock.
Many thanks to @byronwolfman for his excellent debugging, PR, and
reproducer!
I created this alternative as changing the functionality of
TaskRunner.Restart has a much larger impact. This approach reverts to
old known-good behavior and minimizes the number of places changes are
made.
When a nomad client is restarted or upgraded, nomad restores state from running
tasks and starts the sync loop. If the sync loop runs early, it may
deregister services from Consul prematurely, even when Consul has the
running service as healthy.
This is not ideal, as re-registering the service means potentially
waiting a whole service health check interval before declaring the
service healthy.
We attempt to mitigate this by introducing an initialization probation
period. During this time, we only deregister services and checks that
were explicitly deregistered, and leave unrecognized ones alone. This
serves as a grace period for the restore to complete, or for operators to
recover should they recognize they restored with the wrong nomad data
directory.
Currently, nomad "plugin" processes (e.g. executor, logmon, docker_logger) are started as CLI
commands to be handled by the command CLI framework. Plugin launchers use
`discover.NomadBinary()` to identify the binary and start it.
This has a few downsides: The trivial one is that when running tests, one
must re-compile the nomad binary, as the tests need to invoke the nomad
executable to start plugins. This is frequently overlooked, resulting in
puzzlement.
The more significant issue with `executor` in particular is in relation
to external driver:
* The plugin must identify the path of the invoking nomad binary, which is not
trivial; `discover.NomadBinary()` now returns the path to the plugin
rather than to nomad, preventing external drivers from launching
executors.
* The external driver may get a different version of executor than it
expects (especially if we make a binary-incompatible change in the future).
This commit addresses both downsides by having the plugin invocation
handled through an `init()` call, similar to how the libcontainer init
handler is done in [1] and recommended by libcontainer [2]. `init()`
will be invoked and handled properly in tests and external drivers.
For external drivers, this change causes them to launch the executor
they were compiled against. A rough sketch of the pattern appears after
the notes below.
There are a couple of downsides to this approach:
* These specific packages (i.e. executor, logmon, and dockerlog) need to
be careful in their use of `init()` package initializers, and must avoid
having command execution rely on any other init in the package. I prefixed
files with `z_` (golang processes files in lexical order), but ensured
we don't depend on order.
* The command handling is spread in multiple packages making it a bit
less obvious how plugin starts are handled.
[1] drivers/shared/executor/libcontainer_nsenter_linux.go
[2] eb4aeed24f/libcontainer (using-libcontainer)
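The pattern, roughly (runExecutorPlugin is an illustrative placeholder, not the real entry point):
```go
package executor // illustrative; one of the z_-prefixed files

import "os"

// init intercepts the process before the normal CLI framework runs: if the
// binary was invoked as a plugin sub-command, run the plugin loop and exit
// without ever returning to the CLI. This works the same way whether the
// binary is nomad itself, a test binary, or an external driver.
func init() {
	if len(os.Args) > 1 && os.Args[1] == "executor" {
		runExecutorPlugin(os.Args[2:]) // hypothetical plugin entry point
		os.Exit(0)
	}
}

func runExecutorPlugin(args []string) {} // placeholder for the real serve loop
```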
- updated region in job metadata that gets persisted to nomad datastore
- fixed many unrelated unit tests that used an invalid region value
(they previously passed because hcl wasn't getting picked up and
the job would default to global region)
It is possible to provide multiple identically named services with
different port assignments in a Nomad configuration.
We introduced a regression when migrating to stable service identifiers where
multiple services with the same name would conflict, and the last definition
would take precedence.
This commit includes the port label in the stable service identifier to
restore the previous behaviour where this was supported. For example,
providing:
```hcl
service {
  name = "redis-cache"
  tags = ["global", "cache"]
  port = "db"

  check {
    name     = "alive"
    type     = "tcp"
    interval = "10s"
    timeout  = "2s"
  }
}

service {
  name = "redis-cache"
  tags = ["global", "foo"]
  port = "foo"

  check {
    name     = "alive"
    type     = "tcp"
    port     = "db"
    interval = "10s"
    timeout  = "2s"
  }
}

service {
  name = "redis-cache"
  tags = ["global", "bar"]
  port = "bar"

  check {
    name     = "alive"
    type     = "tcp"
    port     = "db"
    interval = "10s"
    timeout  = "2s"
  }
}
```
in a nomad task definition is now completely valid. Each service
definition with the same name must still have a unique port label however.
Currently, when you submit a manual request to the alloc lifecycle API
with a version of curl that submits empty bodies, the alloc restart
API will fail with an EOF error.
This behaviour is undesired, as it is reasonable not to submit a body at
all when restarting an entire allocation rather than an individual task.
This fixes it by ignoring EOF (not unexpected EOF) errors and treating
them as requests to restart the entire allocation.
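A sketch of the decode handling (the request type here is illustrative, not Nomad's):
```go
package agent // illustrative

import (
	"encoding/json"
	"io"
	"net/http"
)

type allocRestartRequest struct {
	TaskName string
}

// parseRestartRequest treats a completely empty body (io.EOF) as "restart
// the whole allocation". A truncated body (io.ErrUnexpectedEOF) or any
// other decode failure is still an error.
func parseRestartRequest(r *http.Request) (*allocRestartRequest, error) {
	var req allocRestartRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil && err != io.EOF {
		return nil, err
	}
	return &req, nil
}
```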
Enterprise only.
Disable preemption for service and batch jobs by default.
Maintain backward compatibility in an x.y.Z release. Consider switching
the default for new clusters in the future.
This exposes a client flag to disable nomad remote exec support in
environments where access to tasks ought to be restricted.
I used a `disable_remote_exec` client flag that defaults to allowing
remote exec. I opted for a client config that can be used to disable
remote exec globally, or for a subset of the cluster if necessary.
* master: (912 commits)
Update redirects.txt
Added redirect for Spark guide link
client: log when server list changes
docs: mention regression in task config validation
fix update to changelog
update CHANGELOG with datacenter config validation https://github.com/hashicorp/nomad/pull/5665
typo: "atleast" -> "at least"
implement nomad exec for rkt
docs: fixed typo
use pty/tty terminology similar to github.com/kr/pty
vendor github.com/kr/pty
drivers: implement streaming exec for executor based drivers
executors: implement streaming exec
executor: scaffolding for executor grpc handling
client: expose allocated memory per task
client improve a comment in updateNetworks
stalebot: Add 'thinking' as an exempt label (#5684)
Added Sparrow link
update links to use new canonical location
Add redirects for restructing done in GH-5667
...
This adds a websocket endpoint for handling `nomad exec`.
The endpoint is a websocket interface, as we require bi-directional
streaming (to handle both input and output), which is not very appropriate for
plain HTTP 1.0. Using a websocket makes implementing the web ui a bit simpler. I
considered using golang's http hijack capability to treat the http request as a plain
connection, but the web interface would potentially be too complicated.
Furthermore, the API endpoint operates against the raw core nomad exec streaming
datastructures, defined in protobuf, with a json serializer. Our APIs use json
interfaces in general, and protobuf generates json-friendly golang structs.
Reusing the structs here simplifies the interface and reduces conversion overhead.
This commit causes sync to skip deregistering checks that are not
managed by nomad, such as service maintenance mode checks. This is
handled in the same way as service registrations - by doing a Nomad
specific prefix match.
The current implementation of Service Registration uses a hash of the
nomad-internal state of a service to register it with Consul; this means that
any update to the service invalidates this name, and we then deregister and
recreate the service in Consul.
While this behaviour slightly simplifies reasoning about service registration,
this becomes problematic when we add consul health checks to a service. When
the service is re-registered, so are the checks, which default to failing for
at least one check period.
This commit migrates us to using a stable identifier based on the
allocation, task, and service identifiers, and uses the difference
between the remote and local state to decide when to push updates.
It uses the existing hashing mechanic to decide when UpdateTask should
regenerate service registrations for providing to Sync, but this should
be removable as part of a future refactor.
It additionally introduces the _nomad-check- prefix for check
definitions, to allow for future support of Consul features like
maintenance mode.
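Conceptually, the identifier moves from a content hash to something derived from stable fields, along the lines of the following sketch (the exact format Nomad uses may differ):
```go
package consul // illustrative

import "fmt"

// stableServiceID derives an identifier from fields that do not change when
// the service definition is edited, so the Consul registration (and its
// checks) survives updates. A content hash, by contrast, changes on every
// edit and forces a deregister/re-register cycle.
func stableServiceID(allocID, taskName, serviceName string) string {
	return fmt.Sprintf("_nomad-task-%s-%s-%s", allocID, taskName, serviceName)
}
```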
* client/metrics: modified metrics to use (updated) client copy of allocation instead of (unupdated) server copy
* updated armon/go-metrics to address race condition in DisplayMetrics
This command will be used to send a signal to either a single task within an
allocation, or all of the tasks if <task-name> is omitted. If the sent signal
terminates the allocation, it will be treated as if the allocation has crashed,
rather than as if it was operator-terminated.
Signal validation is currently handled by the driver itself, and nomad
does not attempt to restrict or validate signals.
Fixes https://github.com/hashicorp/nomad/issues/5593
The executor seems to die unexpectedly after the nomad agent dies or is
restarted. The crash seems to occur at the first log message after
the nomad agent dies.
To ease debugging we forward executor log messages to executor.log as
well as to Stderr. `go-plugin` sets up plugins with Stderr pointing to
a pipe being read by the plugin client, the nomad agent in our case [1].
When the nomad agent dies, the pipe is closed, and any subsequent
executor log writes fail with ErrClosedPipe and a SIGPIPE signal. SIGPIPE
results in the executor process dying.
I considered adding a handler to ignore SIGPIPE, but the go-hclog library
currently panics when a logging write operation fails [2].
Thus we opt to revert to the v0.8 behavior of exclusively writing logs to
executor.log, while we investigate alternative options.
[1] https://github.com/hashicorp/nomad/blob/v0.9.0/vendor/github.com/hashicorp/go-plugin/client.go#L528-L535
[2] https://github.com/hashicorp/nomad/blob/v0.9.0/vendor/github.com/hashicorp/go-hclog/int.go#L320-L323
This adds a `nomad alloc stop` command that can be used to stop and
force-migrate an allocation to a different node.
This is built on top of the AllocUpdateDesiredTransitionRequest and
explicitly limits the scope of access to that transition to expose it
under the alloc-lifecycle ACL.
The API returns the follow-up eval, which can be used as part of
monitoring in the CLI or parsed and used by an external tool.