Addresses feedback around monitor implementation
subselect on stopCh to prevent blocking forever.
Set up a separate goroutine to check every 3 seconds for dropped
messages.
rename returned ch to avoid confusion
Adds new package that can be used by client and server RPC endpoints to
facilitate monitoring based off of a logger
clean up old code
small comment about write
rm old comment about minsize
rename to Monitor
Removes connection logic from monitor command
Keep connection logic in endpoints, use a channel to send results from
monitoring
use new multisink logger and interfaces
small test for dropped messages
update go-hclogger and update sink/intercept logger interfaces
Adds nomad monitor command. Like consul monitor, this command allows you
to stream logs from a nomad agent in real time with a a specified log
level
add endpoint tests
Upgrade go-hclog to latest version
The current version of go-hclog pads log prefixes to equal lengths
so info becomes [INFO ] and debug becomes [DEBUG]. This breaks
hashicorp/logutils/level.go Check function. Upgrading to the latest
version removes this padding and fixes log filtering that uses logutils
Check
AgentMonitor is an endpoint to stream logs for a given agent. It allows
callers to pass in a supplied log level, which may be different than the
agents config allowing for temporary debugging with lower log levels.
Pass in logWriter when setting up Agent
makeAllocTaskServices did not do a nil check on AllocatedResources
which causes a panic when upgrading directly from 0.8 to 0.10. While
skipping 0.9 is not supported we intend to fix serious crashers caused
by such upgrades to prevent cluster outages.
I did a quick audit of the client package and everywhere else that
accesses AllocatedResources appears to be properly guarded by a nil
check.
This commit introduces support for configuring mount propagation when
mounting volumes with the `volume_mount` stanza on Linux targets.
Similar to Kubernetes, we expose 3 options for configuring mount
propagation:
- private, which is equivalent to `rprivate` on Linux, which does not allow the
container to see any new nested mounts after the chroot was created.
- host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts
that have been created _outside of the container_ to be visible
inside the container after the chroot is created.
- bidirectional, which is equivalent to `rshared` on Linux, which allows both
the container to see new mounts created on the host, but
importantly _allows the container to create mounts that are
visible in other containers an don the host_
private and host-to-task are safe, but bidirectional mounts can be
dangerous, as if the code inside a container creates a mount, and does
not clean it up before tearing down the container, it can cause bad
things to happen inside the kernel.
To add a layer of safety here, we require that the user has ReadWrite
permissions on the volume before allowing bidirectional mounts, as a
defense in depth / validation case, although creating mounts should also require
a priviliged execution environment inside the container.
Currently this assumes that a short write will never happen. While these
are improbable in a case where rotation being off a few bytes would
matter, this now correctly tracks the number of written bytes.
Currently this logging implementation is dependent on the order of files
as returned by filepath.Glob, which although internal methods are
documented to be lexographical, does not publicly document this. Here we
defensively resort.
Fix a bug where a millicious user can access or manipulate an alloc in a
namespace they don't have access to. The allocation endpoints perform
ACL checks against the request namespace, not the allocation namespace,
and performs the allocation lookup independently from namespaces.
Here, we check that the requested can access the alloc namespace
regardless of the declared request namespace.
Ideally, we'd enforce that the declared request namespace matches
the actual allocation namespace. Unfortunately, we haven't documented
alloc endpoints as namespaced functions; we suspect starting to enforce
this will be very disruptive and inappropriate for a nomad point
release. As such, we maintain current behavior that doesn't require
passing the proper namespace in request. A future major release may
start enforcing checking declared namespace.
This commit introduces a rotating file logger for Nomad Agent Logs. The
logger implementation itself is a lift and shift from Consul, with tests
updated to fit with the Nomad pattern of using require, and not having a
testutil for creating tempdirs cleanly.
This fixes two bugs:
First, FS Logs API endpoint only propagated error back to user if it was
encoded with code, which isn't common. Other errors get suppressed and
callers get an empty response with 200 error code. Now, these endpoints
return a 500 status code along with the error message.
Before
```
$ curl -v "http://127.0.0.1:4646/v1/client/fs/logs/qwerqwera?follow=false&offset=0&origin=start®ion=global&task=redis&type=stdout"; echo
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 4646 (#0)
> GET /v1/client/fs/logs/qwerqwera?follow=false&offset=0&origin=start®ion=global&task=redis&type=stdout HTTP/1.1
> Host: 127.0.0.1:4646
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Vary: Accept-Encoding
< Vary: Origin
< Date: Fri, 04 Oct 2019 19:47:21 GMT
< Content-Length: 0
<
* Connection #0 to host 127.0.0.1 left intact
```
After
```
$ curl -v "http://127.0.0.1:4646/v1/client/fs/logs/qwerqwera?follow=false&offset=0&origin=start®ion=global&task=redis&type=stdout"; echo
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 4646 (#0)
> GET /v1/client/fs/logs/qwerqwera?follow=false&offset=0&origin=start®ion=global&task=redis&type=stdout HTTP/1.1
> Host: 127.0.0.1:4646
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Vary: Accept-Encoding
< Vary: Origin
< Date: Fri, 04 Oct 2019 19:48:12 GMT
< Content-Length: 60
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 127.0.0.1 left intact
alloc lookup failed: index error: UUID must be 36 characters
```
Second, we return 400 status code for request validation errors.
Before
```
$ curl -v "http://127.0.0.1:4646/v1/client/fs/logs/qwerqwera"; echo
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 4646 (#0)
> GET /v1/client/fs/logs/qwerqwera HTTP/1.1
> Host: 127.0.0.1:4646
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Vary: Accept-Encoding
< Vary: Origin
< Date: Fri, 04 Oct 2019 19:47:29 GMT
< Content-Length: 22
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 127.0.0.1 left intact
must provide task name
```
After
```
$ curl -v "http://127.0.0.1:4646/v1/client/fs/logs/qwerqwera"; echo
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 4646 (#0)
> GET /v1/client/fs/logs/qwerqwera HTTP/1.1
> Host: 127.0.0.1:4646
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 400 Bad Request
< Vary: Accept-Encoding
< Vary: Origin
< Date: Fri, 04 Oct 2019 19:49:18 GMT
< Content-Length: 22
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 127.0.0.1 left intact
must provide task name
```
Without a `LocalServicePort`, Connect services will try to use the
mapped port even when delivering traffic locally. A user can override
this behavior by pinning the port value in the `service` stanza but
this prevents us from using the Consul service name to reach the
service.
This commits configures the Consul proxy with its `LocalServicePort`
and `LocalServiceAddress` fields.
Currently when hitting the /v1/agent/self API with ACL Replication
enabled results in the token being returned in the API. This commit
redacts that information, as it should be treated as a shared secret.
Currently, using a Volume in a job uses the following configuration:
```
volume "alias-name" {
type = "volume-type"
read_only = true
config {
source = "host_volume_name"
}
}
```
This commit migrates to the following:
```
volume "alias-name" {
type = "volume-type"
source = "host_volume_name"
read_only = true
}
```
The original design was based due to being uncertain about the future of storage
plugins, and to allow maxium flexibility.
However, this causes a few issues, namely:
- We frequently need to parse this configuration during submission,
scheduling, and mounting
- It complicates the configuration from and end users perspective
- It complicates the ability to do validation
As we understand the problem space of CSI a little more, it has become
clear that we won't need the `source` to be in config, as it will be
used in the majority of cases:
- Host Volumes: Always need a source
- Preallocated CSI Volumes: Always needs a source from a volume or claim name
- Dynamic Persistent CSI Volumes*: Always needs a source to attach the volumes
to for managing upgrades and to avoid dangling.
- Dynamic Ephemeral CSI Volumes*: Less thought out, but `source` will probably point
to the plugin name, and a `config` block will
allow you to pass meta to the plugin. Or will
point to a pre-configured ephemeral config.
*If implemented
The new design simplifies this by merging the source into the volume
stanza to solve the above issues with usability, performance, and error
handling.