Commit Graph

2778 Commits

Author SHA1 Message Date
Lang Martin fb6c27b828 test: build quota_apply_test, remove the tests that require ent 2019-12-17 13:20:14 -05:00
Drew Bailey d9e41d2880
docs for shutdown delay
update docs, address pr comments

ensure pointer is not nil

use pointer for diff tests, set vs unset
2019-12-16 11:38:35 -05:00
Drew Bailey 24929776a2
shutdown delay for task groups
copy struct values

ensure groupserviceHook implements RunnerPreKillhook

run deregister first

test that shutdown times are delayed

move magic number into variable
2019-12-16 11:38:16 -05:00
Mahmood Ali 76be9b4afb cli: sequence cli.Ui operations
Fixes a bug where if a command flag parsing errors, the resulting error
and help usage messages get interleaved in unexpected and non-user
friendly way.

The reason is that we have flag parsing library effectively writes to
ui.Error in a goroutine.  This is problematic: first, we lose the sequencing between help
usage and error message; second, cli.Ui methods are not concurrent safe.

Here, we introduce a custom error writer that buffers result and calls
ui.Error() in the write method and in the same goroutine.

For context, we need to wrap ui.Error because it's line-oriented, while
flags library expects a io.Writer which is bytes oriented.
2019-12-16 10:08:17 -05:00
Danielle 246a4e898b
Merge pull request #6828 from hashicorp/b/nomad-monitor-panic
command: error when no node is found for `monitor`
2019-12-10 14:29:32 +01:00
Danielle Lancashire cd764ab0e9
command: error when no node is found for `monitor`
Currently `nomad monitor -node-id` will panic when a node-id does not
match any nodes, as there is no empty result bounds checking. Here we
return an error to the user when no nodes are found.
2019-12-10 13:10:47 +01:00
Seth Hoenig f0c3dca49c tests: swap lib/freeport for tweaked helper/freeport
Copy the updated version of freeport (sdk/freeport), and tweak it for use
in Nomad tests. This means staying below port 10000 to avoid conflicts with
the lib/freeport that is still transitively used by the old version of
consul that we vendor. Also provide implementations to find ephemeral ports
of macOS and Windows environments.

Ports acquired through freeport are supposed to be returned to freeport,
which this change now also introduces. Many tests are modified to include
calls to a cleanup function for Server objects.

This should help quite a bit with some flakey tests, but not all of them.
Our port problems will not go away completely until we upgrade our vendor
version of consul. With Go modules, we'll probably do a 'replace' to swap
out other copies of freeport with the one now in 'nomad/helper/freeport'.
2019-12-09 08:37:32 -06:00
Michael Schurter 3008473f9b
Merge branch 'master' into release-0102 2019-12-04 14:13:34 -08:00
Mahmood Ali 7b8cfee162 tests: deflake TestHTTP_FreshClientAllocMetrics
The test asserts that alloc counts get reported accurately in metrics by
inspecting the metrics endpoint directly.  Sadly, the metrics as
collected by `armon/go-metrics` seem to be stateful and may contain info
from other tests.

This means that the test can fail depending on the order of returned
metrics.

Inspecting the metrics output of one failing run, you can see the
duplicate guage entries but for different node_ids:

```
    {
      "Name": "service-name.default-0a3ba4b6-2109-485e-be74-6864228aed3d.client.allocations.terminal",
      "Value": 10,
      "Labels": {
        "datacenter": "dc1",
        "node_class": "none",
        "node_id": "67402bf4-00f3-bd8d-9fa8-f4d1924a892a"
      }
    },
    {
      "Name": "service-name.default-0a3ba4b6-2109-485e-be74-6864228aed3d.client.allocations.terminal",
      "Value": 0,
      "Labels": {
        "datacenter": "dc1",
        "node_class": "none",
        "node_id": "a2945b48-7e66-68e2-c922-49b20dd4e20c"
      }
    },
```
2019-11-22 18:41:21 -05:00
Nomad Release bot db6420367d Generate files for 0.10.2-rc1 release 2019-11-22 18:42:49 +00:00
Drew Bailey b45ce9e997
add server-id to -h output 2019-11-21 16:04:28 -05:00
Drew Bailey b3765b06ea
add server-id to -h output 2019-11-21 16:01:09 -05:00
Drew Bailey 6d5156bbba
Allows a node uuid prefix to be passed in 2019-11-21 15:15:41 -05:00
Drew Bailey 7ca6dbe61e
Allows a node uuid prefix to be passed in 2019-11-21 14:51:48 -05:00
Lang Martin 069e9a624b command: quota init writes files with a network limit 2019-11-20 17:59:55 -06:00
Lang Martin d2fc279af4 command: quota status reports network usage 2019-11-20 17:59:34 -06:00
Lang Martin f45bebdb66 command: quota init writes files with a network limit 2019-11-20 18:44:06 -05:00
Lang Martin 2e2c662977 command: quota status reports network usage 2019-11-20 18:44:06 -05:00
Michael Schurter 48239d7f2e
Merge pull request #6017 from hashicorp/f-policy-json
api: Add parsed rules to policy response
2019-11-20 15:31:03 -08:00
Mahmood Ali be6c60455e
Merge pull request #6669 from hashicorp/b-cors-allow-credentials
Allow UI to query client directly for task logs/state
2019-11-20 15:14:01 -05:00
Buck Doyle db77a24ed3
Merge branch 'master' into f-policy-json 2019-11-20 11:20:07 -06:00
Michael Schurter ecf970b5a5
Merge pull request #6370 from pmcatominey/tls-server-name
command: add -tls-server-name flag
2019-11-20 08:44:54 -08:00
Preetha 42c1c85285
Merge pull request #6421 from hashicorp/b-acl-bootstrap-codes
api: acl bootstrap errors aren't 500
2019-11-20 10:36:08 -06:00
Preetha be4a51d5b8
Merge pull request #6349 from hashicorp/b-host-stats
client: Return empty values when host stats fail
2019-11-20 10:13:02 -06:00
Buck Doyle 7e3188a4ea
CLI: Remove duplicated error output (#6738) 2019-11-19 16:05:53 -06:00
Mahmood Ali 97974c4b13
Merge pull request #6684 from hashicorp/b-nomad-exec-stdout-tty
nomad exec: check stdout for tty as well
2019-11-19 15:55:21 -05:00
Mahmood Ali 6f8bb5e90b api: acl bootstrap errors aren't 500
Noticed that ACL endpoints return 500 status code for user errors.  This
is confusing and can lead to false monitoring alerts.

Here, I introduce a concept of RPCCoded errors to be returned by RPC
that signal a code in addition to error message.  Codes for now match
HTTP codes to ease reasoning.

```
$ nomad acl bootstrap
Error bootstrapping: Unexpected response code: 500 (ACL bootstrap already done (reset index: 9))

$ nomad acl bootstrap
Error bootstrapping: Unexpected response code: 400 (ACL bootstrap already done (reset index: 9))
```
2019-11-19 15:51:57 -05:00
Tim Gross 1210261fe2
hclfmt nomad jobspecs (#6724) 2019-11-19 10:36:41 -05:00
Nick Ethier bd454a4c6f
client: improve group service stanza interpolation and check_re… (#6586)
* client: improve group service stanza interpolation and check_restart support

Interpolation can now be done on group service stanzas. Note that some task runtime specific information
that was previously available when the service was registered poststart of a task is no longer available.

The check_restart stanza for checks defined on group services will now properly restart the allocation upon
check failures if configured.
2019-11-18 13:04:01 -05:00
Drew Bailey 9b63828658
serverID to target remote leader or server
handle the case where we request a server-id which is this current server

update docs, error on node and server id params

more accurate names for tests

use shared no leader err, formatting

rm bad comment

remove redundant variable
2019-11-14 10:07:35 -05:00
Drew Bailey b644e1f47d
add server-id to monitor specific server 2019-11-14 09:53:41 -05:00
Drew Bailey acd97d0731
Merge pull request #6670 from hashicorp/api/fallthrough-test
test rootfallthrough handler
2019-11-13 10:51:31 -05:00
Lars Lehtonen 1dbf44bc40 command/agent: Prune Dead Code (#6682)
* remove unused MockPeriodicJob() from tests
* remove unused getIndex() from tests
* remove unused checkIndex() from tests
* remove unused assertIndex() from tests
* remove unused Agent.findLoopbackDevice()
2019-11-13 08:20:01 -05:00
Lars Lehtonen e85509c466 command: error handling before file close (#6681) 2019-11-13 08:18:20 -05:00
Drew Bailey f5310ff63f
fix so assertions are test case driven 2019-11-12 14:28:21 -05:00
Mahmood Ali 591cb75ee4 nomad exec: check stdout for tty as well
When inferring whether to use TTY, check both stdin and stdout are
terminals.

Otherwise, we get failures like the following:

```
$ nomad alloc exec --job example echo hi
hi
$ echo | nomad alloc exec --job example echo hi
hi
$ nomad alloc exec --job example echo hi | head -n1
failed to exec into task: not a terminal
```
2019-11-12 11:39:06 -05:00
Lars Lehtonen 98d3e47b32 command: fix TestHelpers_LineLimitReader_TimeLimit() goroutine (#6678) 2019-11-12 08:35:11 -05:00
Charlie Voiselle 835831a3d8 Added service wrapper code (#6220)
This is the basic code to add the Windows Service Manager hooks to Nomad.

Includes vendoring golang.org/x/sys/windows/svc and added Docs:
* guide for installing as a windows service.
* configuration for logging to file from PR #6429
2019-11-11 15:16:07 -05:00
Drew Bailey f989f38594
test /ui/ path 2019-11-11 12:12:42 -05:00
Drew Bailey a0548824f3
test rootfallthrough handler 2019-11-11 12:08:44 -05:00
Mahmood Ali b2145f2d02 Allow UI to query client directly
Nomad web UI currently fails when querying client nodes for allocation
state end endpoints, due to CORS policy.

The issue is that CORS requests that are marked `withCredentials` need
the http server to include a `Access-Control-Allow-Credentials` [1].

But Nomad Task Logs and filesystem requests include authenticating
information and thus marked with `credentials=true`[2][3].

It's worth noting that the browser currently sends credentials and
authentication token to servers anyway; it's just that the response is
not made available to caller nomad ui javascript.  For task logs
specifically, nomad ui retries again by querying the web ui address
(typically pointing to a nomad server) which will forward the request
to the nomad client agent appropriately.

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Access-Control-Allow-Credentials
[2] 101d0373ee/ui/app/components/task-log.js (L50)
[3] 101d0373ee/ui/app/services/token.js (L25-L39)
2019-11-11 15:13:30 +00:00
Lars Lehtonen 08d5342812 command/agent: TestAgent_ServerConfig() fix dropped errors (#6659) 2019-11-11 09:46:46 -05:00
Drew Bailey 04439a5a78
better func name, swap conditional 2019-11-11 08:35:56 -05:00
Drew Bailey c85df2dac7
returns a 404 if not found instead of redirect to ui 2019-11-08 15:34:35 -05:00
Tim Gross 4909adb32c
fix broken test expectation from message change (#6635) 2019-11-06 16:33:13 -05:00
Drew Bailey 7b2ad28ef6
unlock before returning, no need for label
comment, trigger build

return length written
2019-11-05 11:44:29 -05:00
Drew Bailey d3b48a3e45
simplify logch goroutine 2019-11-05 11:44:28 -05:00
Drew Bailey df57f70a68
wireup plain=true|false query param 2019-11-05 11:44:28 -05:00
Drew Bailey f4a7e3dc75
coordinate closing of doneCh, use interface to simplify callers
comments
2019-11-05 11:44:26 -05:00
Drew Bailey fe542680dc
log-json -> json
fix typo command/agent/monitor/monitor.go

Co-Authored-By: Chris Baker <1675087+cgbaker@users.noreply.github.com>

Update command/agent/monitor/monitor.go

Co-Authored-By: Chris Baker <1675087+cgbaker@users.noreply.github.com>

address feedback, lock to prevent send on closed channel

fix lock/unlock for dropped messages
2019-11-05 09:51:59 -05:00
Drew Bailey 84c8e79f90
simplify assert message 2019-11-05 09:51:56 -05:00
Drew Bailey 8726b685de
address feedback 2019-11-05 09:51:56 -05:00
Drew Bailey e4b3e1d7d4
allow more time for streaming message
remove unused struct
2019-11-05 09:51:55 -05:00
Drew Bailey 318b6c91bf
monitor command takes no args
rm extra new line

fix lint errors

return after close

fix, simplify test
2019-11-05 09:51:55 -05:00
Drew Bailey 0e759c401c
moving endpoints over to frames 2019-11-05 09:51:54 -05:00
Drew Bailey c7b633b6c1
lock in sub select
rm redundant lock

wip to use framing

wip switch to stream frames
2019-11-05 09:51:54 -05:00
Drew Bailey fb23c1325d
fix deadlock issue, switch to frames envelope 2019-11-05 09:51:54 -05:00
Drew Bailey 32f62edbb0
return 400 if invalid log_json param is given
Addresses feedback around monitor implementation

subselect on stopCh to prevent blocking forever.

Set up a separate goroutine to check every 3 seconds for dropped
messages.

rename returned ch to avoid confusion
2019-11-05 09:51:53 -05:00
Drew Bailey 17d876d5ef
rename function, initialize log level better
underscores instead of dashes for query params
2019-11-05 09:51:53 -05:00
Drew Bailey 8e3915c7fc
use channel instead of empty string to determine close 2019-11-05 09:51:52 -05:00
Drew Bailey da6229d704
update go-hclog dep
remove duplicate lock
2019-11-05 09:51:52 -05:00
Drew Bailey db65b1f4a5
agent:read acl policy for monitor 2019-11-05 09:51:52 -05:00
Drew Bailey f46fd5b3e1
only look up rpchandler for node if we have nodeid
fix some comments and nomad monitor -h output
2019-11-05 09:51:51 -05:00
Drew Bailey 3b9c33a5f0
new hclog with standardlogger intercept 2019-11-05 09:51:49 -05:00
Drew Bailey a45ae1cd58
enable json formatting, use queryoptions 2019-11-05 09:51:49 -05:00
Drew Bailey 786989dbe3
New monitor pkg for shared monitor functionality
Adds new package that can be used by client and server RPC endpoints to
facilitate monitoring based off of a logger

clean up old code

small comment about write

rm old comment about minsize

rename to Monitor

Removes connection logic from monitor command

Keep connection logic in endpoints, use a channel to send results from
monitoring

use new multisink logger and interfaces

small test for dropped messages

update go-hclogger and update sink/intercept logger interfaces
2019-11-05 09:51:49 -05:00
Drew Bailey e076204820
get local rpc endpoint working 2019-11-05 09:51:48 -05:00
Drew Bailey 976c43157c
remove log_writer
prefix output with proper spacing

update gzip handler, adjust first byte flow to allow gzip handler bypass

wip, first stab at wiring up rpc endpoint
2019-11-05 09:51:48 -05:00
Drew Bailey 0de94466b2
Display error when remote side ended monitor
multisink logger

remove usage of logwriter
2019-11-05 09:51:48 -05:00
Drew Bailey f60e44afc7
Adds nomad monitor command
Adds nomad monitor command. Like consul monitor, this command allows you
to stream logs from a nomad agent in real time with a a specified log
level

add endpoint tests

Upgrade go-hclog to latest version

The current version of go-hclog pads log prefixes to equal lengths
so info becomes [INFO ] and debug becomes [DEBUG]. This breaks
hashicorp/logutils/level.go Check function. Upgrading to the latest
version removes this padding and fixes log filtering that uses logutils
Check
2019-11-05 09:51:47 -05:00
Drew Bailey b386119d15
Add Agent Monitor to receive streaming logs
Queries /v1/agent/monitor and receives streaming logs from client
2019-11-05 09:51:47 -05:00
Drew Bailey b0184e2032
Adds AgentMonitor Endpoint
AgentMonitor is an endpoint to stream logs for a given agent. It allows
callers to pass in a supplied log level, which may be different than the
agents config allowing for temporary debugging with lower log levels.

Pass in logWriter when setting up Agent
2019-11-05 09:51:46 -05:00
Drew Bailey 3a11f1f23a
Merge pull request #6609 from hashicorp/b-alloc-status-consistency
Prevent nomad alloc status output inconsistency
2019-11-04 10:12:04 -05:00
Drew Bailey a7adc54235
Prevent nomad alloc status output inconsistency
Prevent random map ordering and sort alphabetically

better variable name
2019-11-01 14:01:32 -04:00
Michael Schurter 9fed8d1bed client: fix panic from 0.8 -> 0.10 upgrade
makeAllocTaskServices did not do a nil check on AllocatedResources
which causes a panic when upgrading directly from 0.8 to 0.10. While
skipping 0.9 is not supported we intend to fix serious crashers caused
by such upgrades to prevent cluster outages.

I did a quick audit of the client package and everywhere else that
accesses AllocatedResources appears to be properly guarded by a nil
check.
2019-11-01 07:47:03 -07:00
Mahmood Ali 3f6e50617a
Merge pull request #6047 from hashicorp/b-ignore-server-if-disabled
Only warn against BootstrapExpect set in CLI flag
2019-10-29 10:55:44 -04:00
Lang Martin aa77ea4032
quota: parse network stanza in quotas (#6511) 2019-10-24 10:41:54 -04:00
Michael Schurter 39437a5c5b
Merge branch 'master' into release-0100 2019-10-22 08:17:57 -07:00
Nomad Release bot 3e6c9dd40e Generate files for 0.10.0 release 2019-10-22 12:34:56 +00:00
Seth Hoenig 8b03477f46
Merge pull request #6448 from hashicorp/f-set-connect-sidecar-tags
connect: enable setting tags on consul connect sidecar service in job…
2019-10-17 15:14:09 -05:00
Seth Hoenig 039fbd3f3b connect: enable setting tags on consul connect sidecar service in jobspec (#6415) 2019-10-17 19:25:20 +00:00
Mahmood Ali 61e66cb077
Merge pull request #6427 from hashicorp/b-fs-endpoint-errors
agent: report fs log errors as http errors
2019-10-15 20:12:59 -04:00
Mahmood Ali 88f8127820 tests: avoid using unnecessary pipe 2019-10-15 17:22:03 -04:00
Mahmood Ali e6d5635e1a
Merge pull request #6425 from hashicorp/f-cli-show-full-ids
cli: show full id for single node or alloc status
2019-10-15 10:54:25 -04:00
Danielle fee482ae6c
Merge pull request #6331 from hashicorp/dani/f-volume-mount-propagation
volumes: Add support for mount propagation
2019-10-14 14:29:40 +02:00
Danielle Lancashire 4fbcc668d0
volumes: Add support for mount propagation
This commit introduces support for configuring mount propagation when
mounting volumes with the `volume_mount` stanza on Linux targets.

Similar to Kubernetes, we expose 3 options for configuring mount
propagation:

- private, which is equivalent to `rprivate` on Linux, which does not allow the
           container to see any new nested mounts after the chroot was created.

- host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts
                that have been created _outside of the container_ to be visible
                inside the container after the chroot is created.

- bidirectional, which is equivalent to `rshared` on Linux, which allows both
                 the container to see new mounts created on the host, but
                 importantly _allows the container to create mounts that are
                 visible in other containers an don the host_

private and host-to-task are safe, but bidirectional mounts can be
dangerous, as if the code inside a container creates a mount, and does
not clean it up before tearing down the container, it can cause bad
things to happen inside the kernel.

To add a layer of safety here, we require that the user has ReadWrite
permissions on the volume before allowing bidirectional mounts, as a
defense in depth / validation case, although creating mounts should also require
a priviliged execution environment inside the container.
2019-10-14 14:09:58 +02:00
Danielle 2640155ae5
Merge pull request #6429 from hashicorp/f-log-to-file
Add support for logging to a file
2019-10-11 13:35:39 +02:00
Nomad Release bot 3007f1662e Generate files for 0.10.0-rc1 release 2019-10-10 19:08:23 +00:00
Danielle Lancashire 5cedf6d024
logging: Correctly track number of written bytes
Currently this assumes that a short write will never happen. While these
are improbable in a case where rotation being off a few bytes would
matter, this now correctly tracks the number of written bytes.
2019-10-10 14:02:14 +02:00
Danielle Lancashire b67215d4f8
logging: Sort files when pruning old logs
Currently this logging implementation is dependent on the order of files
as returned by filepath.Glob, which although internal methods are
documented to be lexographical, does not publicly document this. Here we
defensively resort.
2019-10-10 13:51:16 +02:00
Mahmood Ali 4b2ba62e35 acl: check ACL against object namespace
Fix a bug where a millicious user can access or manipulate an alloc in a
namespace they don't have access to.  The allocation endpoints perform
ACL checks against the request namespace, not the allocation namespace,
and performs the allocation lookup independently from namespaces.

Here, we check that the requested can access the alloc namespace
regardless of the declared request namespace.

Ideally, we'd enforce that the declared request namespace matches
the actual allocation namespace.  Unfortunately, we haven't documented
alloc endpoints as namespaced functions; we suspect starting to enforce
this will be very disruptive and inappropriate for a nomad point
release.  As such, we maintain current behavior that doesn't require
passing the proper namespace in request.  A future major release may
start enforcing checking declared namespace.
2019-10-08 12:59:22 -04:00
Mahmood Ali 3c0d8c7611
Merge pull request #6441 from hashicorp/b-agent-token
Redact replication tokens in /agent/self
2019-10-08 12:55:45 -04:00
Danielle Lancashire 9eaac48f25
agent: Refactor log setup to support log-to-file 2019-10-07 14:42:32 +02:00
Danielle Lancashire 442f4888b3
agent: Introduce File Logger
This commit introduces a rotating file logger for Nomad Agent Logs. The
logger implementation itself is a lift and shift from Consul, with tests
updated to fit with the Nomad pattern of using require, and not having a
testutil for creating tempdirs cleanly.
2019-10-07 14:37:31 +02:00
Danielle Lancashire d3614ea0a8
config: Add required configuration for logging to a file 2019-10-07 14:16:59 +02:00
Mahmood Ali d09355efe4 cli: show full id for single node or alloc status
Show full ID on individual alloc or node status views.  Shortening
the ID isn't very helpful in these cases, and makes looking up the full
id slightly more complicated when user needs to interact with API.

List views are unmodified and show short id unless `-vebose` flag is passed.

Before
```
$ nomad node status -self | head -n2
ID            = 21fc51f9
Name          = mars-2.local

$ nomad alloc status 15ae54cd | head -n3
ID                  = 15ae54cd-08dd-3681-03cf-4c23ace7e7c3
Eval ID             = a6b15f86
Name                = example.cache[0]
```

After:
```
$ nomad node status -self | head -n2
ID            = 21fc51f9-fd39-0fa0-fb41-f34c7aa36101
Name          = mars-2.local

$ nomad alloc status 15ae54cd | head -n3
ID                  = 15ae54cd-08dd-3681-03cf-4c23ace7e7c3
Eval ID             = a6b15f86-ca8e-e536-b544-4bfb43137ff3
Name                = example.cache[0]
```
2019-10-04 16:36:18 -04:00
Mahmood Ali 317e0f9e44 agent: report fs log errors as http errors
This fixes two bugs:

First, FS Logs API endpoint only propagated error back to user if it was
encoded with code, which isn't common.  Other errors get suppressed and
callers get an empty response with 200 error code.  Now, these endpoints
return  a 500 status code along with the error message.

Before
```
$ curl -v "http://127.0.0.1:4646/v1/client/fs/logs/qwerqwera?follow=false&offset=0&origin=start&region=global&task=redis&type=stdout"; echo
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 4646 (#0)
> GET /v1/client/fs/logs/qwerqwera?follow=false&offset=0&origin=start&region=global&task=redis&type=stdout HTTP/1.1
> Host: 127.0.0.1:4646
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Vary: Accept-Encoding
< Vary: Origin
< Date: Fri, 04 Oct 2019 19:47:21 GMT
< Content-Length: 0
<
* Connection #0 to host 127.0.0.1 left intact
```

After
```
$ curl -v "http://127.0.0.1:4646/v1/client/fs/logs/qwerqwera?follow=false&offset=0&origin=start&region=global&task=redis&type=stdout"; echo
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 4646 (#0)
> GET /v1/client/fs/logs/qwerqwera?follow=false&offset=0&origin=start&region=global&task=redis&type=stdout HTTP/1.1
> Host: 127.0.0.1:4646
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Vary: Accept-Encoding
< Vary: Origin
< Date: Fri, 04 Oct 2019 19:48:12 GMT
< Content-Length: 60
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 127.0.0.1 left intact
alloc lookup failed: index error: UUID must be 36 characters
```

Second, we return 400 status code for request validation errors.

Before
```
$ curl -v "http://127.0.0.1:4646/v1/client/fs/logs/qwerqwera"; echo
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 4646 (#0)
> GET /v1/client/fs/logs/qwerqwera HTTP/1.1
> Host: 127.0.0.1:4646
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Vary: Accept-Encoding
< Vary: Origin
< Date: Fri, 04 Oct 2019 19:47:29 GMT
< Content-Length: 22
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 127.0.0.1 left intact
must provide task name
```

After
```
$ curl -v "http://127.0.0.1:4646/v1/client/fs/logs/qwerqwera"; echo
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 4646 (#0)
> GET /v1/client/fs/logs/qwerqwera HTTP/1.1
> Host: 127.0.0.1:4646
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 400 Bad Request
< Vary: Accept-Encoding
< Vary: Origin
< Date: Fri, 04 Oct 2019 19:49:18 GMT
< Content-Length: 22
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 127.0.0.1 left intact
must provide task name
```
2019-10-04 16:33:58 -04:00
Lang Martin fb41dd86ba default raft protocol v2 2019-09-24 14:37:55 -04:00
Peter McAtominey de133d883f
command: add -tls-server-name flag 2019-09-24 09:20:41 -07:00
Tim Gross cd9c23617f
client/connect: ConsulProxy LocalServicePort/Address (#6358)
Without a `LocalServicePort`, Connect services will try to use the
mapped port even when delivering traffic locally. A user can override
this behavior by pinning the port value in the `service` stanza but
this prevents us from using the Consul service name to reach the
service.

This commits configures the Consul proxy with its `LocalServicePort`
and `LocalServiceAddress` fields.
2019-09-23 14:30:48 -04:00
Danielle Lancashire 39fe07f66b
api: Redact tokens in /agent/self 2019-09-23 19:07:27 +02:00
Danielle Lancashire 8b44369073
api: Redact ACL Replication Token
Currently when hitting the /v1/agent/self API with ACL Replication
enabled results in the token being returned in the API. This commit
redacts that information, as it should be treated as a shared secret.
2019-09-22 14:35:53 +02:00
Chris Baker 6f38cca15a
fixed incorrect CLI documentation in `job deployments`
listed `-all-allocs` instead of `-all`
2019-09-20 12:24:53 -05:00
Danielle Lancashire e81d113e3f
command: Improve metrics fail logging 2019-09-19 04:17:42 +02:00
Mahmood Ali b4a7585e5e
Merge pull request #6328 from hashicorp/b-gh-6269
cli: emit job version number proper
2019-09-17 19:06:44 -04:00
Tim Gross e3e30c15a9
remove resolved TODO from UpdateTTL docstring (#6336) 2019-09-16 16:26:06 -04:00
Mahmood Ali df8a168d06 cli: emit job version number proper
We must emit alloc job number rather than its the field address.
2019-09-13 19:04:32 -04:00
Danielle Lancashire 78b61de45f
config: Hoist volume.config.source into volume
Currently, using a Volume in a job uses the following configuration:

```
volume "alias-name" {
  type = "volume-type"
  read_only = true

  config {
    source = "host_volume_name"
  }
}
```

This commit migrates to the following:

```
volume "alias-name" {
  type = "volume-type"
  source = "host_volume_name"
  read_only = true
}
```

The original design was based due to being uncertain about the future of storage
plugins, and to allow maxium flexibility.

However, this causes a few issues, namely:
- We frequently need to parse this configuration during submission,
scheduling, and mounting
- It complicates the configuration from and end users perspective
- It complicates the ability to do validation

As we understand the problem space of CSI a little more, it has become
clear that we won't need the `source` to be in config, as it will be
used in the majority of cases:

- Host Volumes: Always need a source
- Preallocated CSI Volumes: Always needs a source from a volume or claim name
- Dynamic Persistent CSI Volumes*: Always needs a source to attach the volumes
                                   to for managing upgrades and to avoid dangling.
- Dynamic Ephemeral CSI Volumes*: Less thought out, but `source` will probably point
                                  to the plugin name, and a `config` block will
                                  allow you to pass meta to the plugin. Or will
                                  point to a pre-configured ephemeral config.
*If implemented

The new design simplifies this by merging the source into the volume
stanza to solve the above issues with usability, performance, and error
handling.
2019-09-13 04:37:59 +02:00
Mahmood Ali 877260afd8 fix 'nomad namespace apply' help
Named arguments need to preceed positional arguments.
2019-09-09 10:04:41 -07:00
Nomad Release bot dc7d728a82 Generate files for 0.10.0-beta1 release 2019-09-06 18:47:09 +00:00
Michael Schurter 31eb8375e5
Merge pull request #6282 from hashicorp/f-connect-dev-path
connect: check if consul is on PATH
2019-09-05 12:25:23 -07:00
Michael Schurter 457684e34e connect: check if consul is on PATH
Only in -dev-connect mode for now since its valid to install Consul
after Nomad has started in production.
2019-09-05 12:05:42 -07:00
Jasmine Dahilig e1c73cdab5
add validation for job_gc_interval (#6277) 2019-09-05 11:20:46 -07:00
Mahmood Ali 6d73ca0cfb
Merge pull request #6250 from hashicorp/f-raft-protocol-v3
Update default raft protocol to version 3
2019-09-04 09:34:41 -04:00
Tim Gross 0f29dcc935
support script checks for task group services (#6197)
In Nomad prior to Consul Connect, all Consul checks work the same
except for Script checks. Because the Task being checked is running in
its own container namespaces, the check is executed by Nomad in the
Task's context. If the Script check passes, Nomad uses the TTL check
feature of Consul to update the check status. This means in order to
run a Script check, we need to know what Task to execute it in.

To support Consul Connect, we need Group Services, and these need to
be registered in Consul along with their checks. We could push the
Service down into the Task, but this doesn't work if someone wants to
associate a service with a task's ports, but do script checks in
another task in the allocation.

Because Nomad is handling the Script check and not Consul anyways,
this moves the script check handling into the task runner so that the
task runner can own the script check's configuration and
lifecycle. This will allow us to pass the group service check
configuration down into a task without associating the service itself
with the task.

When tasks are checked for script checks, we walk back through their
task group to see if there are script checks associated with the
task. If so, we'll spin off script check tasklets for them. The
group-level service and any restart behaviors it needs are entirely
encapsulated within the group service hook.
2019-09-03 15:09:04 -04:00
Buck Doyle 21ec6a237c Merge branch 'master' into f-policy-json
# Conflicts:
#	CHANGELOG.md
2019-09-03 09:56:25 -05:00
Jasmine Dahilig 4edebe389a
add default update stanza and max_parallel=0 disables deployments (#6191) 2019-09-02 10:30:09 -07:00
Evan Ercolano fcf66918d0 Remove unused canary param from MakeTaskServiceID 2019-08-31 16:53:23 -04:00
Michael Schurter 4bd53deba9
Merge pull request #6236 from hashicorp/b-ignore-connect-services
consul: ignore connect services when syncing
2019-08-30 13:11:09 -07:00
Michael Schurter 67b7bc1e90 consul: ignore connect services when syncing
Consul registers Connect services automatically, however Nomad thinks it
owns them due to the _nomad prefix. Since the services are managed by
Consul, Nomad needs to explicitly ignore them or otherwies they will be
removed.
2019-08-30 11:53:41 -07:00
Tim Gross b79021adfd cli: split -dev and -dev-connect flags 2019-08-30 09:33:30 -04:00
Buck Doyle ab96785fc9 Change test to use valid HCL for rules 2019-08-29 16:09:02 -05:00
Mahmood Ali 6eabf53b91 Default raft protocol to version 3 2019-08-28 15:56:59 -04:00
Nick Ethier 9e96971a75
cli: display group ports and address in alloc status command output (#6189)
* cli: display group ports and address in alloc status command output

* add assertions for port.To = -1 case and convert assertions to testify
2019-08-27 23:59:36 -04:00
Jasmine Dahilig ffceab0879
remove network stanza from job init --short example jobspec (#6179) 2019-08-27 07:36:32 -07:00
Tim Gross 11030f7aa0 init: add generated assets into bindata 2019-08-26 14:24:15 -04:00
Tim Gross 4d4461d1f5 agent: -dev=connect mode bind to 0.0.0.0
The dev mode flag for connect was binding to the default interface's
IP, but this makes for a bad user experience for the CLI which will
default to 127.0.0.1. If we bind to 0.0.0.0 instead the CLI will work
without further configuration by the user.
2019-08-23 13:51:16 -04:00
Jerome Gravel-Niquet cbdc1978bf Consul service meta (#6193)
* adds meta object to service in job spec, sends it to consul

* adds tests for service meta

* fix tests

* adds docs

* better hashing for service meta, use helper for copying meta when registering service

* tried to be DRY, but looks like it would be more work to use the
helper function
2019-08-23 12:49:02 -04:00
Michael Schurter 95b8048553
Merge pull request #6121 from hashicorp/f-connect-bootstrap
connect: task hook for bootstrapping envoy sidecar
2019-08-22 10:58:31 -07:00
Michael Schurter 59e0b67c7f connect: task hook for bootstrapping envoy sidecar
Fixes #6041

Unlike all other Consul operations, boostrapping requires Consul be
available. This PR tries Consul 3 times with a backoff to account for
the group services being asynchronously registered with Consul.
2019-08-22 08:15:32 -07:00
Danielle Lancashire 2e5f28029f
remove hidden field from host volumes
We're not shipping support for "hidden" volumes in 0.10 any more, I'll
convert this to an issue+mini RFC for future enhancement.
2019-08-22 08:48:05 +02:00
Danielle c280e97619
Merge pull request #6184 from hashicorp/dani/fix-api
api: Fix definition of HostVolumeInfo
2019-08-22 00:13:28 +02:00
Danielle Lancashire 112b986736
api: Fix definition of HostVolumeInfo 2019-08-21 22:34:41 +02:00
Danielle Lancashire 9df7e0eb72
clientconfig: Fix parsing multiple host volumes 2019-08-21 22:19:58 +02:00
Michael Schurter 050cc32fde
Merge pull request #6157 from hashicorp/f-connect-register
Register connect enabled group services with Consul
2019-08-20 14:45:38 -07:00
Michael Schurter b008fd1724 connect: register group services with Consul
Fixes #6042

Add new task group service hook for registering group services like
Connect-enabled services.

Does not yet support checks.
2019-08-20 12:25:10 -07:00
Tim Gross c404491f1f test: require root for linux devmode test 2019-08-20 13:31:49 -04:00
Tim Gross a0e923f46c add optional task field to group service checks 2019-08-20 09:35:31 -04:00
Nick Ethier 24f5a4c276
sidecar_task override in connect admission controller (#6140)
* structs: use seperate SidecarTask struct for sidecar_task stanza and add merge

* nomad: merge SidecarTask into proxy task during connect Mutate hook
2019-08-20 01:22:46 -04:00
Tim Gross 2ab004d971 command: add `-connect` flag to job init
Adds an example job for Consul Connect integration as well as an
annotated example job.
2019-08-19 14:43:04 -04:00
Tim Gross 2a592a2e0c
agent: add optional param to -dev flag for connect (#6126)
Consul Connect must route traffic between network namespaces through a
public interface (i.e. not localhost). In order to support testing in
dev mode, users needed to manually set the interface which doesn't
make for a smooth experience.

This commit adds a facility for adding optional parameters to the
`nomad agent -dev` flag and uses it to add a `-dev=connect` flag that
binds to a public interface on the host.
2019-08-14 15:29:37 -04:00
Tim Gross 13376cff9c move `nomad init` outputs to go-bindata assets 2019-08-14 14:10:23 -04:00
Preetha 8c6312d973
Merge pull request #6097 from hashicorp/f-kind-validate
Add validation for kind field if it is a consul connect proxy
2019-08-13 11:05:30 -05:00
Preetha Appan 72e45dd01e
More code review feedback 2019-08-12 17:41:40 -05:00
Tim Gross 03433f35d4 client/template: configuration for function blacklist and sandboxing
When rendering a task template, the `plugin` function is no longer
permitted by default and will raise an error. An operator can opt-in
to permitting this function with the new `template.function_blacklist`
field in the client configuration.

When rendering a task template, path parameters for the `file`
function will be treated as relative to the task directory by
default. Relative paths or symlinks that point outside the task
directory will raise an error. An operator can opt-out of this
protection with the new `template.disable_file_sandbox` field in the
client configuration.
2019-08-12 16:34:48 -04:00
Preetha Appan 35506c516d
Improve validation logic and add table driven tests 2019-08-12 14:39:50 -05:00
Danielle Lancashire 7208a7ab88
command: Cleanup node-status 2019-08-12 15:39:09 +02:00
Danielle Lancashire 333fdd723b
cli: Display host volume info in nomad node status 2019-08-12 15:39:09 +02:00
Danielle Lancashire 861caa9564
HostVolumeConfig: Source -> Path 2019-08-12 15:39:08 +02:00
Danielle Lancashire e132a30899
structs: Unify Volume and VolumeRequest 2019-08-12 15:39:08 +02:00
Danielle Lancashire 01f3fe13fb
api: Allow submission of jobs with volumes 2019-08-12 15:39:08 +02:00
Danielle Lancashire 063e4240c1
client: Add parsing and registration of HostVolume configuration 2019-08-12 15:39:08 +02:00
Nick Ethier 1871c1edbc
Add sidecar_task stanza parsing (#6104)
* jobspec: breakup parse.go into smaller files

* add sidecar_task parsing to jobspec and api

* jobspec: combine service parsing logic for task and group service stanzas

* api: use slice of ConsulUpstream values instead of pointers
2019-08-09 15:18:53 -04:00
Preetha 1d543290af
Merge pull request #6090 from hashicorp/f-task-kind
Add field "kind" to task for use in connect tasks
2019-08-08 14:40:12 -05:00
Nick Ethier 7806f4c597
Revert "client: add autofetch for CNI plugins"
This reverts commit 0bd157cc3b04fb090dd0d54affcae71496102ce8.
2019-08-08 15:10:19 -04:00
Preetha Appan a393ea79e8
Add field "kind" to task for use in connect tasks 2019-08-07 18:43:36 -05:00
Jasmine Dahilig 8d980edd2e
add create and modify timestamps to evaluations (#5881) 2019-08-07 09:50:35 -07:00
Michael Schurter d2862b33e6
Merge pull request #6045 from hashicorp/f-connect-groupservice
consul: add Connect structs
2019-08-06 15:43:38 -07:00
Michael Schurter 17fd82d6ad consul: add Connect structs
Refactor all Consul structs into {api,structs}/services.go because
api/tasks.go didn't make sense anymore and structs/structs.go is
gigantic.
2019-08-06 08:15:07 -07:00
Jasmine Dahilig ac488bc9dc
job region defaults to client node region if 'global' or none provided (#6064) 2019-08-05 14:28:02 -07:00
Tim Gross 443ce3a831
api: add follow param to file stream endpoint (#6049)
The `/v1/client/fs/stream endpoint` supports tailing a file by writing
chunks out as they come in. But not all browsers support streams
(ex IE11) so we need to be able to tail a file without streaming.

The fs stream and logs endpoint use the same implementation for
filesystem streaming under the hood, but the fs stream always passes
the `follow` parameter set to true. This adds the same toggle to the
fs stream endpoint that we have for logs. It defaults to true for
backwards compatibility.
2019-08-01 08:32:43 -04:00
Mahmood Ali 31ad8161ab Only warn against BootstrapExpect set in CLI flag
If server.enabled is false, we ought to ignore all other values in
the server stanza.

However, I opted to preserve current error when `--bootstrap-expect` is
passed to the CLI when server is not enabled, to maintain current
behavior.
2019-07-31 03:19:15 -05:00
Nick Ethier 7de0bec8ab
client/cni: updated comments and simplified logic to auto download plugins 2019-07-31 01:04:10 -04:00
Nick Ethier b16640c50d
Apply suggestions from code review
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2019-07-31 01:04:10 -04:00
Nick Ethier af6b191963
client: add autofetch for CNI plugins 2019-07-31 01:04:09 -04:00
Nick Ethier ef83f0831b
ar: plumb client config for networking into the network hook 2019-07-31 01:04:06 -04:00
Michael Schurter fb487358fb
connect: add group.service stanza support 2019-07-31 01:04:05 -04:00
Nick Ethier 6537279686
agent: simplify if block 2019-07-31 01:03:17 -04:00
Nick Ethier 8650429e38
Add network stanza to group
Adds a network stanza and additional options to the task group level
in prep for allowing shared networking between tasks of an alloc.
2019-07-31 01:03:12 -04:00
Michael Schurter d31488e262
Merge pull request #5978 from pete-woods/configurable-job-gc-interval
command/agent: allow the job GC interval to be configured
2019-07-30 15:54:29 -07:00
Nomad Release bot e39fb11531 Generate files for 0.9.4 release 2019-07-30 19:05:18 +00:00
Pete Woods b47c5ca467
Allow the job GC interval to be configured from default of 5 minutes 2019-07-26 10:11:25 +01:00
Danielle 45f3f928f5
Merge pull request #5996 from hashicorp/f-reload-log-level
Support for hot reloading log levels
2019-07-24 13:54:04 +02:00
Danielle Lancashire 0422f1b0c2
Support for hot reloading log levels 2019-07-24 13:37:08 +02:00
Nomad Release bot 04187c8b86 Generate files for 0.9.4-rc1 release 2019-07-22 21:42:36 +00:00
Danielle Lancashire d454dab39b
chore: Format hcl configurations 2019-07-20 16:55:07 +02:00
Michael Schurter db4de5fae9
Merge pull request #5975 from hashicorp/b-check-watcher-deadlock
consul: fix deadlock in check-based restarts
2019-07-18 13:13:40 -07:00
Michael Schurter 6d095b3b36 consul: add test for check watcher deadlock 2019-07-18 08:24:09 -07:00
Michael Schurter 826d2503e6
Update command/agent/consul/check_watcher.go
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2019-07-18 07:08:27 -07:00
Michael Schurter 5407584bc3 consul: fix deadlock in check-based restarts
Fixes #5395
Alternative to #5957

Make task restarting asynchronous when handling check-based restarts.
This matches the pre-0.9 behavior where TaskRunner.Restart was an
asynchronous signal. The check-based restarting code was not designed
to handle blocking in TaskRunner.Restart. 0.9 made it reentrant and
could easily overwhelm the buffered update chan and deadlock.

Many thanks to @byronwolfman for his excellent debugging, PR, and
reproducer!

I created this alternative as changing the functionality of
TaskRunner.Restart has a much larger impact. This approach reverts to
old known-good behavior and minimizes the number of places changes are
made.
2019-07-17 15:22:21 -07:00
Chris Baker 8a75afcb39
Merge pull request #5870 from hashicorp/b-nmd-1529-alloc-stop-missing-header
api: return X-Nomad-Index header on allocation stop
2019-07-17 13:25:17 -04:00
Mahmood Ali 5d09b04f69
Merge pull request #5837 from hashicorp/b-consul-restore-sync-2
Avoid de-registering slowly restored services
2019-07-17 12:02:24 +08:00
Mahmood Ali ec7e258d71 address review feedback 2019-07-17 10:43:13 +07:00
Eli Shvartsman 692fd19884 take NodeID from url in api for node eligibility 2019-07-15 18:34:53 +03:00
Preetha 5b83cd4ce0
Merge pull request #5894 from hashicorp/f-remove-deprecated-code
Remove deprecated code
2019-07-02 09:29:24 -05:00
Preetha Appan aa2b4b4e00
Undo removal of node drain compat changes
Decided to remove that in 0.10
2019-07-01 15:12:01 -05:00
Preetha Appan 3345ce3ba4
Infer content type in alloc fs stat endpoint 2019-06-28 20:31:28 -05:00
Preetha Appan f6fc5d40d1
one more drain test 2019-06-26 17:33:51 -05:00
Preetha Appan 67bf66efc6
remove now unneeded test 2019-06-26 16:59:23 -05:00
Preetha Appan 10e7d6df6d
Remove compat code associated with many previous versions of nomad
This removes compat code for namespaces (0.7), Drain(0.8) and other
older features from releases older than Nomad 0.7
2019-06-25 19:05:25 -05:00
Chris Baker 3429cf39ed api: return X-Nomad-Index header on allocation stop 2019-06-21 16:20:06 +00:00
Chris Baker 59fac48d92 alloc lifecycle: 404 when attempting to stop non-existent allocation 2019-06-20 21:27:22 +00:00
Mahmood Ali b209584dce
Merge pull request #5726 from hashicorp/b-plugins-via-init
Use init() to handle plugin invocation
2019-06-18 21:09:03 -04:00
Mahmood Ali e07413c420 Avoid de-registering slowly restored services
When a nomad client restarts/upgraded, nomad restores state from running
task and starts the sync loop.  If sync loop runs early, it may
deregister services from Consul prematurely even when Consul has the
running service as healthy.

This is not ideal, as re-registering the service means potentially
waiting a whole service health check interval before declaring the
service healthy.

We attempt to mitigate this by introducing an initialization probation
period.  During this time, we only deregister services and checks that
were explicitly deregistered, and leave unrecognized ones alone.  This
serves as a grace period for restoring to complete, or for operators to
restore should they recognize they restored with the wrong nomad data
directory.
2019-06-14 11:15:21 -04:00
Mahmood Ali 962921f86c Use init to handle plugin invocation
Currently, nomad "plugin" processes (e.g. executor, logmon, docker_logger) are started as CLI
commands to be handled by command CLI framework.  Plugin launchers use
`discover.NomadBinary()` to identify the binary and start it.

This has few downsides: The trivial one is that when running tests, one
must re-compile the nomad binary as the tests need to invoke the nomad
executable to start plugin.  This is frequently overlooked, resulting in
puzzlement.

The more significant issue with `executor` in particular is in relation
to external driver:

* Plugin must identify the path of invoking nomad binary, which is not
trivial; `discvoer.NomadBinary()` now returns the path to the plugin
rather than to nomad, preventing external drivers from launching
executors.

* The external driver may get a different version of executor than it
expects (specially if we make a binary incompatible change in future).

This commit addresses both downside by having the plugin invocation
handling through an `init()` call, similar to how libcontainer init
handler is done in [1] and recommened by libcontainer [2].  `init()`
will be invoked and handled properly in tests and external drivers.

For external drivers, this change will cause external drivers to launch
the executor that's compiled against.

There a are a couple of downsides to this approach:
* These specific packages (i.e executor, logmon, and dockerlog) need to
be careful in use of `init()`, package initializers.  Must avoid having
command execution rely on any other init in the package.  I prefixed
files with `z_` (golang processes files in lexical order), but ensured
we don't depend on order.
* The command handling is spread in multiple packages making it a bit
less obvious how plugin starts are handled.

[1] drivers/shared/executor/libcontainer_nsenter_linux.go
[2] eb4aeed24f/libcontainer (using-libcontainer)
2019-06-13 16:48:01 -04:00
Jasmine Dahilig ed9740db10
Merge pull request #5664 from hashicorp/f-http-hcl-region
backfill region from hcl for jobUpdate and jobPlan
2019-06-13 12:25:01 -07:00
Jasmine Dahilig 51e141be7a backfill region from job hcl in jobUpdate and jobPlan endpoints
- updated region in job metadata that gets persisted to nomad datastore
- fixed many unrelated unit tests that used an invalid region value
(they previously passed because hcl wasn't getting picked up and
the job would default to global region)
2019-06-13 08:03:16 -07:00
Danielle b7fc81031b
Merge pull request #5829 from hashicorp/dani/b-5819
consul: Include port-label in service registration
2019-06-13 16:20:45 +02:00
Danielle Lancashire 8112177503
consul: Include port-label in service registration
It is possible to provide multiple identically named services with
different port assignments in a Nomad configuration.

We introduced a regression when migrating to stable service identifiers where
multiple services with the same name would conflict, and the last definition
would take precedence.

This commit includes the port label in the stable service identifier to
allow the previous behaviour where this was supported, for example
providing:

```hcl
service {
  name = "redis-cache"
  tags = ["global", "cache"]
  port = "db"
  check {
    name     = "alive"
    type     = "tcp"
    interval = "10s"
    timeout  = "2s"
  }
}

service {
  name = "redis-cache"
  tags = ["global", "foo"]
  port = "foo"

  check {
    name     = "alive"
    type     = "tcp"
    port     = "db"
    interval = "10s"
    timeout  = "2s"
  }
}

service {
  name = "redis-cache"
  tags = ["global", "bar"]
  port = "bar"

  check {
    name     = "alive"
    type     = "tcp"
    port     = "db"
    interval = "10s"
    timeout  = "2s"
  }
}
```

in a nomad task definition is now completely valid. Each service
definition with the same name must still have a unique port label however.
2019-06-13 15:24:54 +02:00
Nick Ethier 1b7fa4fe29
Optional Consul service tags for nomad server and agent services (#5706)
Optional Consul service tags for nomad server and agent services
2019-06-13 09:00:35 -04:00
Preetha 8a98817fe4
Merge pull request #5820 from hashicorp/r-assorted-changes-20190612_1
Assorted minor changes
2019-06-12 10:33:16 -05:00
Danielle Lancashire ae8bb7365a
alloc-lifecycle: Fix restart with empty body
Currently when you submit a manual request to the alloc lifecycle API
with a version of Curl that will submit empty bodies, the alloc restart
api will fail with an EOF error.

This behaviour is undesired, as it is reasonable to not submit a body at
all when restarting an entire allocation rather than an individual task.

This fixes it by ignoring EOF (not unexpected EOF) errors and treating
them as entire task restarts.
2019-06-12 15:35:00 +02:00
Mahmood Ali b00d1f1e10 tests: parsing dir should be equivalent to parsing individual files 2019-06-12 08:19:09 -04:00
Mahmood Ali 3d8f2622e9 tests: avoid manipulating package variables 2019-06-12 08:16:32 -04:00
Lang Martin 3837c9b021 command add comments re: defaults to LoadConfig 2019-06-11 22:35:43 -04:00
Lang Martin 02aae678be config_parse_test update comment for accuracy 2019-06-11 22:30:20 -04:00
Lang Martin 7aa95ebd6f config_parse get rid of ParseConfigDefault 2019-06-11 22:00:23 -04:00
Lang Martin 9b0411af6a Revert "config explicitly merge defaults once when using a config directory"
This reverts commit 006a9a1d454739eee21b7d8abb8b7aef1353b648.
2019-06-11 22:00:23 -04:00
Lang Martin 1e2f87a11e agent/testdata add a configuration directory for testing 2019-06-11 16:34:04 -04:00
Lang Martin fe8a4781d8 config merge maintains *HCL string fields used for duration conversion 2019-06-11 16:34:04 -04:00
Lang Martin 3bd153690b config_parse_test, handle defaults 2019-06-11 16:34:04 -04:00
Lang Martin c97dd512f4 config explicitly merge defaults once when using a config directory 2019-06-11 15:42:27 -04:00
Lang Martin ad56434472 config_parse split out defaults from ParseConfig 2019-06-11 15:42:27 -04:00
Lang Martin 28cf8eddfe config parse_test check for string coercion in client.meta 2019-06-10 13:12:38 -04:00
Michael Schurter 073893f529 nomad: disable service+batch preemption by default
Enterprise only.

Disable preemption for service and batch jobs by default.

Maintain backward compatibility in a x.y.Z release. Consider switching
the default for new clusters in the future.
2019-06-04 15:54:50 -07:00
Mahmood Ali a9f81f2daa client config flag to disable remote exec
This exposes a client flag to disable nomad remote exec support in
environments where access to tasks ought to be restricted.

I used `disable_remote_exec` client flag that defaults to allowing
remote exec. Opted for a client config that can be used to disable
remote exec globally, or to a subset of the cluster if necessary.
2019-06-03 15:31:39 -04:00
Nomad Release bot 6d6bc59732 Generate files for 0.9.2-rc1 release 2019-05-22 19:29:30 +00:00
Lang Martin 16cd0beb9b api use job.update as the default for taskgroup.update 2019-05-22 12:34:57 -04:00
Lang Martin b5fd735960 add update AutoPromote bool 2019-05-22 12:32:08 -04:00
Mahmood Ali f5a4fcac3f Restore tty start before emitting errors
Otherwise, the error message appears indented unexpectedly.
2019-05-17 11:58:31 -04:00
Mahmood Ali 1293a8511c
Fix typos and comments
Co-Authored-By: Michael Schurter <michael.schurter@gmail.com>
2019-05-16 17:06:03 -04:00
Mahmood Ali 689453bd3a Implement escaping chrarcter for alloc exec 2019-05-16 16:22:52 -04:00
Preetha 2dcd4291f8
Merge pull request #5702 from hashicorp/f-filter-by-create-index
Filter deployments by create index
2019-05-15 21:50:41 -05:00
Preetha Appan 2c5c16111e
Add -all to help text and flags 2019-05-15 21:16:57 -05:00
Mahmood Ali bfd229918a fix typo 2019-05-15 13:01:05 -04:00
Mahmood Ali c057c6dc44
Merge pull request #5633 from hashicorp/f-nomad-exec-parts-02-cli
nomad exec part 2: CLI
2019-05-15 12:50:42 -04:00
Mahmood Ali 778c7a1982 Handle Terminal Output state in Windows 2019-05-15 10:37:37 -04:00
Mahmood Ali 1104827671 Add clarifying comments for negating `-i` or `-t` 2019-05-15 10:35:12 -04:00
Preetha Appan 4f9c8ea068
Fix one more test set up 2019-05-14 16:13:41 -05:00
Nick Ethier ade97bc91f
fixup #5172 and rebase against master 2019-05-14 14:37:34 -04:00
Nick Ethier cab6a95668
Merge branch 'master' into pr/5172
* master: (912 commits)
  Update redirects.txt
  Added redirect for Spark guide link
  client: log when server list changes
  docs: mention regression in task config validation
  fix update to changelog
  update CHANGELOG with datacenter config validation https://github.com/hashicorp/nomad/pull/5665
  typo: "atleast" -> "at least"
  implement nomad exec for rkt
  docs: fixed typo
  use pty/tty terminology similar to github.com/kr/pty
  vendor github.com/kr/pty
  drivers: implement streaming exec for executor based drivers
  executors: implement streaming exec
  executor: scaffolding for executor grpc handling
  client: expose allocated memory per task
  client improve a comment in updateNetworks
  stalebot: Add 'thinking' as an exempt label (#5684)
  Added Sparrow link
  update links to use new canonical location
  Add redirects for restructing done in GH-5667
  ...
2019-05-14 14:10:33 -04:00
Preetha Appan 4d3f74e161
Fix test setup to have correct jobcreateindex for deployments 2019-05-13 18:53:47 -05:00
Preetha Appan 07690d6f9e
Add flag similar to --all for allocs to be able to filter deployments by latest 2019-05-13 18:33:41 -05:00
Mahmood Ali 2ddc39973d
Merge pull request #5668 from hashicorp/flaky-test-20190430
fix flaky test by allowing for call invocation overhead
2019-05-13 12:33:44 -04:00
Mahmood Ali dd8762e348 typo: "atleast" -> "at least" 2019-05-13 10:01:19 -04:00
Mahmood Ali 513303347c add CLI commands for nomad exec 2019-05-12 22:04:50 -04:00
Mahmood Ali 919827f2df
Merge pull request #5632 from hashicorp/f-nomad-exec-parts-01-base
nomad exec part 1: plumbing and docker driver
2019-05-09 18:09:27 -04:00
Mahmood Ali 66982a1660 agent: add websocket handler for nomad exec
This adds a websocket endpoint for handling `nomad exec`.

The endpoint is a websocket interface, as we require a bi-directional
streaming (to handle both input and output), which is not very appropriate for
plain HTTP 1.0. Using websocket makes implementing the web ui a bit simpler. I
considered using golang http hijack capability to treat http request as a plain
connection, but the web interface would be too complicated potentially.

Furthermore, the API endpoint operates against the raw core nomad exec streaming
datastructures, defined in protobuf, with json serializer.  Our APIs use json
interfaces in general, and protobuf generates json friendly golang structs.
Reusing the structs here simplify interface and reduce conversion overhead.
2019-05-09 16:49:08 -04:00
Danielle 4a22fa0ee2
Merge pull request #5536 from hashicorp/dani/consul
Consul Catalog Integration Fixes
2019-05-09 13:22:54 +02:00
Danielle Lancashire 0da2924b2a consul: Document example check id 2019-05-09 13:22:22 +02:00
Mahmood Ali d405fcb093 fix flaky test by allowing for call invocation overhead 2019-05-08 18:04:37 -04:00
Preetha 1538913a2a
Merge pull request #5628 from hashicorp/f-preemption-config
Add config to disable preemption for batch/service jobs
2019-05-06 15:40:35 -05:00
Lang Martin 9f3f11df97
Merge pull request #5601 from hashicorp/b-config-parse-direct-hcl
config parse direct hcl
2019-05-06 12:05:19 -04:00
Preetha Appan ad3c263d3f
Rename to match system scheduler config.
Also added docs
2019-05-03 14:06:12 -05:00
Danielle Lancashire d824e00d1a consul: Do not deregister external checks
This commit causes sync to skip deregistering checks that are not
managed by nomad, such as service maintenance mode checks.  This is
handled in the same way as service registrations - by doing a Nomad
specific prefix match.
2019-05-02 16:54:18 +02:00
Danielle Lancashire 0b8e85118e consul: Use a stable identifier for services
The current implementation of Service Registration uses a hash of the
nomad-internal state of a service to register it with Consul, this means that
any update to the service invalidates this name and we then deregister, and
recreate the service in Consul.

While this behaviour slightly simplifies reasoning about service registration,
this becomes problematic when we add consul health checks to a service. When
the service is re-registered, so are the checks, which default to failing for
at least one check period.

This commit migrates us to using a stable identifier based on the
allocation, task, and service identifiers, and uses the difference
between the remote and local state to decide when to push updates.

It uses the existing hashing mechanic to decide when UpdateTask should
regenerate service registrations for providing to Sync, but this should
be removable as part of a future refactor.

It additionally introduces the _nomad-check- prefix for check
definitions, to allow for future allowing of consul features like
maintenance mode.
2019-05-02 16:54:18 +02:00
Chris Baker a40477a7b8
test case for 5540 (#5590)
* client/metrics: modified metrics to use (updated) client copy of allocation instead of (unupdated) server copy

* updated armon/go-metrics to address race condition in DisplayMetrics
2019-04-30 10:31:35 -04:00
Lang Martin 2e643d26a2 config_parse leave the *HCL strings in place after converting times 2019-04-30 10:30:53 -04:00
Lang Martin 3ba6095fe3 config_parse_test additional config confirmation w/ sample json 2019-04-30 10:30:53 -04:00
Lang Martin fe9b31dcf9 config comment for future changes 2019-04-30 10:30:53 -04:00
Lang Martin 598112a1cc tag HCL bookkeeping keys with json:"-" to keep them out of the api 2019-04-30 10:29:14 -04:00
Lang Martin 43407cffe3 config_parse_test remove redundant parse direct test 2019-04-30 10:29:14 -04:00
Lang Martin b8e9c35cd0 config_parse remove unused multi-stage parsing via mapstructure 2019-04-30 10:29:14 -04:00
Lang Martin 1f86770456 config_parse_test test direct hcl parsing 2019-04-30 10:29:14 -04:00
Lang Martin 5ebae65d1a agent/config, config/* mapstructure tags -> hcl tags 2019-04-30 10:29:14 -04:00
Lang Martin 92fd988c9f config_parse add new ParseConfigFileDirectHCL
- parse by using hcl.Decode directly
- handle time.Duration strings in a second pass
- report unexpected keys in a third pass
2019-04-30 10:29:14 -04:00
Preetha Appan 6615d5c868
Add config to disable preemption for batch/service jobs 2019-04-29 18:48:07 -05:00
Danielle Lancashire a8880f9643 alloc_signal: Add autcompletion and cmd tests 2019-04-26 12:47:53 +02:00
Danielle Lancashire 3409e0be89 allocs: Add nomad alloc signal command
This command will be used to send a signal to either a single task within an
allocation, or all of the tasks if <task-name> is omitted. If the sent signal
terminates the allocation, it will be treated as if the allocation has crashed,
rather than as if it was operator-terminated.

Signal validation is currently handled by the driver itself and nomad
does not attempt to restrict or validate them.
2019-04-25 12:43:32 +02:00
Mahmood Ali 60ee243149 fix crash when executor parent nomad process dies
Fixes https://github.com/hashicorp/nomad/issues/5593

Executor seems to die unexpectedly after nomad agent dies or is
restarted.  The crash seems to occur at the first log message after
the nomad agent dies.

To ease debugging we forward executor log messages to executor.log as
well as to Stderr.  `go-plugin` sets up plugins with Stderr pointing to
a pipe being read by plugin client, the nomad agent in our case[1].
When the nomad agent dies, the pipe is closed, and any subsequent
executor logs fail with ErrClosedPipe and SIGPIPE signal.  SIGPIPE
results into executor process dying.

I considered adding a handler to ignore SIGPIPE, but hc-log library
currently panics when logging write operation fails[2]

This we opt to revert to v0.8 behavior of exclusively writing logs to
executor.log, while we investigate alternative options.

[1] https://github.com/hashicorp/nomad/blob/v0.9.0/vendor/github.com/hashicorp/go-plugin/client.go#L528-L535
[2] https://github.com/hashicorp/nomad/blob/v0.9.0/vendor/github.com/hashicorp/go-hclog/int.go#L320-L323
2019-04-23 09:52:46 -04:00
Danielle 198a838b61
Merge pull request #5512 from hashicorp/dani/f-alloc-stop
alloc-lifecycle: nomad alloc stop
2019-04-23 13:05:08 +02:00
Danielle Lancashire 832f607433 allocs: Add nomad alloc stop
This adds a `nomad alloc stop` command that can be used to stop and
force migrate an allocation to a different node.

This is built on top of the AllocUpdateDesiredTransitionRequest and
explicitly limits the scope of access to that transition to expose it
under the alloc-lifecycle ACL.

The API returns the follow up eval that can be used as part of
monitoring in the CLI or parsed and used in an external tool.
2019-04-23 12:50:23 +02:00
Michael Schurter 373748a327
Merge pull request #5486 from hashicorp/b-validate-migrate
api: fix migrate stanza initialization
2019-04-15 09:44:59 -07:00
Chris Baker 3b9237de4a gofmt/goimport and test formatting 2019-04-12 20:55:55 +00:00
Chris Baker eca8a3d537 changes to appease gofmt 2019-04-12 19:12:42 +00:00
Chris Baker b52d1c9274 cli: add support for periodic force evaluation
resolves #3251
2019-04-12 18:56:35 +00:00
Chris Baker 5a43f10aaf cli: add `acl token list` command, documentation
docs: fix some incorrect acl policy docs (typos, copy-paste errors)
2019-04-12 15:48:36 +00:00
Michael Schurter 5e8e59eefb api: fix migrate stanza initialization
Fixes Migrate to be initialized like RescheduleStrategy.

Fixes #5477
2019-04-11 15:29:19 -07:00
Danielle Lancashire e135876493 allocs: Add nomad alloc restart
This adds a `nomad alloc restart` command and api that allows a job operator
with the alloc-lifecycle acl to perform an in-place restart of a Nomad
allocation, or a given subtask.
2019-04-11 14:25:49 +02:00
Danielle 35f66d901f
Merge pull request #5516 from hashicorp/dani/f-verbose-status
Allow passing -verbose to meta status
2019-04-11 13:31:48 +02:00
Danielle Lancashire b4547a34b0 status: Allow passing -verbose to meta status
A common issue when using nomad is needing to add in the object verb to
a command to include the `-verbose` flag.

This commit allows users to pass `-verbose` via the `nomad status` alias by
adding a placeholder boolean in the metacommand which allows subcommands
to parse the flag.
2019-04-11 13:15:44 +02:00
Chris Baker ce0c330c7c
agent config: cleaner VAULT_ env lookup 2019-04-10 10:34:10 -05:00
Chris Baker a26d4fe1e5
docs: -vault-namespace, VAULT_NAMESPACE, and config
agent: added VAULT_NAMESPACE env-based configuration
2019-04-10 10:34:10 -05:00
Chris Baker d3041cdb17
wip: added config parsing support, CLI flag, still need more testing, VAULT_ var, documentation 2019-04-10 10:34:10 -05:00
Chris Baker 6a2454f56d
"job revert" command: alphabetized flags 2019-04-10 10:34:10 -05:00
Chris Baker 2f4d8d0a2f
cli: plumbed vault token from job revert command through API call 2019-04-10 10:34:10 -05:00
Arshneet Singh 2b50b5499d
Remove redundant assertion and replace regex matches with require 2019-04-10 10:34:10 -05:00
Arshneet Singh 1272fcb9e1
Don't display node name if output isn't verbose. Add tests. 2019-04-10 10:34:10 -05:00
James Rasell 9470507cf4
Add NodeName to the alloc/job status outputs.
Currently when operators need to log onto a machine where an alloc
is running they will need to perform both an alloc/job status
call and then a call to discover the node name from the node list.

This updates both the job status and alloc status output to include
the node name within the information to make operator use easier.

Closes #2359
Cloess #1180
2019-04-10 10:34:10 -05:00
Nomad Release bot e307734e4a Generate files for 0.9.0 release 2019-04-09 01:56:00 +00:00
Nomad Release bot 16b4336ccf Generate files for 0.9.0-rc2 release 2019-04-03 01:54:29 +00:00
Preetha Appan 71e6550f81
Address review comments 2019-03-29 08:57:49 -05:00
Preetha Appan e0566237a4
fix linting 2019-03-28 18:01:40 -05:00
Preetha Appan cc07256bb5
Fix json parsing bug with plugins that don't provide args
This fixes a bug with JSON agent configuration parsing where the AST
for the plugin stanza had unnecessary flattening originating from hcl parsing
library. The workaround fixes the AST by popping off the flattened element and wrapping
it in a list. The workaround comes from similar code in terraform.

There were no existing test cases for json parsing so I added a few.
2019-03-28 16:33:30 -05:00
Nomad Release bot 3ab3dd4105 Generate files for 0.9.0-rc1 release 2019-03-21 19:06:13 +00:00
Michael Schurter a2b2c29216 Fix version.go for 0.9.0-beta3 release 2019-02-26 10:11:30 -08:00
Michael Schurter d74755900e Generate files for 0.9.0-beta3 release 2019-02-26 09:44:49 -08:00
Preetha 911c93f7bd
Merge pull request #5350 from hashicorp/b-json-logging-meta
Support json logging for CLI output for agent
2019-02-22 13:40:56 -06:00
Preetha Appan 8f9ec85fe6
fix import order 2019-02-22 13:40:13 -06:00
Preetha Appan 3ab2e431b6
Move logger initialization to earlier step 2019-02-21 12:41:54 -06:00
Danielle Tomlinson 91c300c310 ui: Support colored output on Windows
This commit uses the go-colorable library to enable support for coloured
UI output on Windows. This acts as a compatibility layer that takes
standard unix-y terminal codes and translates them into the requisite
windows calls as required.
2019-02-20 14:01:35 +01:00
Preetha Appan 0149bbc608
cli Ui implementation that logs to a hclogger
This makes it so any messages output to the UI *after* the agent has started
will be logged in json format correctly
2019-02-19 17:53:14 -06:00
Preetha Appan 35f6db47b8
fix indentation 2019-02-14 12:49:26 -06:00
Preetha Appan f443c7d321
Fix whitespace 2019-02-14 12:49:26 -06:00
Chris Baker ab02f4588e
Update job_init.go
minor
2019-02-14 12:49:26 -06:00
Preetha Appan d405881e34
expand job init example with spread and affinity 2019-02-14 12:49:26 -06:00
Michael Schurter 3b84e08fa4
Merge pull request #5297 from hashicorp/b-docker-logging
Docker: Fix logging config parsing
2019-02-11 06:57:52 -08:00
Michael Schurter e3e1797850 consul: squelch noisy useless logs
Only log when syncing actually did something.
2019-02-04 11:07:57 -08:00
Iskander (Alex) Sharipov 7b1a4eaef9
nomad/command: fix strings.Contains args order
Swapped call args order to meet the expected behavior.

Signed-off-by: Iskander Sharipov <quasilyte@gmail.com>
2019-02-02 09:43:24 +03:00
Michael Schurter cad3f1022a cli: do not duplicate reschedule headers per group
Fixes #5291
2019-02-01 09:28:36 -08:00
Alex Dadgar 84d0afccae Generate files for 0.9.0-beta2 2019-01-30 13:31:50 -08:00
Preetha Appan 8e621a167b
fix tests 2019-01-30 14:46:24 -06:00
Alex Dadgar 41265d4d61 Change types of weights on spread/affinity 2019-01-30 12:20:38 -08:00
Alex Dadgar bc804dda2e Nomad 0.9.0-beta1 generated code 2019-01-30 10:49:44 -08:00
Nick Ethier c21ce7b523
add circbufwriter package 2019-01-28 11:35:21 -05:00
Nick Ethier 3ef163b03b
executor: prevent logger from blocking when stderr pipe is detached 2019-01-25 23:08:01 -05:00
Michael Schurter 13f061a83f
Merge pull request #5196 from hashicorp/f-plugin-utils
Make plugins/shared external and make pluginutls/
2019-01-23 06:59:32 -08:00
Michael Schurter 32daa7b47b goimports until make check is happy 2019-01-23 06:27:14 -08:00
Preetha Appan 5f1d467ed2
nil check node resources to prevent panic 2019-01-22 19:34:02 -06:00
Michael Schurter be0bab7c3f move pluginutils -> helper/pluginutils
I wanted a different color bikeshed, so I get to paint it
2019-01-22 15:50:08 -08:00
Alex Dadgar 4bdccab550 goimports 2019-01-22 15:44:31 -08:00
Alex Dadgar cdcd3c929c loader and singleton 2019-01-22 15:11:57 -08:00
Alex Dadgar 6c2782f037 move catalog + grpcutils 2019-01-22 15:11:57 -08:00
Mahmood Ali 05e32fb525
Merge pull request #5213 from hashicorp/b-api-separate
Slimmer /api package
2019-01-18 20:52:53 -05:00
Mahmood Ali 5df63fda7c
Merge pull request #5190 from hashicorp/f-memory-usage
Track Basic Memory Usage as reported by cgroups
2019-01-18 16:46:02 -05:00
Mahmood Ali 6bdb9864de api: remove MockJob from exported functions
`api.MockJob` is a test utility, that's only used by `command/agent`
package.  This moves it to the package and removes it from the public
API.
2019-01-18 14:51:31 -05:00
Michael Schurter 48afda786b
Merge pull request #5187 from hashicorp/test-consul
Port a bunch of pre-0.9 Consul tests to 0.9
2019-01-15 07:41:50 -08:00
Alex Dadgar 471fdb3ccf
Merge pull request #5173 from hashicorp/b-log-levels
Plugins use parent loggers
2019-01-14 16:14:30 -08:00
Mahmood Ali 9909d98bee Track Basic Memory Usage as reported by cgroups
Track current memory usage, `memory.usage_in_bytes`, in addition to
`memory.max_memory_usage_in_bytes` and friends.  This number is closer
what Docker reports.

Related to https://github.com/hashicorp/nomad/issues/5165 .
2019-01-14 18:47:52 -05:00
Michael Schurter fc1bb95ef8 Remove old comment; it's been fixed! 2019-01-14 09:56:53 -08:00
Preetha Appan 7bd1440710
REfactor statedb factory config to set it directly in client config 2019-01-12 10:38:20 -06:00
Preetha Appan f059ef8a47
Modified destroy failure handling to rely on allocrunner's destroy method
Added a unit test with custom statedb implementation that errors, to
use to verify destroy errors
2019-01-12 10:37:12 -06:00
Alex Dadgar 5621086f50 Enable json logs 2019-01-11 11:36:37 -08:00
Alex Dadgar 14ed757a56 Plugins use parent loggers
This PR fixes various instances of plugins being launched without using
the parent loggers. This meant that logs would not all go to the same
output, break formatting etc.
2019-01-11 11:36:37 -08:00
Preetha Appan b46728a88b
Make spread weight a pointer with default value if unset 2019-01-11 10:31:21 -06:00
Nick Wales 7a7b5da0df Adds optional Consul service tags to nomad server and agent services, gh#4297 2019-01-09 22:02:46 +00:00
Chris Baker e9db2ae822 Merge branch 'master' of github.com:hashicorp/nomad into f-1157-validate-node-meta-variables 2019-01-09 18:56:49 +00:00
Chris Baker d5b1a56f3b increased config validation coverage for dev mode 2019-01-09 18:56:40 +00:00
Michael Schurter ac169008f0
Merge pull request #5045 from hashicorp/b-drivermanager-tests-drain
drain: fix node drain monitoring
2019-01-09 10:23:28 -08:00
Mahmood Ali 90f3cea187
Merge pull request #5157 from hashicorp/r-drivers-no-cstructs
drivers: avoid referencing client/structs package
2019-01-09 13:06:46 -05:00
Mahmood Ali 03a9e812c8 cli: support hitting pre-0.9 nomad agents
node.NodeResources is nil when operating against pre-0.9.
2019-01-08 19:32:26 -05:00
Chris Baker d8a3a74c43 move `if dev` check into config validation, to support dev-mod
validation in the future
2019-01-08 22:21:48 +00:00
Michael Schurter 8a6b1acaa6 drain: fix node drain monitoring
The whole approach to monitoring drains has ordering issues and lacks
state to output useful error messages.

AFAICT to get the tests passing reliably I needed to change the behavior
of monitoring.

Parts of these tests are skipped in CI, and they should be rewritten as
e2e tests.
2019-01-08 09:35:16 -08:00
Chris Baker 220e9e838f refactored config validation into a new method, modified Meta.Client
tests appropriately
2019-01-08 15:07:36 +00:00
Mahmood Ali 916a40bb9e move cstructs.DeviceNetwork to drivers pkg 2019-01-08 09:11:47 -05:00
Chris Baker 91449d6809 Merge branch 'master' of github.com:hashicorp/nomad into f-1157-validate-node-meta-variables 2019-01-08 02:17:35 +00:00
Chris Baker bf00f93d87 moved interp key regex out to a helper function 2019-01-08 00:11:47 +00:00
Alex Dadgar 8a35d7b1dd Test recovery 2019-01-07 14:49:41 -08:00
Chris Baker f99e18aaf4 gofmt to make check happy 2019-01-07 18:01:59 +00:00
Chris Baker a61afad5bb added validation on client metadata keys 2019-01-07 17:16:38 +00:00
Nick Ethier ab3c5c0a8b
fix test 2018-12-20 13:54:29 -05:00
Nick Ethier fad553ab6a
command: wait for drivers to be ready before test 2018-12-20 13:52:33 -05:00
Nick Ethier 5b9bba08c6
fix tests 2018-12-20 01:05:17 -05:00
Nick Ethier 060ceb3635
fix test 2018-12-20 01:01:53 -05:00
Nick Ethier a96afb6c91
fix tests that fail as a result of async client startup 2018-12-20 00:53:44 -05:00
Nick Ethier 82175d1328
client/drivermananger: add driver manager
The driver manager is modeled after the device manager and is started by the client.
It's responsible for handling driver lifecycle and reattachment state, as well as
processing the incomming fingerprint and task events from each driver. The mananger
exposes a method for registering event handlers for task events that is used by the
task runner to update the server when a task has been updated with an event.

Since driver fingerprinting has been implemented by the driver manager, it is no
longer needed in the fingerprint mananger and has been removed.
2018-12-18 22:55:18 -05:00
Alex Dadgar 4c57d2ec4d Add plugin API versioning to plugin loader and plugins 2018-12-18 16:48:00 -08:00
Nick Ethier 09dadf0a23
Merge branch 'master' into f-grpc-executor
* master: (71 commits)
  Fix output of 'nomad deployment fail' with no arg
  Always create a running allocation when testing task state
  tests: ensure exec tests pass valid task resources (#4992)
  some changes for more idiomatic code
  fix iops related tests
  fixed bug in loop delay
  gofmt
  improved code for readability
  client: updateAlloc release lock after read
  fixup! device attributes in `nomad node status -verbose`
  drivers/exec: support device binds and mounts
  fix iops bug and increase test matrix coverage
  tests: tag image explicitly
  changelog
  ci: install lxc-templates explicitly
  tests: skip checking rdma cgroup
  ci: use Ubuntu 16.04 (Xenial) in TravisCI
  client: update driver info on new fingerprint
  drivers/docker: enforce volumes.enabled (#4983)
  client: Style: use fluent style for building loggers
  ...
2018-12-13 14:41:09 -05:00
Brian Lalor 31ef34838e
Fix output of 'nomad deployment fail' with no arg 2018-12-13 13:22:17 -05:00
Mahmood Ali d497729826
Merge pull request #4978 from hashicorp/f-device-tweaks
Display device attributes in `nomad node status -verbose`
2018-12-12 19:45:07 -05:00
Mahmood Ali 00c9385a2b fixup! device attributes in `nomad node status -verbose` 2018-12-12 09:17:31 -05:00
Alex Dadgar 86d9ad4397 fix iops bug and increase test matrix coverage 2018-12-11 15:28:21 -08:00
Mahmood Ali 69b2355274
Merge pull request #4975 from hashicorp/fix-master-20181209
Some test fixes and remedies
2018-12-11 18:00:21 -05:00
Alex Dadgar 1531b6d534
Merge pull request #4970 from hashicorp/f-no-iops
Deprecate IOPS
2018-12-11 12:51:22 -08:00
Mahmood Ali 5a487ac884 tests: prevent indefinite blocking in some tests
Noticed few places where tests seem to block indefinitely and panic
after the test run reaches the test package timeout.

I intend to follow up with the proper fix later, but timing out is much
better than indefinitely blocking.
2018-12-11 09:35:26 -05:00
Michael Schurter 8808ab9cea
Merge pull request #4953 from hashicorp/b-script-context-wrapper
consul: add ScriptExecutor context wrapper
2018-12-10 17:22:53 -08:00
Michael Schurter 4c5f3ae82c
Merge pull request #4952 from hashicorp/b-script-context
consul: fix script checks exiting after 1 run
2018-12-10 17:22:15 -08:00
Mahmood Ali 14668f48d1 device attributes in `nomad node status -verbose`
This reports device attributes like the following:

```
$ nomad node status -self -verbose
ID          = f7adb958-29e1-2a5a-2303-9d61ffaab33a
Name        = mars.local
Class       = <none>
DC          = dc1
Drain       = false
Eligibility = eligible
Status      = ready
Uptime      = 12h40m13s

Drivers
Driver       Detected  Healthy  Message                               Time
docker       true      true     healthy                               2018-12-10T11:47:19-05:00
...

Attributes
cpu.arch                      = amd64
cpu.frequency                 = 2200
cpu.modelname                 = Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
cpu.numcores                  = 12
...

Device Group Attributes
Device Group = nomad/file/mock
block_device = sda1
filesystem   = ext4
size         = 63.2 GB

Meta
```
2018-12-10 12:18:24 -05:00
Mahmood Ali 9f69b8bfec Rename helper_stats -> helper_devices 2018-12-10 12:18:24 -05:00
Nick Ethier 47df1dde10
Merge branch 'master' into f-grpc-executor 2018-12-06 21:42:38 -05:00
Nick Ethier 29ef54c0ee
executor: merge plugin shim with executor package 2018-12-06 21:13:45 -05:00
Alex Dadgar c918a96490 Warn if IOPS is being used 2018-12-06 16:17:09 -08:00
Alex Dadgar 1e3c3cb287 Deprecate IOPS
IOPS have been modelled as a resource since Nomad 0.1 but has never
actually been detected and there is no plan in the short term to add
detection. This is because IOPS is a bit simplistic of a unit to define
the performance requirements from the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. These should be considered deprecated and only exist to allow
users to stop using them during the Nomad 0.9.x release. In the future,
there should be no expectation that the field will exist.
2018-12-06 15:09:26 -08:00
Nick Ethier 8b20de4801
executor: use grpc instead of netrpc as plugin protocol
* Added protobuf spec for executor
 * Seperated executor structs into their own package
2018-12-05 11:03:56 -05:00
Mahmood Ali f8efc40b8b tests: stop integration tests tasks explicitly
Also update the new recommended `nomad job` subcommands
2018-12-04 11:50:59 -05:00
Michael Schurter 8fa5e90095 consul: add ScriptExecutor context wrapper
Since d335a82859ca2177bc6deda0c2c85b559daf2db3 ScriptExecutors now take
a timeout duration instead of a context. This broke the script check
removal code which used context cancelation propagation to remove
script checks while they were executing.

This commit adds a wrapper around ScriptExecutors that obeys context
cancelation again. The only downside is that it leaks a goroutine until
the underlying Exec call completes or timeouts.

Since check removal is relatively rare, check timeouts usually low, and
scripts usually fast, the risk of leaking a goroutine seems very small.
2018-12-03 20:26:31 -08:00
Michael Schurter 6459c19ffc consul: fix script checks exiting after 1 run
Fixes a regression caused in d335a82859ca2177bc6deda0c2c85b559daf2db3

The removal of the inner context made the remaining cancels cancel the
outer context and cause script checks to exit prematurely.
2018-12-03 18:50:02 -08:00
Danielle Tomlinson 51a9f7369e
Merge pull request #4936 from hashicorp/f-legacy-refactor
Refactor and repackage client/driver
2018-11-30 13:38:06 +01:00
Danielle Tomlinson ffc5e5d56b executors: Unify go-plugin handshake 2018-11-30 10:59:23 +01:00
Danielle Tomlinson fdfe93aa25 fixup: executorplugin: fix rkt build 2018-11-30 10:47:08 +01:00
Danielle Tomlinson d26a310db0 client: Move executor plugins into own package 2018-11-30 10:46:13 +01:00
Danielle Tomlinson 9b3e731f88 command: Remove Extraneous field in nodedrain test 2018-11-30 10:46:13 +01:00
Nick Ethier 80ae7e34f4
Merge pull request #4906 from hashicorp/f-metric-prefix-master
Port metric prefix filtering to master
2018-11-29 22:27:47 -05:00
Nick Ethier b1484aec33
nomad: fix hclog usage 2018-11-29 22:27:39 -05:00
Alex Dadgar 4ee603c382 Device hook and devices affect computed node class
This PR introduces a device hook that retrieves the device mount
information for an allocation. It also updates the computed node class
computation to take into account devices.

TODO Fix the task runner unit test. The environment variable is being
lost even though it is being properly set in the prestart hook.
2018-11-27 17:25:33 -08:00
Nick Ethier ed65610ec6
command/agent: additional tests for telemetry config parsing 2018-11-19 23:22:33 -05:00
Nick Ethier b81e4e18f0
agent: suppose filter_default telemetry option 2018-11-19 23:21:48 -05:00
Nick Ethier 85b221a1d6
nomad: add flag to disable publishing of job_summary metrics for dispatched jobs 2018-11-19 23:21:19 -05:00
Nick Ethier 9e64ce7d73
docker: properly launch docker logger process 2018-11-19 22:59:12 -05:00
Mahmood Ali 9479015f51
Merge pull request #4884 from hashicorp/f-alloc-devices-cli
Report alloc device statistics in API and CLI
2018-11-16 18:04:54 -05:00
Mahmood Ali 6f9126f475 show Device Stats header in alloc status 2018-11-16 17:34:37 -05:00
Mahmood Ali 00ffd02ced Show stable order of device attributes 2018-11-16 17:34:37 -05:00
Preetha 5f094633fa
Merge pull request #4889 from hashicorp/f-service-meta
Pass service metadata "external-source" for consul UI integration
2018-11-16 12:24:21 -06:00
Preetha Appan 18708d3f0b
Pass service metadata "external-source" for consul UI integration 2018-11-16 11:28:56 -06:00
Mahmood Ali 33c96a803a tweak whitespace in device stats output 2018-11-16 10:37:39 -05:00
Mahmood Ali 159b8f866a Display device stats in `nomad alloc status` 2018-11-16 10:26:32 -05:00
Mahmood Ali d88a3f8413 Prepare to reuse device resources printing 2018-11-16 10:26:32 -05:00
Michael Schurter 6f3712ed48 gofmt -s -w command/helper_stats_test.go
Fixes the static checks build
2018-11-15 14:14:05 -08:00
Mahmood Ali 24b37e0aaf Display StatsObject nested objects as well 2018-11-15 08:09:54 -05:00
Mahmood Ali ee9353fbd6 Use disk display format for devices 2018-11-14 22:13:23 -05:00
Mahmood Ali 0712be643f Print verbose device in `nomad node status -stats` 2018-11-14 22:13:23 -05:00
Mahmood Ali 93e8fc53f9 device stats summary in `node status`
Sample output with a mock device:

```
Host Resource Utilization
CPU             Memory          Disk
2651/26400 MHz  9.6 GiB/16 GiB  98 GiB/234 GiB

Device Resource Utilization
nomad/file/mock[README.md]    511 bytes
nomad/file/mock[e2e.go]       239 bytes
nomad/file/mock[e2e_test.go]  128 bytes

Allocations
No allocations placed
```
2018-11-14 22:13:23 -05:00
Mahmood Ali c62ec124c0 Set clean config for mock driver
The default job here contains some exec task config (for setting
command and args) that aren't used for mock driver.  Now, the alloc
runner seems stricter about validating fields and errors on unexpected
fields.

Updating configs in tests so we can have an explicit task config
whenever driver is set explicitly.
2018-11-13 10:21:40 -05:00
Mahmood Ali c7610d8c22 mark and skip failing consul failing tests 2018-11-13 10:21:40 -05:00
Preetha Appan fd0ba320da
change path to v1/scheduler/configuration 2018-11-12 15:57:45 -06:00
Preetha Appan 3a10a589d7
Fix failing test 2018-11-10 19:53:47 -06:00
Preetha Appan 7ef126a027
Smaller methods, and added tests for RPC layer 2018-11-10 17:37:33 -06:00
Preetha Appan 75662b50d1
Use response object/querymeta/writemeta in scheduler config API 2018-11-10 10:31:10 -06:00
Alex Dadgar 98398a8a44
Merge pull request #4842 from hashicorp/b-deployment-progress-deadline
Fix multiple bugs with progress deadline handling
2018-11-08 13:31:54 -08:00
Preetha Appan f1a69e529c
Fix vet error 2018-11-08 09:48:43 -06:00