Commit Graph

2262 Commits

Author SHA1 Message Date
Michael Schurter fb487358fb
connect: add group.service stanza support 2019-07-31 01:04:05 -04:00
Nick Ethier 6537279686
agent: simplify if block 2019-07-31 01:03:17 -04:00
Nick Ethier 8650429e38
Add network stanza to group
Adds a network stanza and additional options to the task group level
in prep for allowing shared networking between tasks of an alloc.
2019-07-31 01:03:12 -04:00
Michael Schurter d31488e262
Merge pull request #5978 from pete-woods/configurable-job-gc-interval
command/agent: allow the job GC interval to be configured
2019-07-30 15:54:29 -07:00
Nomad Release bot e39fb11531 Generate files for 0.9.4 release 2019-07-30 19:05:18 +00:00
Pete Woods b47c5ca467
Allow the job GC interval to be configured from default of 5 minutes 2019-07-26 10:11:25 +01:00
Danielle 45f3f928f5
Merge pull request #5996 from hashicorp/f-reload-log-level
Support for hot reloading log levels
2019-07-24 13:54:04 +02:00
Danielle Lancashire 0422f1b0c2
Support for hot reloading log levels 2019-07-24 13:37:08 +02:00
Nomad Release bot 04187c8b86 Generate files for 0.9.4-rc1 release 2019-07-22 21:42:36 +00:00
Danielle Lancashire d454dab39b
chore: Format hcl configurations 2019-07-20 16:55:07 +02:00
Michael Schurter db4de5fae9
Merge pull request #5975 from hashicorp/b-check-watcher-deadlock
consul: fix deadlock in check-based restarts
2019-07-18 13:13:40 -07:00
Michael Schurter 6d095b3b36 consul: add test for check watcher deadlock 2019-07-18 08:24:09 -07:00
Michael Schurter 826d2503e6
Update command/agent/consul/check_watcher.go
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2019-07-18 07:08:27 -07:00
Michael Schurter 5407584bc3 consul: fix deadlock in check-based restarts
Fixes #5395
Alternative to #5957

Make task restarting asynchronous when handling check-based restarts.
This matches the pre-0.9 behavior where TaskRunner.Restart was an
asynchronous signal. The check-based restarting code was not designed
to handle blocking in TaskRunner.Restart. 0.9 made it reentrant and
could easily overwhelm the buffered update chan and deadlock.

Many thanks to @byronwolfman for his excellent debugging, PR, and
reproducer!

I created this alternative as changing the functionality of
TaskRunner.Restart has a much larger impact. This approach reverts to
old known-good behavior and minimizes the number of places changes are
made.
2019-07-17 15:22:21 -07:00
Chris Baker 8a75afcb39
Merge pull request #5870 from hashicorp/b-nmd-1529-alloc-stop-missing-header
api: return X-Nomad-Index header on allocation stop
2019-07-17 13:25:17 -04:00
Mahmood Ali 5d09b04f69
Merge pull request #5837 from hashicorp/b-consul-restore-sync-2
Avoid de-registering slowly restored services
2019-07-17 12:02:24 +08:00
Mahmood Ali ec7e258d71 address review feedback 2019-07-17 10:43:13 +07:00
Eli Shvartsman 692fd19884 take NodeID from url in api for node eligibility 2019-07-15 18:34:53 +03:00
Preetha 5b83cd4ce0
Merge pull request #5894 from hashicorp/f-remove-deprecated-code
Remove deprecated code
2019-07-02 09:29:24 -05:00
Preetha Appan aa2b4b4e00
Undo removal of node drain compat changes
Decided to remove that in 0.10
2019-07-01 15:12:01 -05:00
Preetha Appan 3345ce3ba4
Infer content type in alloc fs stat endpoint 2019-06-28 20:31:28 -05:00
Preetha Appan f6fc5d40d1
one more drain test 2019-06-26 17:33:51 -05:00
Preetha Appan 67bf66efc6
remove now unneeded test 2019-06-26 16:59:23 -05:00
Preetha Appan 10e7d6df6d
Remove compat code associated with many previous versions of nomad
This removes compat code for namespaces (0.7), Drain(0.8) and other
older features from releases older than Nomad 0.7
2019-06-25 19:05:25 -05:00
Chris Baker 3429cf39ed api: return X-Nomad-Index header on allocation stop 2019-06-21 16:20:06 +00:00
Chris Baker 59fac48d92 alloc lifecycle: 404 when attempting to stop non-existent allocation 2019-06-20 21:27:22 +00:00
Mahmood Ali b209584dce
Merge pull request #5726 from hashicorp/b-plugins-via-init
Use init() to handle plugin invocation
2019-06-18 21:09:03 -04:00
Mahmood Ali e07413c420 Avoid de-registering slowly restored services
When a nomad client restarts/upgraded, nomad restores state from running
task and starts the sync loop.  If sync loop runs early, it may
deregister services from Consul prematurely even when Consul has the
running service as healthy.

This is not ideal, as re-registering the service means potentially
waiting a whole service health check interval before declaring the
service healthy.

We attempt to mitigate this by introducing an initialization probation
period.  During this time, we only deregister services and checks that
were explicitly deregistered, and leave unrecognized ones alone.  This
serves as a grace period for restoring to complete, or for operators to
restore should they recognize they restored with the wrong nomad data
directory.
2019-06-14 11:15:21 -04:00
Mahmood Ali 962921f86c Use init to handle plugin invocation
Currently, nomad "plugin" processes (e.g. executor, logmon, docker_logger) are started as CLI
commands to be handled by command CLI framework.  Plugin launchers use
`discover.NomadBinary()` to identify the binary and start it.

This has few downsides: The trivial one is that when running tests, one
must re-compile the nomad binary as the tests need to invoke the nomad
executable to start plugin.  This is frequently overlooked, resulting in
puzzlement.

The more significant issue with `executor` in particular is in relation
to external driver:

* Plugin must identify the path of invoking nomad binary, which is not
trivial; `discvoer.NomadBinary()` now returns the path to the plugin
rather than to nomad, preventing external drivers from launching
executors.

* The external driver may get a different version of executor than it
expects (specially if we make a binary incompatible change in future).

This commit addresses both downside by having the plugin invocation
handling through an `init()` call, similar to how libcontainer init
handler is done in [1] and recommened by libcontainer [2].  `init()`
will be invoked and handled properly in tests and external drivers.

For external drivers, this change will cause external drivers to launch
the executor that's compiled against.

There a are a couple of downsides to this approach:
* These specific packages (i.e executor, logmon, and dockerlog) need to
be careful in use of `init()`, package initializers.  Must avoid having
command execution rely on any other init in the package.  I prefixed
files with `z_` (golang processes files in lexical order), but ensured
we don't depend on order.
* The command handling is spread in multiple packages making it a bit
less obvious how plugin starts are handled.

[1] drivers/shared/executor/libcontainer_nsenter_linux.go
[2] eb4aeed24f/libcontainer (using-libcontainer)
2019-06-13 16:48:01 -04:00
Jasmine Dahilig ed9740db10
Merge pull request #5664 from hashicorp/f-http-hcl-region
backfill region from hcl for jobUpdate and jobPlan
2019-06-13 12:25:01 -07:00
Jasmine Dahilig 51e141be7a backfill region from job hcl in jobUpdate and jobPlan endpoints
- updated region in job metadata that gets persisted to nomad datastore
- fixed many unrelated unit tests that used an invalid region value
(they previously passed because hcl wasn't getting picked up and
the job would default to global region)
2019-06-13 08:03:16 -07:00
Danielle b7fc81031b
Merge pull request #5829 from hashicorp/dani/b-5819
consul: Include port-label in service registration
2019-06-13 16:20:45 +02:00
Danielle Lancashire 8112177503
consul: Include port-label in service registration
It is possible to provide multiple identically named services with
different port assignments in a Nomad configuration.

We introduced a regression when migrating to stable service identifiers where
multiple services with the same name would conflict, and the last definition
would take precedence.

This commit includes the port label in the stable service identifier to
allow the previous behaviour where this was supported, for example
providing:

```hcl
service {
  name = "redis-cache"
  tags = ["global", "cache"]
  port = "db"
  check {
    name     = "alive"
    type     = "tcp"
    interval = "10s"
    timeout  = "2s"
  }
}

service {
  name = "redis-cache"
  tags = ["global", "foo"]
  port = "foo"

  check {
    name     = "alive"
    type     = "tcp"
    port     = "db"
    interval = "10s"
    timeout  = "2s"
  }
}

service {
  name = "redis-cache"
  tags = ["global", "bar"]
  port = "bar"

  check {
    name     = "alive"
    type     = "tcp"
    port     = "db"
    interval = "10s"
    timeout  = "2s"
  }
}
```

in a nomad task definition is now completely valid. Each service
definition with the same name must still have a unique port label however.
2019-06-13 15:24:54 +02:00
Nick Ethier 1b7fa4fe29
Optional Consul service tags for nomad server and agent services (#5706)
Optional Consul service tags for nomad server and agent services
2019-06-13 09:00:35 -04:00
Preetha 8a98817fe4
Merge pull request #5820 from hashicorp/r-assorted-changes-20190612_1
Assorted minor changes
2019-06-12 10:33:16 -05:00
Danielle Lancashire ae8bb7365a
alloc-lifecycle: Fix restart with empty body
Currently when you submit a manual request to the alloc lifecycle API
with a version of Curl that will submit empty bodies, the alloc restart
api will fail with an EOF error.

This behaviour is undesired, as it is reasonable to not submit a body at
all when restarting an entire allocation rather than an individual task.

This fixes it by ignoring EOF (not unexpected EOF) errors and treating
them as entire task restarts.
2019-06-12 15:35:00 +02:00
Mahmood Ali b00d1f1e10 tests: parsing dir should be equivalent to parsing individual files 2019-06-12 08:19:09 -04:00
Mahmood Ali 3d8f2622e9 tests: avoid manipulating package variables 2019-06-12 08:16:32 -04:00
Lang Martin 3837c9b021 command add comments re: defaults to LoadConfig 2019-06-11 22:35:43 -04:00
Lang Martin 02aae678be config_parse_test update comment for accuracy 2019-06-11 22:30:20 -04:00
Lang Martin 7aa95ebd6f config_parse get rid of ParseConfigDefault 2019-06-11 22:00:23 -04:00
Lang Martin 9b0411af6a Revert "config explicitly merge defaults once when using a config directory"
This reverts commit 006a9a1d454739eee21b7d8abb8b7aef1353b648.
2019-06-11 22:00:23 -04:00
Lang Martin 1e2f87a11e agent/testdata add a configuration directory for testing 2019-06-11 16:34:04 -04:00
Lang Martin fe8a4781d8 config merge maintains *HCL string fields used for duration conversion 2019-06-11 16:34:04 -04:00
Lang Martin 3bd153690b config_parse_test, handle defaults 2019-06-11 16:34:04 -04:00
Lang Martin c97dd512f4 config explicitly merge defaults once when using a config directory 2019-06-11 15:42:27 -04:00
Lang Martin ad56434472 config_parse split out defaults from ParseConfig 2019-06-11 15:42:27 -04:00
Lang Martin 28cf8eddfe config parse_test check for string coercion in client.meta 2019-06-10 13:12:38 -04:00
Michael Schurter 073893f529 nomad: disable service+batch preemption by default
Enterprise only.

Disable preemption for service and batch jobs by default.

Maintain backward compatibility in a x.y.Z release. Consider switching
the default for new clusters in the future.
2019-06-04 15:54:50 -07:00
Mahmood Ali a9f81f2daa client config flag to disable remote exec
This exposes a client flag to disable nomad remote exec support in
environments where access to tasks ought to be restricted.

I used `disable_remote_exec` client flag that defaults to allowing
remote exec. Opted for a client config that can be used to disable
remote exec globally, or to a subset of the cluster if necessary.
2019-06-03 15:31:39 -04:00