Commit graph

351 commits

Author SHA1 Message Date
Mahmood Ali b2b7618a1c
clarify unknown signal log line (#5466) 2019-03-25 17:19:43 -04:00
Mahmood Ali 2a7b18aec4
Revert "executor: synchronize exitState accesses" (#5449)
Reverts hashicorp/nomad#5433

Apparently, channel communication establishes a happens-before relationship that also covers writes to other variables made before the channel operation, so this syncing isn't necessary.

> _The closing of a channel happens before a receive that returns a zero value because the channel is closed._
https://golang.org/ref/mem#tmp_7
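
A minimal sketch of the pattern the revert relies on. It assumes a simplified executor in which a background `wait()` goroutine records the exit state and then closes a channel; the names `executor`, `exitState` and `doneCh` are hypothetical here, not Nomad's actual fields:

```go
package main

import "fmt"

// exitState is a hypothetical stand-in for the executor's process exit record.
type exitState struct {
	ExitCode int
}

type executor struct {
	exitState *exitState    // written only by wait(), before doneCh is closed
	doneCh    chan struct{} // closed once exitState has been set
}

// wait runs in its own goroutine: it records the exit state, then closes doneCh.
// The write to e.exitState happens before the close.
func (e *executor) wait(code int) {
	e.exitState = &exitState{ExitCode: code}
	close(e.doneCh)
}

// Wait blocks until wait() is done. The close of doneCh happens before this
// receive returns, so reading e.exitState afterwards needs no extra lock.
func (e *executor) Wait() *exitState {
	<-e.doneCh
	return e.exitState
}

func main() {
	e := &executor{doneCh: make(chan struct{})}
	go e.wait(3)
	fmt.Println(e.Wait().ExitCode) // 3
}
```

Any number of goroutines can call `Wait()`; each receive on the closed channel returns immediately and observes the same write.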
2019-03-20 07:33:05 -04:00
Nick Ethier 505e36ff7a
Merge pull request #5429 from hashicorp/b-blocking-executor-shutdown
executor: block shutdown on process exiting
2019-03-19 15:18:01 -04:00
Mahmood Ali a1776dba34 executor: synchronize exitState accesses
exitState is set in `wait()` goroutine but accessed in a different
`Wait()` goroutine, so accesses must be synchronized by a lock.
2019-03-17 11:56:58 -04:00
Nick Ethier 7418d09cf0
executor: block shutdown on process exiting 2019-03-15 23:50:17 -04:00
Mahmood Ali 8ec49fc133
Handle when cannot fetch docker logs (#5420)
Fix #5418

When using a Docker logging driver that doesn't support log streaming through
the API, the docker logger currently runs an unexpected tight loop of Docker
API calls. This change ensures we stop fetching logs early.

Also, this adds a basic backoff strategy when Docker API logging
fails unexpectedly, to avoid accidentally DoSing the docker daemon.
2019-03-14 16:23:11 -04:00
Mahmood Ali fb55717b0c
Regenerate Proto files (#5421)
Noticed that the protobuf files are out of sync with the ones generated by the 1.2.0 protoc Go plugin.

The cause seems to be related to release processes, e.g. [0.9.0-beta1 preparation](ecec3d38de (diff-da4da188ee496377d456025c2eab4e87)), and [0.9.0-beta3 preparation](b849d84f2f).

This restores the generated files to the output of the pinned protoc version and fails the build if protobuf files are out of sync. A sample failing Travis job, from the first commit of this change: https://travis-ci.org/hashicorp/nomad/jobs/506285085
2019-03-14 10:56:27 -04:00
Preetha Appan 7f0d9e0c8e
minor review feedback 2019-03-13 13:27:28 -05:00
Preetha Appan 273f1e993d
Validate all auth fields being empty rather than just email
This fixes a regression in 0.9 beta3 compared to 0.8.7 in validating
docker auth config
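
A minimal sketch of the difference between the two checks. The `authConfig` struct and both helper functions are hypothetical; the field names assume Docker's usual username/password/email/server address set. Treating auth as unset only when every field is empty means credentials supplied without an email are still used:

```go
package main

import "fmt"

// authConfig is a hypothetical stand-in for the task's docker auth block.
type authConfig struct {
	Username      string
	Password      string
	Email         string
	ServerAddress string
}

// emptyByEmail is the narrower check: auth counts as missing whenever the
// email is blank, even if a username and password were supplied.
func emptyByEmail(a authConfig) bool {
	return a.Email == ""
}

// isEmpty treats the auth config as missing only when every field is blank.
func isEmpty(a authConfig) bool {
	return a.Username == "" && a.Password == "" && a.Email == "" && a.ServerAddress == ""
}

func main() {
	a := authConfig{Username: "ci", Password: "secret"}
	fmt.Println(emptyByEmail(a), isEmpty(a)) // true false
}
```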
2019-03-13 11:47:37 -05:00
Preetha Appan 549ae657f0
Don't require email address for docker auth 2019-03-13 11:08:56 -05:00
Preetha 7759166b0d
Merge pull request #5380 from quasilyte/patch-1
drivers/shared/executor: fix strings.Replace call
2019-03-06 11:47:01 -06:00
Mahmood Ali bb32ba8784
Support driver config fields being set to nil (#5391)
To pick up https://github.com/hashicorp/hcl2/pull/90
2019-03-05 21:47:06 -05:00
Iskander (Alex) Sharipov e69909fbd3
drivers/shared/executor: fix strings.Replace call
A strings.Replace call with an n=0 argument makes no sense,
as it does nothing. -1 is probably intended.

Signed-off-by: Iskander Sharipov <quasilyte@gmail.com>
2019-03-02 00:33:17 +03:00
Mahmood Ali 4726cb2207 logging.Type over logging.Driver 2019-02-28 16:40:18 -05:00
Mahmood Ali 104869c0e1 drivers/docker: rename logging type to driver
Docker uses the term logging `driver` in its public documentation: in the
`docker` daemon config [1], `docker run` arguments [2] and the docker compose file [3].
Interestingly, Docker uses `type` in its API [4] but `driver` everywhere
else.

It's unfortunate that Nomad used `type`, modeled after the Docker API,
rather than the user-facing documentation. Using `type` in Nomad feels
unfriendly to users, as it's disconnected from how Docker presents the flag
and exposes an internal representation instead.

Here, we rectify the situation by introducing a `driver` field and
preferring it over `type` in logging.
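
A sketch of how "prefer `driver`, fall back to `type`" might be resolved. The `dockerLogging` struct and its `effectiveDriver` method are hypothetical and not necessarily the shape of Nomad's config types:

```go
package main

import "fmt"

// dockerLogging is a hypothetical logging config carrying both the new
// user-facing `driver` field and the legacy `type` field.
type dockerLogging struct {
	Type   string
	Driver string
}

// effectiveDriver returns the logging driver to pass to Docker: `driver`
// wins when set, otherwise the legacy `type` value is used.
func (l dockerLogging) effectiveDriver() string {
	if l.Driver != "" {
		return l.Driver
	}
	return l.Type
}

func main() {
	fmt.Println(dockerLogging{Type: "syslog"}.effectiveDriver())                      // syslog
	fmt.Println(dockerLogging{Type: "syslog", Driver: "json-file"}.effectiveDriver()) // json-file
}
```

Keeping `type` as a fallback lets existing job files continue to work while new ones use the documented term.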

[1] https://docs.docker.com/config/containers/logging/configure/
[2] https://docs.docker.com/engine/reference/run/#logging-drivers---log-driver
[3] https://docs.docker.com/compose/compose-file/#logging
[4] https://docs.docker.com/engine/api/v1.39/#operation/ContainerCreate
2019-02-28 16:04:03 -05:00
Mahmood Ali 67e2a0ac05
docker: report unhealthy in unsupported Windows (#5356)
On Windows, Nomad only supports Windows containers, so report as
unhealthy otherwise.
2019-02-27 08:10:23 -05:00
Michael Schurter 812f1679e2
Merge pull request #5352 from hashicorp/b-leaked-logmon
logmon fixes
2019-02-26 08:35:46 -08:00
Danielle Tomlinson e250aad31b
Merge pull request #5355 from hashicorp/dani/windows-dockerstats
docker: Support Stats on Windows
2019-02-26 16:39:48 +01:00
Danielle Tomlinson e3dc80bea3 docker: Return undetected before first detection
This commit causes the docker driver to return undetected before it
first establishes a connection to the docker daemon.

This fixes a bug where hosts without docker installed would return as
unhealthy, rather than undetected.
2019-02-25 11:02:42 +01:00
Danielle Tomlinson 8aff115fca docker: Support stats on Windows 2019-02-22 14:19:58 +01:00
Michael Schurter 38821954b7 plugins: squelch context Canceled error logs
As far as I can tell this is the most straightforward and resilient way
to skip error logging on context cancellation with grpc streams. You
cannot compare the error against context.Canceled directly as it is of
type `*status.statusError`. The next best solution I found was:

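```go
resp, err := stream.Recv()
// grpc's *status.statusError exposes its status via GRPCStatus(), so assert
// on that interface and compare the code against codes.Canceled.
if se, ok := err.(interface{ GRPCStatus() *status.Status }); ok {
	if se.GRPCStatus().Code() == codes.Canceled {
		return
	}
}
```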

However I think checking ctx.Err() directly makes the code much easier
to read and is resilient against grpc API changes.
2019-02-21 15:32:18 -08:00
Mahmood Ali 6d30284ec9
Merge pull request #5341 from hashicorp/ci-windows-docker
Run Docker tests in Windows AppVeyor CI
2019-02-21 13:17:33 -05:00
Danielle Tomlinson 2610e2d9ef docker: Avoid leaking containers during Reattach
Currently if a docker_logger cannot be reattached to, we will leak the
container that was being used. This is problematic if e.g. using static
ports as it means you can never recover your task, or if a service is
expensive to run and will then be running without supervision.
2019-02-20 17:47:06 +01:00
Danielle Tomlinson 953755ce24
Merge pull request #5335 from hashicorp/dani/docker-logger-spawn
Increase resiliency of docker driver logging
2019-02-20 17:16:05 +01:00
Michael Schurter a1645edb0b Update drivers/docker/docklog/docker_logger.go
Co-Authored-By: dantoml <dani@tomlinson.io>
2019-02-20 17:12:56 +01:00
Danielle Tomlinson 2f18441a47 docker: Respawn docker logger during recovery
Sometimes the nomad docker_logger may be killed by a service manager
when restarting the client for upgrades or reliability reasons.

Currently if this happens, we leak the user's container and try to
reschedule over it.

This commit adds a new step to the recovery process that will spawn a
new docker logger process that will fetch logs from _the current
timestamp_. This is to avoid restarting users' tasks because our logging
sidecar has failed.
2019-02-20 17:12:56 +01:00
Mahmood Ali 8c82c19831 tests: IsTravis() -> IsCI()
Replace IsTravis() references that are intended for CI environments in
general rather than for the Travis environment specifically.
2019-02-20 08:21:03 -05:00
Mahmood Ali fedab3d7b0 driver/docker: Skip failing Windows tests
Skip Docker tests that currently fail on Windows, pending further
investigation.
2019-02-20 07:48:02 -05:00
Mahmood Ali dd8a5c862a
Merge pull request #5321 from hashicorp/b-portmap-regression
drivers: restore port_map old json support
2019-02-19 20:58:37 -05:00
Mahmood Ali 4def8529db driver/docker: use BlockAttrs for storage_opts
storage_opts is a new field in the 0.9 cycle and doesn't have backward
compatibility constraints.
2019-02-19 20:35:28 -05:00
Mahmood Ali a394cd63f4
CVE-2019-5736: Update libcontainer dependencies (#5334)
* CVE-2019-5736: Update libcontainer dependencies

Libcontainer is vulnerable to a runc container breakout that was
reported as CVE-2019-5736 [1]. This upgrades the vendored libcontainer to include the fix.

The runc changes are captured in 369b920277.

[1] https://seclists.org/oss-sec/2019/q1/119
2019-02-19 20:21:18 -05:00
Danielle Tomlinson 3cf3ac7eac dlogger: Increase resilience to docker api failure
This commit adds some extra resiliency to the docker logger in the case
of API failure from the docker daemon, by restarting the stream from the
current point in time if the stream returns and the container is still
running.
2019-02-19 15:17:54 +01:00
Mahmood Ali 46cd3c3f55 drivers: restore port_map old json support
This ensures that `port_map`, along with other block-like attribute
declarations (e.g. ulimit, labels, etc.), can handle the various hcl and json
syntaxes that were supported in 0.8.

In 0.8.7, the following declarations are effectively equivalent:

```
// hcl block
port_map {
  http = 80
  https = 443
}

// hcl assignment
port_map = {
  http  = 80
  https = 443
}

// json single element array of map (default in API response)
{"port_map": [{"http": 80, "https": 443}]}

// json array of individual maps (supported accidentally iiuc)
{"port_map": [{"http": 80}, {"https": 443}]}
```

We achieve compatibility by using `NewAttr("...", "list(map(string))",
false)`, decoded into a `map[string]string` wrapper, instead of using a
`BlockAttrs` declaration. The wrapper merges the list of maps
automatically, to ease driver development.
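
A sketch of the merging step only, assuming a hypothetical `mapStrStr` wrapper type and `mergeMaps` helper; the hcl2 decoding that produces the list is omitted, and the rule that later keys overwrite earlier ones is an assumption of this sketch. Both JSON forms above collapse to the same map:

```go
package main

import "fmt"

// mapStrStr is a hypothetical wrapper: the value arrives as
// list(map(string)) and is merged into a single map[string]string.
type mapStrStr map[string]string

// mergeMaps merges the maps in order; later keys overwrite earlier ones.
func mergeMaps(list []map[string]string) mapStrStr {
	out := mapStrStr{}
	for _, m := range list {
		for k, v := range m {
			out[k] = v
		}
	}
	return out
}

func main() {
	// {"port_map": [{"http": 80, "https": 443}]}
	a := mergeMaps([]map[string]string{{"http": "80", "https": "443"}})
	// {"port_map": [{"http": 80}, {"https": 443}]}
	b := mergeMaps([]map[string]string{{"http": "80"}, {"https": "443"}})
	fmt.Println(a["http"] == b["http"], a["https"] == b["https"]) // true true
}
```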

This approach is closer to how v0.8.7 implemented the fields [1][2] and,
despite its verbosity, seems to preserve 0.8.7 behavior in hcl2.

This is only required for built-in types that have backward
compatibility constraints.  External drivers should use `BlockAttrs`
instead, as they see fit.

[1] https://github.com/hashicorp/nomad/blob/v0.8.7/client/driver/docker.go#L216
[2] https://github.com/hashicorp/nomad/blob/v0.8.7/client/driver/docker.go#L698-L700
2019-02-16 11:37:33 -05:00
Danielle Tomlinson be431cb83d
Merge pull request #5326 from hashicorp/dani/json-submission
api: Fix compatibility with pre 0.9 API jobs
2019-02-14 18:56:13 +01:00
Mahmood Ali 1430f94b2a
Update drivers/docker/config_test.go
Co-Authored-By: dantoml <dani@tomlinson.io>
2019-02-14 18:55:10 +01:00
Danielle Tomlinson 3f696be06b Add regression test for parsing null mounts 2019-02-14 18:03:35 +01:00
Danielle Tomlinson a3a1491958 drivers/docker: SIGTERM to stop containers
The Windows Docker daemon does not support SIGINT; SIGTERM is the semantic
equivalent that allows for graceful shutdown before being followed up by
a SIGKILL.
2019-02-14 15:38:54 +00:00
Mahmood Ali f7102cd01d
tests: add hcl task driver config parsing tests (#5314)
* drivers: add config parsing tests

Add basic tests for parsing and encoding task config.

* drivers/docker: fix some config declarations

* refactor and document config parse helpers
2019-02-12 14:46:37 -05:00
Mahmood Ali aec9120994
drivers/java: restore 0.8.7 java version detection (#5317)
Restore 0.8.x behavior where the java driver is marked as detected when
`java -version` exits with 0 but returns unexpected output.

Furthermore, we restore the 0.8.x behavior of parsing the first three
lines of `java -version` output and ignoring the rest.

If `java -version` returns fewer than 3 lines, Nomad 0.8.7 would panic.
In this implementation, we still mark java as detected but return an
empty version.
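
A sketch of the preserved behavior, using a hypothetical `parseJavaVersion` helper and `javaInfo` type. Detection is assumed to rest on the exit code alone; the first three output lines are taken as version, runtime and VM, further lines are ignored, and shorter output yields empty fields instead of a panic. How 0.8.7 trims each line is not reproduced here:

```go
package main

import (
	"fmt"
	"strings"
)

// javaInfo is a hypothetical result of parsing `java -version` output.
type javaInfo struct {
	Version, Runtime, VM string
}

// parseJavaVersion assumes the command already exited 0, so java counts as
// detected. It reads the first three lines and ignores the rest; with fewer
// than three lines it returns empty fields rather than indexing out of range.
func parseJavaVersion(output string) javaInfo {
	lines := strings.Split(strings.TrimSpace(output), "\n")
	if len(lines) < 3 {
		return javaInfo{}
	}
	return javaInfo{
		Version: strings.TrimSpace(lines[0]),
		Runtime: strings.TrimSpace(lines[1]),
		VM:      strings.TrimSpace(lines[2]),
	}
}

func main() {
	out := "openjdk version \"1.8.0_201\"\nOpenJDK Runtime Environment\nOpenJDK 64-Bit Server VM\nextra line\n"
	fmt.Println(parseJavaVersion(out).VM)              // OpenJDK 64-Bit Server VM
	fmt.Printf("%q\n", parseJavaVersion("x").Version) // ""
}
```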

The 0.8.7 logic for detecting the java version can be found in
https://github.com/hashicorp/nomad/blob/v0.8.7/client/driver/java.go#L132-L172

I punt on making the parsing more resilient to variations in `java -version`
syntax, and aim to preserve existing behavior instead.
2019-02-12 13:41:26 -05:00
Michael Schurter 3b84e08fa4
Merge pull request #5297 from hashicorp/b-docker-logging
Docker: Fix logging config parsing
2019-02-11 06:57:52 -08:00
Gertjan Roggemans 94ca78354b docker: Fix volume driver_config options spec (#5309)
Fixes #5308
2019-02-11 09:18:44 -05:00
Michael Schurter e1e4b10884 docker: fix logging config parsing
Fixes
https://groups.google.com/d/topic/nomad-tool/B3Uo6Kns2BI/discussion
2019-02-04 11:07:57 -08:00
Nick Ethier e7ea26449e
client: fix bug during 0.8 state upgrade that causes external drivers to fail 2019-01-30 14:22:29 -05:00
Alex Dadgar bc804dda2e Nomad 0.9.0-beta1 generated code 2019-01-30 10:49:44 -08:00
Nick Ethier bb9a8afe9b
executor: fix bug and add tests for incorrect stats timestamp reporting 2019-01-28 21:57:45 -05:00
Nick Ethier bcbed3c532
Merge pull request #5248 from hashicorp/b-rawexec-leak
Fix leaked executor in raw_exec
2019-01-28 21:18:31 -05:00
Alex Dadgar 991bcc3ef1 Don't fall through 2019-01-28 09:53:19 -08:00
Alex Dadgar 403faa0d7c comment 2019-01-28 09:47:53 -08:00
Nick Ethier 1f4c26e19e
raw_exec: ensure executor is killed after task is stopped 2019-01-25 23:06:31 -05:00
Alex Dadgar 68ced492fb Fix killing non-existent container with a kill timeout 2019-01-25 16:21:51 -08:00