Commit Graph

3051 Commits

Author SHA1 Message Date
Alex Dadgar bec9a72eec Remove networking from basic resources 2018-01-12 14:33:42 -08:00
Charlie Voiselle 867bb6f7f9 Found more priviledge.
priviledge -> privilege
2018-01-12 09:44:53 -05:00
Alex Dadgar 9e1e04c6f1 Merge pull request #3727 from filipochnik/fix-gh-2832
Recognize renewing non-renewable Vault lease as fatal
2018-01-10 11:47:10 -08:00
Michael Schurter 189ce7f991 Merge pull request #3723 from hashicorp/b-3702-chown-dirs
chown dirs when migrating ephemeral_disk data
2018-01-09 09:27:26 -08:00
Michael Schurter e6c27256b7 Test streamed directory ownership 2018-01-08 16:00:07 -08:00
Michael Schurter 2c79ffb213 chown dirs when migrating ephemeral_disk data
Fixes #3702

Added missing chown call and made it conditional on running as root and
not on Windows as we do with files.
2018-01-08 15:31:12 -08:00
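The chown rule in the commit above (only when running as root and not on Windows) can be sketched in Go as follows. This is a minimal illustration, not Nomad's code: `shouldChown` and `chownAll` are hypothetical helpers. The OS and effective UID are passed in so the decision can be tested on its own, and the euid is looked up once rather than once per path.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// shouldChown is a hypothetical helper: chown only as root and never on Windows,
// where chown is unsupported and would always fail.
func shouldChown(goos string, euid int) bool {
	return goos != "windows" && euid == 0
}

// chownAll applies uid/gid to each path when permitted. The euid is looked
// up once, outside the loop, rather than per path.
func chownAll(paths []string, uid, gid int) error {
	if !shouldChown(runtime.GOOS, os.Geteuid()) {
		return nil // Windows or non-root: skip chown entirely
	}
	for _, p := range paths {
		if err := os.Chown(p, uid, gid); err != nil {
			return fmt.Errorf("chown %s: %w", p, err)
		}
	}
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "alloc-migrate")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	// Chown to the current user: an unchanged owner as root, skipped otherwise.
	if err := chownAll([]string{dir}, os.Getuid(), os.Getgid()); err != nil {
		panic(err)
	}
	fmt.Println("ok")
}
```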
Charlie Voiselle 1bb1ab5069 fix typo
Priviledge -> privilege
2018-01-08 15:56:07 -05:00
Chelsea Holland Komlo 214d128eb9 reload raft transport layer
fix up linting
2018-01-08 14:52:28 -05:00
Filip Ochnik d265e11c36 Recognize renewing non-renewable Vault lease as fatal 2018-01-08 20:32:31 +01:00
Chelsea Holland Komlo 0708d34135 call reload on agent, client, and server separately 2018-01-08 09:56:31 -05:00
Chelsea Holland Komlo 9741097406 reloading tls config should be atomic for clients/servers 2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo ae7fc4695e fixups from code review
Revert "close raft long-lived connections"

This reverts commit 3ffda28206fcb3d63ad117fd1d27ae6f832b6625.

reload raft connections on changing tls
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo acd3d1b162 fix up downgrading client to plaintext
add locks around changing server configuration
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo c0ad9a4627 add ability to upgrade/downgrade nomad agents tls configurations via sighup 2018-01-08 09:21:06 -05:00
Michael Schurter ef76c65da1 Lookup euid outside of loop 2017-12-13 11:50:12 -08:00
Michael Schurter 5032bf4f5a Skip tests that require root when not root
Also skip Chown on allocdir migration on Windows and when non-root.
Windows doesn't support it, and it will always fail as a non-root user.
2017-12-12 16:58:27 -08:00
Alex Dadgar f0b0697b57 Keyify struct 2017-12-11 17:23:14 -08:00
Michael Schurter c4d4ead199 Fix test broken by mock updates 2017-12-08 16:45:25 -08:00
Michael Schurter 4b20441eef Validate port label for host address mode
Also skip getting an address for script checks which don't use them.

Fixed a weird invalid reserved port in a TaskRunner test helper as well
as a problem with our mock Alloc/Job. Hopefully the latter doesn't cause
other tests to fail, but we were referencing an invalid PortLabel and
just not catching it before.
2017-12-08 12:03:43 -08:00
Michael Schurter 30dd570061 Fix interpolation bug with service/check updates
Previously if only an interpolated variable used in a service or check
was changed we interpolated the old and new services and checks with the
new variable, so nothing appeared to have changed.
2017-12-08 12:03:00 -08:00
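The interpolation bug described above can be reproduced in a small Go sketch. Here `os.Expand` stands in for the task environment's real interpolation, and `changedBuggy` and `changedFixed` are hypothetical names. The buggy version interpolates both the old and new templates with the new environment, so a change to the variable alone is invisible. The fixed version interpolates each side with its own environment.

```go
package main

import (
	"fmt"
	"os"
)

// interpolate expands $VAR / ${VAR} using env (a stand-in for Nomad's own logic).
func interpolate(s string, env map[string]string) string {
	return os.Expand(s, func(k string) string { return env[k] })
}

// changedBuggy mirrors the old behavior: both sides use the NEW environment.
func changedBuggy(oldTmpl, newTmpl string, oldEnv, newEnv map[string]string) bool {
	return interpolate(oldTmpl, newEnv) != interpolate(newTmpl, newEnv)
}

// changedFixed interpolates the old service with the old environment and the
// new service with the new one, so env-only changes are detected.
func changedFixed(oldTmpl, newTmpl string, oldEnv, newEnv map[string]string) bool {
	return interpolate(oldTmpl, oldEnv) != interpolate(newTmpl, newEnv)
}

func main() {
	tmpl := "web-${DC}"
	oldEnv := map[string]string{"DC": "dc1"}
	newEnv := map[string]string{"DC": "dc2"}
	fmt.Println(changedBuggy(tmpl, tmpl, oldEnv, newEnv)) // false
	fmt.Println(changedFixed(tmpl, tmpl, oldEnv, newEnv)) // true
}
```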
Michael Schurter 4347026f83 Test Consul from TaskRunner thoroughly
Rely less on the mockConsulServiceClient because the real
consul.ServiceClient needs all the testing it can get!
2017-12-08 12:03:00 -08:00
Alex Dadgar a0d6b6a121 Merge pull request #3630 from hashicorp/b-periodic
Handle race between fingerprinters and registration
2017-12-07 16:11:13 -08:00
Alex Dadgar 91ffbbb517 Review feedback 2017-12-07 16:10:57 -08:00
Chelsea Komlo c8e0cb3044 Merge pull request #3591 from hashicorp/b-1755-stop
Allow controlling the stop signal for drivers
2017-12-07 17:06:43 -05:00
Alex Dadgar 02baa6c52b Handle race between fingerprinters and registration 2017-12-07 13:09:37 -08:00
Chelsea Holland Komlo 61fa8ad4ba code review fixes 2017-12-07 13:46:25 -05:00
Chelsea Holland Komlo 77ab41124b set default kill signal on executor shutdown 2017-12-07 11:40:15 -05:00
Chelsea Holland Komlo 6cae8fe6e6 extend configurable kill signal to java driver 2017-12-07 11:40:10 -05:00
Alex Dadgar 4409fdacc0 Drop trace logging 2017-12-06 18:02:24 -08:00
Alex Dadgar cd9a7f14b8 Add logging around heartbeats 2017-12-06 17:57:50 -08:00
Chelsea Holland Komlo 350319239c change location of default kill signal 2017-12-06 17:48:25 -05:00
Chelsea Holland Komlo 7dfb64f941 extract signal helper into utils 2017-12-06 14:36:44 -05:00
Chelsea Holland Komlo b08611cfac move kill_signal to task level, extend to docker 2017-12-06 14:36:39 -05:00
Chelsea Holland Komlo 80de7d5ebd allow controlling the stop signal in exec/raw_exec 2017-12-06 11:28:45 -05:00
Chelsea Komlo 9ae849e09c Merge pull request #3612 from hashicorp/docker-rkt-user
Set user for rkt tasks
2017-12-05 17:45:08 -05:00
Michael Schurter b66aa5b7f6 Merge pull request #3563 from hashicorp/b-snapshot-atomic
Atomic Snapshotting / Sticky Volume Migration
2017-12-05 09:16:33 -08:00
Chelsea Holland Komlo 4463dc607e fix up test 2017-12-05 10:12:40 -05:00
Chelsea Holland Komlo 7284f2385a remove unused user option 2017-12-04 18:01:31 -05:00
Michael Schurter 6ccc4219d3 Merge pull request #3615 from hashicorp/b-rkt-host-ports
rkt: Don't require port_map with host networking
2017-12-04 14:49:42 -08:00
Chelsea Holland Komlo 7c74968452 add ability to specify user for rkt 2017-12-04 14:21:48 -05:00
Michael Schurter 2bf1d6d85e rkt: Don't require port_map with host networking
Also don't try to return a DriverNetwork with host networking. None will
ever exist as that's the point of host networking: rkt won't create a
network namespace.
2017-12-01 17:23:25 -08:00
Chelsea Holland Komlo 4ee2122536 get KillTimeout in seconds, not nanoseconds 2017-12-01 10:43:00 -05:00
Michael Schurter 5e975bbd0f Add comment and normalize err check ordering
as per PR comments
2017-11-29 17:26:11 -08:00
Michael Schurter d996c3a231 Check for error file when receiving snapshots 2017-11-29 17:26:11 -08:00
Michael Schurter ca946679f6 Destroy partially migrated alloc dirs
The test asserting that snapshot errors don't return a valid tar currently fails.
2017-11-29 17:26:11 -08:00
Michael Schurter 23c66e37c5 Handle errors during snapshotting
If an alloc dir is being GC'd (removed) during snapshotting the walk
func will be passed an error. Previously we didn't check for an error so
a panic would occur when we'd try to use a nil `fileInfo`.
2017-11-29 17:26:11 -08:00
Chelsea Holland Komlo 2208964948 Support StopTimeout for Docker tasks
Update github.com/fsouza/go-dockerclient
2017-11-29 14:33:05 -05:00
Preetha Appan 6ad65c51e6 Missed assert in one place 2017-11-20 13:04:38 -06:00
Preetha Appan 747bd59daa Better error validation, and added test case for invalid sysctl inputs 2017-11-20 12:07:18 -06:00
Preetha Appan c68973747b Address some review comments 2017-11-20 11:15:09 -06:00
Preetha Appan 39ef9ee76d Fix gofmt warnings 2017-11-18 09:23:09 -06:00
Preetha Appan e53dd15f58 Fix test compilation after rebase 2017-11-17 17:46:04 -06:00
Samuel BERTHE 0fca2e19c8 review(docker driver): sysctls -> sysctl + ulimits -> ulimit 2017-11-17 16:30:45 -06:00
Samuel BERTHE 6c93922cb7 Oops 2017-11-17 16:14:14 -06:00
Samuel BERTHE c8363bc44b 💄 2017-11-17 16:03:22 -06:00
Samuel BERTHE 281ab90484 test(docker driver): testing sysctls and ulimits 2017-11-17 16:03:22 -06:00
Samuel BERTHE b9a10ff7fa feat(docker driver): adds sysctls and ulimits configs 2017-11-17 16:03:22 -06:00
Alex Dadgar 69d3bf7392 Merge pull request #3559 from hashicorp/b-metrics
Don't emit metrics for non-running tasks
2017-11-17 10:33:23 -08:00
Michael Schurter 3845c8d200 Merge pull request #3562 from hashicorp/b-3561-rkt-rm
Remove rkt pods when exiting
2017-11-16 17:30:21 -08:00
Michael Schurter 737fb45640 Merge pull request #3551 from hashicorp/b-3419-docker-409-bug
Fix Docker name conflict bug by updating dockerclient
2017-11-16 16:38:54 -08:00
Michael Schurter 437fce9954 Improve rktRemove error message 2017-11-16 15:45:14 -08:00
Michael Schurter 3ceec0caab Remove rkt pods when exiting
Fixes #3561
2017-11-16 14:33:44 -08:00
Charlie Voiselle 7a231897a5
Merge pull request #3556 from angrycub/f-fingerprint-log-level
Dropped loglevel for AWS fingerprinter env read misses to DEBUG
2017-11-16 16:27:25 -05:00
Charlie Voiselle 969ddf9c2a Lowered to DEBUG from AD feedback 2017-11-16 14:13:03 -05:00
Alex Dadgar 05b1588cea Only publish metric when the task is running and dev mode publishes metrics 2017-11-15 13:21:06 -08:00
Alex Dadgar 07963f0b6d Merge pull request #3546 from hashicorp/f-heuristic
Better interface selection heuristic
2017-11-15 12:51:21 -08:00
Alex Dadgar 97ec3974a9 Use interface attached to default route 2017-11-15 11:32:32 -08:00
Michael Schurter f86f0bd9ea Handle leader task being dead in RestoreState
Fixes the panic mentioned in
https://github.com/hashicorp/nomad/issues/3420#issuecomment-341666932

While a leader task dying serially stops all follower tasks, the
synchronizing of state is asynchronous. Nomad can shut down before all
follower tasks have updated their state to dead thus saving the state
necessary to hit this panic: *have a non-terminal alloc with a dead
leader.*

The actual fix is a simple nil check so we no longer assume that the
leaders of non-terminal allocs have a TaskRunner.
2017-11-15 10:36:13 -08:00
Charlie Voiselle 1197637251 Dropped loglevel for AWS fingerprinter env reads
Certain environments use WARN for serious logging; however, it's very
possible to have machines without some of the fingerprinted keys
(public-ipv4 and public-hostname specifically). Setting log level to
INFO seems more consistent with this possibility.
2017-11-15 18:20:59 +00:00
Chelsea Komlo 2dfda33703 Nomad agent reload TLS configuration on SIGHUP (#3479)
* Allow server TLS configuration to be reloaded via SIGHUP

* dynamic tls reloading for nomad agents

* code cleanup and refactoring

* ensure keyloader is initialized, add comments

* allow downgrading from TLS

* initialize keyloader if necessary

* integration test for tls reload

* fix up test to assert success on reloaded TLS configuration

* failure in loading a new TLS config should remain at current

Reload only the config if agent is already using TLS

* reload agent configuration before specific server/client

lock keyloader before loading/caching a new certificate

* introduce a get-or-set method for keyloader

* fixups from code review

* fix up linting errors

* fixups from code review

* add lock for config updates; improve copy of tls config

* GetCertificate only reloads certificates dynamically for the server

* config updates/copies should be on agent

* improve http integration test

* simplify agent reloading storing a local copy of config

* reuse the same keyloader when reloading

* Test that server and client get reloaded but keep keyloader

* Keyloader exposes GetClientCertificate as well for outgoing connections

* Fix spelling

* correct changelog style
2017-11-14 17:53:23 -08:00
Michael Schurter 3023336b39 Add a test demonstrating the bug
Fails on Docker 17.09, passes on Docker 17.06 and earlier
2017-11-14 15:25:52 -08:00
Alex Dadgar ee31e15f51 Better interface selection heuristic
This PR introduces a better interface selection heuristic such that we
select interfaces with globally routable unicast addresses over link
local addresses.

Fixes https://github.com/hashicorp/nomad/issues/3487
2017-11-13 15:13:43 -08:00
Preetha Appan 926c9ed997 Make device mounting unit test verify configuration via docker inspect 2017-11-13 09:56:54 -06:00
Preetha Appan dc2d5fb5a4 Unit test (linux only) that tests mounting a device in the docker driver 2017-11-13 09:56:54 -06:00
Preetha Appan 4834710e45 Add default value for cgroup permissions for device if not set 2017-11-13 09:56:54 -06:00
Preetha Appan 9cdee6991c Remove unnecessary check since validate method already checks this 2017-11-13 09:56:54 -06:00
Preetha Appan 110c1fd4f0 Add support for passing device into docker driver 2017-11-13 09:56:54 -06:00
Alex Dadgar d1358ec1b6 always load all templates 2017-11-10 12:35:51 -08:00
Alex Dadgar a3ea0c17a0 Handle multiple environment templates
Fixes https://github.com/hashicorp/nomad/issues/3498
2017-11-10 11:08:19 -08:00
Alex Dadgar b3edc12dd9 Merge pull request #3411 from cheeseprocedure/f-qemu-graceful-shutdown
Qemu driver: graceful shutdown feature
2017-11-03 16:41:34 -07:00
Michael Schurter 690b8f4cfb Remove noisy log line
Didn't mean to commit this
2017-11-03 16:00:30 -07:00
Matt Mercer 11e2870875 Qemu driver: clean up logging; fail unsupported features on Windows 2017-11-03 15:40:20 -07:00
Alex Dadgar 6034916ad1 fix spelling mistake 2017-11-03 15:04:59 -07:00
Alex Dadgar a23033932a Merge pull request #3459 from multani/docker-oom-notification
docker: log that a container has been killed by the OOM killer
2017-11-03 13:24:03 -07:00
Matt Mercer cef9ba9770 Qemu driver: tweaks in response to PR feedback
Remove attribute for long qemu monitor path; misc cleanup; update tests
2017-11-03 11:28:56 -07:00
Preetha Appan 0eaef09675 Remove event GenericSource, and address other code review comments. Also added deprecation info in comments. 2017-11-03 10:10:06 -05:00
Preetha Appan 5f09c968b3 Move logic for determining event display message to task_runner; added two new fields, DisplayMessage and Details. 2017-11-03 09:13:01 -05:00
Alex Dadgar b4af10edde Alloc Runner doesn't panic on restoration. 2017-11-02 16:14:13 -07:00
Alex Dadgar abd28cbd7d Merge pull request #3493 from hashicorp/f-remove-atlas
Remove Atlas and Scada from codebase
2017-11-02 16:00:44 -07:00
Michael Schurter eedbe8efbb Merge pull request #3490 from hashicorp/f-gc-logging
Make unable-to-gc log level adaptive
2017-11-02 14:32:40 -07:00
Diptanu Choudhury cb68889652 Added the node_id as a tag 2017-11-02 13:29:10 -07:00
Alex Dadgar 701f462d33 remove atlas 2017-11-02 11:27:21 -07:00
Michael Schurter fc33c945be Make unable-to-gc log level adaptive
WARNing when someone has over 50 non-terminal allocs was just too
confusing.

Tested manually with `gc_max_allocs = 10` and bumping a job from `count
= 19` to `count = 21`:

```
2017/11/02 17:54:21.076132 [INFO] client.gc: garbage collection due to number of allocations (19) is over the limit (10) skipped because no terminal allocations
...
2017/11/02 17:54:48.634529 [WARN] client.gc: garbage collection due to number of allocations (21) is over the limit (10) skipped because no terminal allocations
```
2017-11-02 10:57:42 -07:00
Diptanu Choudhury 8a9d0d40b1 Added support for tagged metrics 2017-11-02 10:07:57 -07:00
Diptanu Choudhury 5f522c6de3 Incrementing the start counter when we are actually starting a container 2017-11-02 09:51:20 -07:00
Diptanu Choudhury 44535e5d10 Recording counter for dead allocs properly 2017-11-02 09:51:20 -07:00
Diptanu Choudhury 0b34e811b7 Added metrics to track task/alloc start/restarts/dead events 2017-11-02 09:51:20 -07:00
Matt Mercer 00f90323c2 Qemu driver: defer cleanup sooner 2017-11-01 17:37:43 -07:00
Matt Mercer 43256af5f3 Qemu driver: clean up test logging; retry integration test for longer 2017-11-01 17:21:56 -07:00
Matt Mercer b1145705d3 Use strings.Replace() instead of custom function 2017-11-01 15:31:35 -07:00
Matt Mercer d51d174fa0 Qemu driver: basic testing of graceful shutdown feature 2017-11-01 15:31:30 -07:00
Matt Mercer c26013ea0b Qemu driver: include PIDs in log output 2017-11-01 15:31:24 -07:00
Matt Mercer 38d9a391aa Qemu driver: ensure proper cleanup of resources 2017-11-01 15:31:20 -07:00
Matt Mercer 46f7e2fa4c Qemu driver: minor logging fixes 2017-11-01 15:31:14 -07:00
Matt Mercer 4afb9dfa2d Standardize driver.qemu logging prefix 2017-11-01 15:30:44 -07:00
Matt Mercer 5127e75569 Qemu driver: add graceful shutdown feature 2017-11-01 15:30:36 -07:00
Michael Schurter 1769db98b7 Fix regression by returning error on unknown alloc 2017-11-01 15:16:38 -05:00
Michael Schurter 9f26b9a403 Fix race in test 2017-11-01 15:16:38 -05:00
Michael Schurter 73e9b57908 Trigger GCs after alloc changes
GC much more aggressively by triggering GCs when allocations become
terminal as well as after new allocations are added.
2017-11-01 15:16:38 -05:00
Michael Schurter 2a81160dcd Fix GC'd alloc tracking
The Client.allocs map now contains all AllocRunners again, not just
un-GC'd AllocRunners. Client.allocs is only pruned when the server GCs
allocs.

Also stops logging "marked for GC" twice.
2017-11-01 15:16:38 -05:00
Alex Dadgar c710550551 fix test 2017-10-30 12:35:31 -07:00
Alex Dadgar 4831380e57 Node access is done using locked Node copy
Fixes https://github.com/hashicorp/nomad/issues/3454

Reliably reproduced the data race before by having a fingerprinter
change the node's attributes every millisecond and syncing at the same
rate. With fix, did not ever panic.
2017-10-27 13:27:24 -07:00
Jonathan Ballet 5429d1c656 docker: changed OOM killed error message 2017-10-27 20:30:52 +02:00
Jonathan Ballet 12615bde9c docker: log that a container has been killed by the OOM killer
Fix: #2203 (at least for Docker tasks)
2017-10-27 18:05:27 +02:00
Alex Dadgar f117eb28c7 go style vars 2017-10-25 10:49:34 -07:00
Alex Dadgar 3f8495dd0e fix two flaky tests 2017-10-23 18:15:52 -07:00
Alex Dadgar cb0d0ef009 move to consul freeport implementation 2017-10-23 16:51:40 -07:00
Alex Dadgar dbc014b360 Standardize retrieving a free port into a helper package 2017-10-23 16:48:20 -07:00
Alex Dadgar 4a69e1ad15 don't double parallel 2017-10-23 16:48:06 -07:00
Alex Dadgar 96ca2bbe4c respond to comments 2017-10-23 15:50:27 -07:00
Alex Dadgar 99c81b5848 Skip if no docker 2017-10-19 16:55:10 -07:00
Alex Dadgar 593536664e fix flaky java tests 2017-10-19 16:49:57 -07:00
Alex Dadgar 4bc452b479 Undo darwin user setting 2017-10-19 16:49:57 -07:00
Alex Dadgar c7c6964313 Run as user on mac 2017-10-19 16:49:57 -07:00
Alex Dadgar 55a1dffa2f sudo docker works 2017-10-19 16:49:57 -07:00
Alex Dadgar 805e7b3b62 docker tests 2017-10-19 16:49:57 -07:00
Michael Schurter 797f49702e Add logging around moby/moby#32648 bug 2017-10-18 10:44:03 -07:00
Michael Schurter 22ac450b2f Properly fail rkt fingerprinting on old versions 2017-10-16 13:58:58 -07:00
Michael Schurter d7732c1a58 Squelch repeated rkt version warnings 2017-10-16 12:09:47 -07:00
Michael Schurter b5fd075d74 Test fixes from #3383 2017-10-13 15:45:35 -07:00
Michael Schurter b63eee17e9 Merge pull request #3383 from hashicorp/b-migrate-token
base64 migrate token
2017-10-13 13:46:54 -07:00
Michael Schurter dfd2967cdb Merge pull request #3376 from hashicorp/f-node-acls
Allow Node.SecretID for Node.GetNode and Allocs.GetAlloc
2017-10-13 11:51:48 -07:00
Michael Schurter 15b991e039 base64 migrate token
HTTP header values must be ASCII.

Also constant time compare tokens and test the generate and compare
helper functions.
2017-10-13 10:59:13 -07:00
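The two ideas in the commit above, an ASCII-safe token and a constant-time comparison, can be combined in a short Go sketch. It assumes an HMAC-SHA256 of the alloc ID keyed by the node secret, but that derivation is a hypothetical choice: the commit only says the token is base64-encoded and compared in constant time. `generateMigrateToken` and `compareMigrateToken` are illustrative names.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
)

// generateMigrateToken derives a token for allocID keyed by nodeSecret.
// HMAC-SHA256 is an assumption; base64 keeps the value ASCII-only so it
// is valid in an HTTP header.
func generateMigrateToken(allocID, nodeSecret string) string {
	mac := hmac.New(sha256.New, []byte(nodeSecret))
	mac.Write([]byte(allocID))
	return base64.URLEncoding.EncodeToString(mac.Sum(nil))
}

// compareMigrateToken recomputes the expected token and compares it in
// constant time, so timing does not reveal how many leading bytes matched.
func compareMigrateToken(allocID, nodeSecret, token string) bool {
	expected := generateMigrateToken(allocID, nodeSecret)
	return subtle.ConstantTimeCompare([]byte(expected), []byte(token)) == 1
}

func main() {
	token := generateMigrateToken("alloc-1234", "node-secret")
	fmt.Println(compareMigrateToken("alloc-1234", "node-secret", token))  // true
	fmt.Println(compareMigrateToken("alloc-1234", "wrong-secret", token)) // false
}
```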
Alex Dadgar 85178d6048 rkt remove allocid 2017-10-13 10:07:50 -07:00
Adam Stankiewicz cefbc72b49 Remove AllocID from ExecutorContext 2017-10-13 17:07:49 +02:00
Michael Schurter 4a70d4356a Alloc watcher must send Node.SecretID as AuthToken
An auth token is required if ACLs are enabled
2017-10-12 16:38:02 -07:00
Michael Schurter 84d8a51be1 SecretID -> AuthToken 2017-10-12 15:16:33 -07:00
Michael Schurter 59ff94cd71 Don't panic on unexpected Consul response
Fixes #3326
2017-10-11 18:25:54 -07:00
Chelsea Holland Komlo e1c4701a43 fix up build warnings 2017-10-11 17:11:57 -07:00
Chelsea Holland Komlo b018ca4d46 fixing up code review comments 2017-10-11 17:09:20 -07:00
Chelsea Holland Komlo a77e462465 add tests for functionality 2017-10-11 17:09:20 -07:00
Chelsea Holland Komlo 410adaf726 Add functionality for authenticated volumes 2017-10-11 17:09:20 -07:00
Alex Dadgar 6d3d0a9391 Nomad UI Command 2017-10-09 23:01:55 -07:00
Michael Schurter f788974f8a Merge pull request #3288 from simar7/qemu-improvements
qemu: Add bound checks for memory assignment
2017-10-02 14:47:05 -07:00
Simarpreet Singh d801584c46 qemu: Fix lower memory bound to 128M
Signed-off-by: Simarpreet Singh <simar@linux.com>
2017-10-02 14:29:44 -07:00
Simarpreet Singh 10d7d6dab0 gofmt: format qemu.go and qemu_test.go
Signed-off-by: Simarpreet Singh <simar@linux.com>
2017-10-02 13:16:48 -07:00
Michael Schurter a66c53d45a Remove `structs` import from `api`
Goes a step further and removes structs import from api's tests as well
by moving GenerateUUID to its own package.
2017-09-29 10:36:08 -07:00
Michael Schurter 77f1fe40e7 Properly autodetect Docker IP in Windows
Our Docker network plugin autodetection code was erroneously treating
Windows' default network `nat` as a plugin and defaulting to it instead
of the host.

Fixes #3218
2017-09-27 16:49:23 -07:00
Michael Schurter a8a87af7ed Only build rkt driver on linux
Build stub for non-linux targets
2017-09-27 14:21:45 -07:00
Simarpreet Singh 3d99e71de8 qemu: Add bound checks for memory assignment
Signed-off-by: Simarpreet Singh <simar@linux.com>
2017-09-26 21:07:48 -07:00
Michael Schurter d7229ce6c5 Merge pull request #3256 from dalegaard/master
Enable rkt driver to use address_mode = 'driver'
2017-09-26 18:04:37 -05:00
Alex Dadgar 4173834231 Enable more linters 2017-09-26 15:26:33 -07:00
Lasse Dalegaard 9f584d1114 Ignore rkt network failure if container died early
If the container dies before the network can be read, we now ignore the
error coming out of the network information polling loop. Nomad will
restart the task regardless, and reporting the network error could mask the actual error.

The polling loop for the rkt network information, inside the `Start`
method, was getting a bit unwieldy. It's been refactored out so it's now
a separate function.
2017-09-27 00:15:27 +02:00
Lasse Dalegaard b43ec57c02 Make rkt port mapping test not exit immediately
The rkt port mapping test currently starts redis with --version, which
obviously makes redis exit again almost immediately. This means that the
container exits before the network status can be queried, and so the
test fails.
2017-09-26 23:10:24 +02:00
Lasse Dalegaard 17d155d316 Improve rkt driver network status poll loop
The network status poll loop will now report any networks it ignored, as
well as a no-networks situation.
2017-09-26 21:49:45 +02:00
Lasse Dalegaard bafd32fda0 Refactor rkt network status loop
The network status poll loop for the rkt driver's `Start` method was a
bit messy, and could not display the last encountered error. Here we
clean it up.
2017-09-26 21:27:12 +02:00
Lasse Dalegaard 5e9e2b07bd Small logging fix in rkt/driver 2017-09-26 19:36:13 +02:00
Lasse Dalegaard 3d25fd3b00 Bump minimum rkt version to 1.27.0.
The changes introduced in #3256 require at least rkt 1.27.0 because of
a bug in the JSON output of `rkt status` in previous versions.

Here we upgrade all references to rkt's minimum version, and also make
travis and vagrant use this version when running tests.

Finally we add a CHANGELOG notice.
2017-09-26 19:15:43 +02:00
Lasse Dalegaard f55f2b8f24 Turn rkt network status failure into Start failure
If the rkt driver cannot get the network status, for a task with a
configured port mapping, it will now fail the Start() call and kill the
task instead of simply logging. This matches the Docker behavior.

If no port map is specified, the warnings will be logged but the task
will be allowed to start.
2017-09-26 10:20:57 +02:00
Lasse Dalegaard 55a2e60e1a Test for rkt driver setting DriverNetwork
To test that the rkt driver correctly sets a DriverNetwork, at least
when a port mapping is requested, we amend the
TestRktDriver_PortsMapping test with a small check.
2017-09-26 09:10:50 +02:00
Lasse Dalegaard 2d307d5beb Discard errors from rkt status and cat-manifest
Since we don't actually show these errors anywhere, just discard them
right away.
2017-09-26 09:05:47 +02:00
Chelsea Holland Komlo b26454cf99 Move setGaugeForAllocationStats to emitClientMetrics 2017-09-25 16:05:49 +00:00
Lasse Dalegaard cbcbe0da2e Expose rkt DriverNetwork
Currently the rkt driver does not expose a DriverNetwork instance after
starting the container, which means that address_mode = 'driver' does
not work.

To get the container network information, we can call `rkt status` on
the UUID of the container and grab the container IP from there.

For the port map, we need to grab the pod manifest as it will tell us
which ports the container exposes. We then cross-reference the
configured port name with the container port names, and use that to
create a correct port mapping.

To avoid doing a (bad) reimplementation of the appc schema (which rkt
uses for its manifest) and rkt APIs, we pull those in as vendored
dependencies. The versions used are the same ones that rkt uses in its
glide dependency configuration for version 1.28.0.
2017-09-21 00:34:22 +02:00
Lasse Dalegaard 7ac599d509 Use rkt prepare + run-prepared instead of run.
The rkt driver currently executes run and asks that the pod UUID is
written to a file that is then polled for changes for up to five
seconds. Many container fetches will take longer than this, so this
method will often not be able to track the pod UUID reliably.

To avoid this problem, rkt allows pods to be first prepared, which will
return their UUID, and then run as a second invocation.

Here we convert the rkt driver's Start method to use this method
instead. This way, the UUID will always be tracked correctly.
2017-09-21 00:17:31 +02:00
Michael Schurter f92ffe5af5 Merge pull request #3105 from hashicorp/f-876-restart-unhealthy
Restart unhealthy tasks
2017-09-17 19:38:32 -07:00
epipho a16c97394f Fix incorrect docker stats 2017-09-16 00:43:03 -04:00
Michael Schurter 67a4a169a9 Name const after what it represents 2017-09-15 14:57:18 -07:00
Michael Schurter 79a7bf3d7c Cleanup and test restart failure code 2017-09-15 14:54:37 -07:00
Michael Schurter 06ca379da0 Add comments 2017-09-15 14:34:36 -07:00
Michael Schurter 4dbaa52aba Fold SetFailure into SetRestartTriggered 2017-09-14 16:48:39 -07:00
Michael Schurter ed77c0944b DRY up restart handling a bit.
All 3 error/failure cases share restart logic, but 2 of them have
special cased conditions.
2017-09-14 16:48:39 -07:00
Michael Schurter 73fb71ca10 RestartDelay isn't needed as checks are re-added on restarts
@dadgar made the excellent observation in #3105 that TaskRunner removes
and re-registers checks on restarts. This means checkWatcher doesn't
need to do *any* internal restart tracking. Individual checks can just
remove themselves and be re-added when the task restarts.
2017-09-14 16:48:39 -07:00
Michael Schurter 06dd86adbd Remove unused lastStart field 2017-09-14 16:47:41 -07:00
Michael Schurter 0447f79288 Removed partially implemented allocLock 2017-09-14 16:47:41 -07:00
Michael Schurter ade29ecbed Improve check watcher logging and add tests
Also expose a mock Consul Agent to allow testing ServiceClient and
checkWatcher from TaskRunner without actually talking to a real Consul.
2017-09-14 16:47:41 -07:00
Michael Schurter a137676358 Add comments and move delay calc to TaskRunner 2017-09-14 16:46:54 -07:00
Michael Schurter 8a87475498 Use existing restart policy infrastructure 2017-09-14 16:46:54 -07:00
Michael Schurter 22690c5f4c Add check watcher for restarting unhealthy tasks 2017-09-14 16:46:54 -07:00
Alex Dadgar d306da846c changelog and feedback 2017-09-14 14:08:58 -07:00
Alex Dadgar 07ed83fdd5 Non-locked accessors to common Node fields
This PR removes locking around commonly accessed node attributes that do
not need to be locked. The locking could cause nodes to TTL as the
heartbeat code path was acquiring a lock that could be held for an
excessively long time. An example of this is when Vault is inaccessible,
since the fingerprint is run with a lock held but the Vault
fingerprinter makes the API calls with a large timeout.

Fixes https://github.com/hashicorp/nomad/issues/2689
2017-09-14 14:08:26 -07:00
Chelsea Komlo 536d38454b Merge pull request #3191 from hashicorp/b-tagged-metrics-panic
Fix panic in emitting tagged allocation metrics
2017-09-11 14:28:50 -04:00
Armon Dadgar d4aed839d2 Merge pull request #3185 from hashicorp/f-acl-reset
Add ability to reset ACL bootstrap process
2017-09-11 10:47:17 -07:00
Armon Dadgar 3d5ecaafff Address @dadgar feedback 2017-09-11 10:30:59 -07:00
Alex Dadgar b3958faa14 Merge pull request #3187 from hashicorp/b-windows-docker
Fix MemorySwappiness on Windows Docker
2017-09-11 09:56:49 -07:00
Alex Dadgar 1cd8f7523f Merge pull request #3184 from hashicorp/b-docker-logging
Fix docker user specified syslogging
2017-09-11 09:31:33 -07:00
Chelsea Holland Komlo 848af92183 fix panic in emitting tagged metrics 2017-09-11 15:32:37 +00:00
Alex Dadgar d3a9463358 Fix MemorySwappiness on Windows Docker
Fixes https://github.com/hashicorp/nomad/issues/3181
2017-09-10 17:46:45 -07:00
Alex Dadgar 3ec7946b3e Fix invalid CPU stats on Windows
This PR fixes an issue introduced in Nomad 0.6.0 due to
https://github.com/shirou/gopsutil/issues/420. The issue arose from the
fact that the Windows stats from gopsutil reports CPUs in
percentages where we expected ticks.
2017-09-10 15:30:48 -07:00
Alex Dadgar 637ae9580a Fix docker user specified syslogging 2017-09-10 14:57:48 -07:00
James Nugent 448145872f client: Guard against "NaN" values from floats
This commit protects against finding `0.NaN` tokens in JSON streams
because of infinity representation on serialization.
2017-09-08 16:21:07 -05:00
Alex Dadgar 31f9e099d9 Merge pull request #3148 from clinta/purge-stopped
Always purge stopped containers
2017-09-05 17:18:05 -07:00
Alex Dadgar 6fdaf38389 Fix repo name passed to docker credential helpers
This PR fixes the server url passed to docker credential helpers and
fixes stderr capture.

Fixes https://github.com/hashicorp/nomad/issues/2957
2017-09-05 16:43:21 -07:00
Alex Dadgar 21564c7c04 Parse Docker mounts correctly (#3163)
* Parse Docker mounts correctly

This PR fixes the parsing of Docker mounts and adds testing to ensure no
regressions.

Fixes https://github.com/hashicorp/nomad/issues/3156

* Review feedback
2017-09-05 14:02:57 -07:00
Chelsea Holland Komlo 0ef43c3c5f final code review fixups 2017-09-05 18:47:44 +00:00
Chelsea Holland Komlo dea1fa089b fix up travis test failure via race condition 2017-09-05 15:04:59 +00:00
Chelsea Holland Komlo a8cbd0b559 fixups from code review 2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo f72e4aad13 labels depend on full setup of client beforehand 2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo 87a814397d refactor to use baseLabels 2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo b2953d905a pass in commonly used values 2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo c634043069 create base labels to be used in every metric 2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo f5ea83da8d emit metrics using labels, add option for backwards compatibility 2017-09-05 14:12:57 +00:00
Chelsea Holland Komlo 0175f80775 add metrics options to client config 2017-09-05 14:12:57 +00:00
Armon Dadgar b8bf35f087 ACL RPCs allow stale reads for scalability 2017-09-04 13:07:44 -07:00
Armon Dadgar f31cd6a618 client: fixing policy resolution after ACL endpoint enforcement 2017-09-04 13:05:53 -07:00
Armon Dadgar ddcc5f89bc Add ErrPermissionDenied, rename TokenNotFound 2017-09-04 13:05:53 -07:00
Armon Dadgar 76a03f2d8e Address @dadgar feedback 2017-09-04 13:05:53 -07:00
Armon Dadgar e3f32ca6f1 client: adding token resolution logic 2017-09-04 13:05:36 -07:00
Armon Dadgar 688897561b client: adding token cache for ACL resolution 2017-09-04 13:05:36 -07:00
Armon Dadgar c2e72e8a9c client: create ACL and Policy cache 2017-09-04 13:05:35 -07:00
Armon Dadgar 792f176a44 agent: thread ACL config to client 2017-09-04 13:04:45 -07:00
Clint Armstrong b5c2636313 Always purge stopped containers 2017-08-31 14:28:48 -04:00
Clint Armstrong 7e35ab6abb fix logging re-init 2017-08-30 12:36:31 -04:00
Michael Schurter 78823d559b Squelch logspam when unable to get disk usage stats
To reproduce logspam:

```
$ docker plugin install --grant-all-permissions vieux/sshfs
$ nomad agent -dev
...
2017/08/25 17:09:03.282868 [WARN] client: error fetching host disk usage stats for /var/lib/docker/plugins/a8b4a69b07e5180f828d19e1e9e102ccc0e26f9c9939eaef85357260c30b20a7/rootfs/mnt/volumes: permission denied
... repeats every collection period ...
```
2017-08-28 12:04:32 -07:00
Alex Dadgar 876732833f Merge pull request #3073 from clinta/docker-500
Allow retry of 500 API errors to be handled by restart policies
2017-08-24 16:57:36 -07:00
Alex Dadgar fd7d614ae4 Handle interfaces that only have link-local addrs
This PR changes the fingerprint handling of network interfaces that only
contain link-local addresses. The new behavior is to prefer globally
routable addresses and, if none are detected, to fall back to link-local
addresses if the operator hasn't disallowed it. This gives us pre-0.6
behavior for interfaces with only link-local addresses but 0.6+ behavior
for IPv6 interfaces, which always have a link-local address.

Fixes https://github.com/hashicorp/nomad/issues/3005

/cc diptanuc
2017-08-23 15:32:22 -07:00
Alex Dadgar 211a793530 resolve feedback 2017-08-23 14:17:00 -07:00
Alex Dadgar 653733e093 Clean up docker mounts 2017-08-22 14:12:44 -07:00
Clint Armstrong ae230395ba Allow retry of 500 API errors to be handled by restart policies 2017-08-22 14:04:46 -04:00
Michael Schurter 51a27cc83d Merge pull request #3031 from hashicorp/f-2924-consul-headers
Add Header and Method support for HTTP checks
2017-08-18 13:35:08 -07:00
Michael Schurter 7ebd429a86 Merge mistake made go fmt fail 2017-08-18 13:19:44 -07:00
Michael Schurter 5c015da3cb Merge pull request #3021 from clinta/docker-mount2
Expose docker mount options
2017-08-17 16:57:09 -07:00
Michael Schurter ff3944a981 Update and test service/check interpolation 2017-08-17 16:49:14 -07:00
Michael Schurter b4813747d0 Merge pull request #3043 from hashicorp/f-2441-shutdown-delay
Add optional shutdown delay to tasks
2017-08-17 14:37:48 -07:00
Michael Schurter c709251ed6 Lower ShutdownDelay for non-Travis testing 2017-08-17 14:23:42 -07:00
Michael Schurter b33b2fb4c0 Lower shutdown delay in test 2017-08-17 13:57:22 -07:00
Michael Schurter 0726ca75e3 Make shutdown delay log DEBUG, not INFO 2017-08-17 11:28:33 -07:00
Clint Armstrong f0460156ae restrict mount to volume type 2017-08-17 09:52:13 -04:00
Michael Schurter d529b422b2 Add optional shutdown delay to tasks
Fixes #2441

Defaults to 0 (no delay) for backward compat and because this feature
should be opt-in.
2017-08-16 17:59:46 -07:00
Alex Dadgar d6187cd3e8 Fix tests 2017-08-16 16:26:52 -07:00
Alex Dadgar 1a86aecf55 Add version package
This PR adds a version package and consolidates version strings into a
Version struct.
2017-08-16 15:44:21 -07:00
Alex Dadgar 3d69961c3a Must be root for TestAllocDir_CreateDir 2017-08-16 10:46:14 -07:00
Alex Dadgar 7dd86b5dfe Merge pull request #3025 from hashicorp/f-health-events
Emit task events explaining alloc health
2017-08-15 12:23:46 -07:00
Alex Dadgar bb165b97ef comments 2017-08-15 12:23:29 -07:00
Michael Schurter 1126268a81 Fix formatting 2017-08-15 10:37:02 -07:00
Michael Schurter 74d5c272c6 Cleanup comments and return val 2017-08-14 16:59:03 -07:00
Michael Schurter 46b7fd45d7 spelling 2017-08-14 16:55:59 -07:00
Michael Schurter de8ea243b6 Return move errors from local Migrate like remote
Since alloc runner just logs these errors and continues, there's no
reason not to return them.
2017-08-14 16:48:56 -07:00
Michael Schurter 7342e23669 Move migrating state into prevAllocWatcher 2017-08-14 16:02:28 -07:00
Alex Dadgar fdc0115427 test 2017-08-12 14:42:53 -07:00
Alex Dadgar 56801349eb Refactor health watcher and emit events 2017-08-12 14:23:36 -07:00
Michael Schurter 4601419d63 Soft fail on migration errors 2017-08-11 16:50:30 -07:00
Michael Schurter 3dbd764969 Exit if alloc listener closes
Add test for that case, add comments, remove debug logging
2017-08-11 16:22:02 -07:00
Michael Schurter b7915bdac7 Update tests for new blocking/migrating code 2017-08-11 16:21:57 -07:00
Michael Schurter ad6cec9e82 Set failed status instead of panic'ing
Fixup some TODOs and formatting left from new prevAllocWatcher code.
2017-08-11 16:21:35 -07:00
Michael Schurter e41a654917 switch from alloc blocker to new interface
interface has 3 implementations:

1. local for blocking and moving data locally
2. remote for blocking and moving data from another node
3. noop for allocs that don't need to block
2017-08-11 16:21:35 -07:00
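The three-implementation design listed above can be sketched in Go as one interface plus a constructor that picks an implementation. Every name here (`prevAllocWaiter`, `newWaiter`, the node-ID arguments) is hypothetical and chosen for illustration only. The real blocking and data-moving logic is elided.

```go
package main

// prevAllocWaiter is a hypothetical interface illustrating the design in
// this commit; the names and methods are not Nomad's actual API.
type prevAllocWaiter interface {
	// Wait blocks until the previous allocation has stopped.
	Wait()
	// Migrate moves the previous allocation's data into destDir.
	Migrate(destDir string) error
}

// localWaiter: the previous alloc ran on this node.
type localWaiter struct{ done <-chan struct{} }

func (w localWaiter) Wait()                { <-w.done }
func (w localWaiter) Migrate(string) error { return nil } // local move elided

// remoteWaiter: the previous alloc ran on another node.
type remoteWaiter struct{ nodeID string }

func (w remoteWaiter) Wait()                {}            // polling the servers elided
func (w remoteWaiter) Migrate(string) error { return nil } // download from nodeID elided

// noopWaiter: there is nothing to block on or migrate.
type noopWaiter struct{}

func (noopWaiter) Wait()                {}
func (noopWaiter) Migrate(string) error { return nil }

// newWaiter picks an implementation based on where the previous alloc ran.
func newWaiter(prevNodeID, localNodeID string, done <-chan struct{}) prevAllocWaiter {
	switch {
	case prevNodeID == "":
		return noopWaiter{}
	case prevNodeID == localNodeID:
		return localWaiter{done: done}
	default:
		return remoteWaiter{nodeID: prevNodeID}
	}
}
```

Callers then invoke `Wait` and `Migrate` unconditionally. The noop implementation removes the special case for allocs with nothing to wait on.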
Michael Schurter ee04717a0b initial attempt at refactoring blocked/migrating 2017-08-11 16:21:35 -07:00
Michael Schurter ec6e6e6c66 Only set alloc status if it's not already terminal 2017-08-11 16:21:35 -07:00
Alex Dadgar 0d5127d5fc Merge pull request #3011 from hashicorp/b-cv-fix-TestEnvAWSFingerprint_aws
Updated AWS fingerprint test for ami-id
2017-08-11 10:58:22 -07:00
Alex Dadgar 2fdfd9af4a Merge pull request #2992 from decoomanj/master
Added dnsoptions to the docker driver
2017-08-11 10:12:36 -07:00
Charlie Voiselle 507c75bd16 Updated AWS fingerprint test for ami-id
In https://github.com/hashicorp/nomad/pull/2999, I changed ami-id
to non-unique.  This updates the test to reflect that.
2017-08-11 12:54:27 -04:00
Jan De Cooman 8b88d56c01 updated message in test 2017-08-11 09:24:15 +02:00
Alex Dadgar 1b061b8f47 Unmount task directories when alloc is terminal
This PR unmounts directories from tasks when the alloc is terminal
rather than when it is garbage collected.

/cc @angrycub
2017-08-10 13:28:17 -07:00
Alex Dadgar 6e20acb503 Merge pull request #2984 from hashicorp/b-tags
Fix alloc health with checks using interpolation
2017-08-10 13:07:25 -07:00
Alex Dadgar 6b238edc22 Merge pull request #3001 from hashicorp/f-template-events
Template emits events explaining why it is blocked
2017-08-10 13:00:58 -07:00
Alex Dadgar bd9f63d20e address comments 2017-08-10 13:00:06 -07:00
Clint Armstrong 9063b500e0 expose mount options to nomad 2017-08-10 12:37:17 -04:00
Alex Dadgar 83ba2f1814 Template emits events explaining why it is blocked
This PR does the following:
* Adds a mechanism to emit events in the TaskRunner
* Vendors a new version of Consul-Template that allows extraction of
missing dependencies
* Adds logic to our consul_template.go to determine missing events and
emit them in a batched fashion.
* Refactors the consul_template code to split the run method and take in
a config struct rather than many parameters.

Fixes https://github.com/hashicorp/nomad/issues/2578
2017-08-09 18:01:27 -07:00
Charlie Voiselle ae466eaaa7 AMI ID is potentially non-unique
Changed the keys map to reflect that.
2017-08-09 12:53:54 -04:00
Jan De Cooman 633bcee661 fixed typo 2017-08-09 14:44:38 +02:00
Jan De Cooman 804fc0d06f added dnsoptions to the docker driver 2017-08-09 13:30:06 +02:00
Alex Dadgar aba107be99 Merge pull request #2979 from lfarnell/cleanup
Code cleanup
2017-08-08 10:21:15 -07:00
Alex Dadgar 4f6f6a13c8 Emit generic task events 2017-08-07 21:26:04 -07:00
Alex Dadgar 79d25b7db9 Merge pull request #2947 from hashicorp/f-vault-grace
Allow template to set Vault grace
2017-08-07 16:29:53 -07:00
Alex Dadgar 93b9a1bf20 Rename runnerConfig 2017-08-07 16:29:42 -07:00
Alex Dadgar d86b3977b9 Fix alloc health with checks using interpolation
Fixes an issue in which the allocation health watcher was checking
allocation health based on un-interpolated services and checks. Change
the interface for retrieving check information from Consul to retrieving
all registered services and checks by allocation. In the future this
will allow us to output nicer messages.

Fixes https://github.com/hashicorp/nomad/issues/2969
2017-08-07 16:27:08 -07:00
Luke Farnell f0ced87b95 fixed all spelling mistakes for goreport 2017-08-07 17:13:05 -04:00
Michael Schurter c76b3b54b9 Merge branch 'master' into fix-pending-state 2017-08-03 17:27:03 -07:00
Alex Dadgar 067a638478 Allow template to set Vault grace
This PR allows a template to specify the Vault grace duration.

Fixes https://github.com/hashicorp/nomad/issues/2922
2017-08-01 14:14:08 -07:00
Alex Dadgar 562ea52c8e vendor vault api 2017-08-01 09:30:55 -07:00
Michael Schurter 6243c9eb86 Merge pull request #2883 from kmalec/add-support-for-readonly-mount
rkt driver support for read-only volumes mounts
2017-07-31 10:58:22 -07:00
Alex Dadgar 010567dba8 Fix leaked plugin files for syslog server
This PR fixes a leak of the Unix socket used when launching a syslog
server for the Docker driver.

Fixes https://github.com/hashicorp/nomad/issues/2844
2017-07-30 17:51:38 -07:00
Alex Dadgar a9c786a4fe Make test Vault pick random ports 2017-07-25 17:40:59 -07:00
Michael Schurter b01dd31f26 Don't attempt to restore tasks that never sync'd 2017-07-24 15:58:46 -07:00
Alex Dadgar 031da7a21c fix vet 2017-07-22 22:43:33 -07:00
Alex Dadgar 0f3f1ea68b travis check fixes 2017-07-22 21:01:22 -07:00
Alex Dadgar c1a72d24e6 fingerprinters 2017-07-22 20:38:03 -07:00
Alex Dadgar 62c55c8fc9 fix slow resolve on mac 2017-07-22 19:58:30 -07:00
Alex Dadgar 72d055aa9c drop rkt deadline 2017-07-22 19:54:06 -07:00
Alex Dadgar 219fecc705 Merge branch 'master' of github.com:hashicorp/nomad 2017-07-22 19:48:54 -07:00
Alex Dadgar d760e68774 darwin test fixes 2017-07-22 19:48:47 -07:00
Alex Dadgar 553bc91725 Parallel client tests (#2890)
* alloc_runner

* Random tests

* parallel task_runner and no exec compatible check

* Parallel client

* Fail fast and use random ports

* Fix docker port mapping

* Make concurrent pull less timing dependent

* up parallel

* Fixes

* don't build chroots in parallel on travis

* Reduce parallelism on travis with lxc/rkt

* make java test app not run forever

* drop parallelism a little

* use docker ports that are out of the os's ephemeral port range

* Limit even more on travis

* rkt deadline
2017-07-22 19:04:36 -07:00
Alex Dadgar b6f0782732 typo 2017-07-22 12:55:30 -07:00
Alex Dadgar 8cf9d15b01 typo 2017-07-22 12:33:07 -07:00
Alex Dadgar 9e9c20ca77 small fixes 2017-07-22 12:25:02 -07:00
Alex Dadgar 5a3df2ed89 Merge pull request #2888 from hashicorp/b-fix-allocrunner-test
Fix TestAllocRunner_TaskLeader_StopTG and unrelated races
2017-07-22 11:44:04 -07:00
Alex Dadgar 46c8bec9b0 faster vaultclient 2017-07-21 19:38:37 -07:00
Michael Schurter d840fc8c95 Fix tr race by not sharing alloc/task
prestart only needs the original alloc/task so pass their pointers in.
Task updates may concurrently replace the pointer on tr.
2017-07-21 16:17:42 -07:00
Michael Schurter a22cfa8387 Minor test race fix 2017-07-21 16:17:23 -07:00
Michael Schurter 9a7a1d8c13 Fix race by not accessing tr.task from ar 2017-07-21 16:16:53 -07:00
Michael Schurter 2e9a1e3fa6 Remove unneeded saveTaskRunnerState method
Collapse it into the one place it's called
2017-07-21 16:16:02 -07:00
Michael Schurter 996ce9286e Fix test race by locking around ar.tasks access 2017-07-21 14:25:51 -07:00
Michael Schurter 8d1d8eac46 Fix handle race 2017-07-21 14:00:32 -07:00
Michael Schurter 5f40901422 Fix more test races 2017-07-21 14:00:21 -07:00
Michael Schurter b9ba447399 Fixup a few more even rarer test races 2017-07-21 13:43:32 -07:00
Michael Schurter 38cb2021dd Always interpolate task before calling with Consul
Also switch to returning a copy of the task to avoid races between
altering the Task and persistence.
2017-07-21 13:37:16 -07:00
Michael Schurter 6e80a8ee39 Fix TestAllocRunner_TaskLeader_StopTG
Also make alloc runner tests less racy. Basically every alloc runner
test used to have races with `upd.{Count,Allocs}`
2017-07-21 13:37:16 -07:00
Alex Dadgar e509661cf9 executor and logging pkg 2017-07-21 12:14:54 -07:00
Alex Dadgar 7c433a1767 Parallel 2017-07-21 12:06:39 -07:00
Karel Malec 4b98f94a88 Allow rkt driver to mount volumes read-only 2017-07-21 13:05:15 +02:00
Alex Dadgar 56f9cf86df Speed up client startup 2017-07-20 22:34:24 -07:00
Michael Schurter 0d7f7e2b9d Merge pull request #2878 from hashicorp/b-save-state
Fix state handling on restart
2017-07-20 17:16:46 -07:00
Karel Malec cf985f011c Pass task group name as NOMAD_GROUP_NAME environment variable 2017-07-21 01:22:54 +02:00
Alex Dadgar 09c8ee621b Destroy tasks that are part of terminal alloc 2017-07-20 12:02:04 -07:00
Michael Schurter 9a7f649e56 Don't save task runner state if it is destroyed 2017-07-20 10:17:41 -07:00
Alex Dadgar 64776b1370 Should not persist state after alloc_runner is garbage collected 2017-07-19 17:31:30 -07:00
Michael Schurter c1b8bef813 Use broadcast send retry logic everywhere 2017-07-18 14:36:32 -07:00
Alex Dadgar d2381c9263 Merge pull request #2853 from hashicorp/b-watcher
Improve alloc health watcher
2017-07-18 14:12:28 -07:00
Alex Dadgar bd43bd509c Save deployment status 2017-07-18 12:37:52 -07:00
Alex Dadgar 41f67e3535 Small fixes 2017-07-18 12:19:57 -07:00
Michael Schurter c24e73ede7 Fix deadlock caused by syncing during destroy
When replacing an alloc, the new alloc is blocked until the old alloc is
destroyed. This could cause a deadlock:

1. Destroying the old alloc includes a final sync of its status
2. Syncing status causes a GC
3. A GC looks for terminal allocs to cleanup
4. The GC waits for an alloc to stop completely before GC'ing

If the GC chooses the currently-being-destroyed alloc to GC, the GC
deadlocks. If `client.max_parallel` deadlocks happen, the GC is wedged
until the Nomad process is restarted.

Performing the final sync asynchronously is an ugly hack but prevents
the deadlock by allowing the final sync to occur after the alloc runner
has shut down and been destroyed.
2017-07-18 11:12:56 -07:00
Michael Schurter 420be86e39 Test AllocDir.Copy 2017-07-17 15:46:54 -07:00
Michael Schurter cdb2e96d99 Add AllocRunner.allocID for ease-of-use
Since the AllocRunner.alloc struct can be mutated, most of AllocRunner
needs to acquire a lock to get the alloc's ID. Log lines always need to
include the alloc ID, so we often skipped acquiring a lock just to grab
the ID and accepted the race.

Let's make the race detector a little happier by storing the ID in a
single-assignment field.
2017-07-17 15:46:54 -07:00
Michael Schurter 181fda825a Fix log level 2017-07-17 15:46:54 -07:00
Michael Schurter 98f6e7f10f Don't fail if task dirs don't exist on creation
Task dir metadata is created in AllocRunner.Run which may not run before
an alloc is sync'd and Nomad exits. There's no reason not to just create
task dir metadata on restore if it doesn't exist.
2017-07-17 15:46:54 -07:00
Michael Schurter 51515cbe0c Ensure allocDir is never nil and persisted safely
Fixes #2834
2017-07-17 15:46:54 -07:00
Alex Dadgar 0821ee67f5 Fix alloc broadcaster panic on double close 2017-07-17 14:09:05 -07:00
Michael Schurter 0a6bf87365 Fix nil panic in Docker error condition
Fixes #2835

Yet another bug caused by overwriting container and then trying to
reference container.ID in the err handling block. Did a quick audit of
docker.go and it seems to be the last offender. See #2804 for previous
bug.
2017-07-14 10:48:19 -07:00
Michael Schurter e9a416b731 Merge branch 'master' into fix-pending-state 2017-07-10 10:43:23 -07:00
unknown 26b16fa3ce #2563 fixed pending state for allocations with terminal status 2017-07-09 16:18:06 +03:00
Alex Dadgar 05894f4611 Small fixes 2017-07-07 17:34:50 -07:00
Michael Schurter fecb16cfb2 Merge pull request #2793 from hashicorp/b-2776-ct-vault-servername
Propagate vault.tls_server_name to consul-template
2017-07-07 16:44:19 -07:00
Michael Schurter 95a9a5da71 Merge pull request #2787 from hashicorp/f-docker-test-mac
Test #2652 - Docker MAC Address option
2017-07-07 16:22:10 -07:00
Michael Schurter 4be4df21c9 Merge pull request #2797 from hashicorp/f-2785-docker-bridge-ip
Add driver.docker.bridge_ip node attribute
2017-07-07 16:20:20 -07:00
Michael Schurter 94389c3ecc Remove debug logging 2017-07-07 16:19:42 -07:00
Michael Schurter 5e3e3818db Merge pull request #2804 from hashicorp/b-2802-docker-panic
Don't panic in container list/remove/inspect race
2017-07-07 15:35:51 -07:00
Michael Schurter 67a7b0eac9 Don't panic in container list/remove/inspect race
Fixes #2802

While it's hard to reproduce, the theoretical race is:

1. This goroutine calls ListContainers()
2. Another goroutine removes a container X
3. This goroutine attempts to InspectContainer(X)

However, this bug could be hit in the much simpler case of
InspectContainer() timing out.

In those cases an error is returned and the old code attempted to wrap
the error with the now-nil container.ID. Storing the container ID fixes
that panic.
2017-07-07 15:10:59 -07:00
Alex Dadgar bf97a2455c Vet and small improvement on watcher failure detection 2017-07-07 14:53:01 -07:00
Alex Dadgar 45712c6ca3 test fixes 2017-07-07 14:11:27 -07:00
Alex Dadgar ade9a7c768 @jippi Changed my mind! Good suggestion 2017-07-07 12:12:48 -07:00
Alex Dadgar c063eba836 Warn log 2017-07-07 12:10:04 -07:00
Alex Dadgar 067ed86a47 Client watches for allocation health using task state and Consul checks
This PR adds watching of allocation health at the client. The client can
watch for health based on the tasks running on time and also based on
the Consul checks passing.
2017-07-07 12:10:04 -07:00
Alex Dadgar 001058227e watcher per alloc 2017-07-07 12:07:08 -07:00
Alex Dadgar 2e2fd26bed Update index 2017-07-07 12:07:08 -07:00
Alex Dadgar ecee5e370e initial watcher 2017-07-07 12:07:08 -07:00
Alex Dadgar c77944ed29 assign names 2017-07-07 12:03:11 -07:00
Michael Schurter 084dd384c1 Add driver.docker.bridge_ip node attribute
Fixes #2785
2017-07-07 10:14:10 -07:00
Michael Schurter d38d48151a Propagate vault.tls_server_name to consul-template
Fixes #2776
2017-07-06 16:56:50 -07:00
Michael Schurter 39edf23fd5 Merge pull request #2786 from hashicorp/f-docker-auth-soft-fail
Default to auth hard fail but optionally soft fail
2017-07-06 13:25:56 -07:00
Michael Schurter bae1b7db2d Test #2652
Also cleanup docker config opts docs
2017-07-06 12:46:25 -07:00
Michael Schurter 8f4353779a Merge branch 'master' into master 2017-07-06 12:09:36 -07:00
Michael Schurter 2900f941b5 Default to auth hard fail but optionally soft fail 2017-07-06 11:35:34 -07:00
Michael Schurter 08b452adf5 Merge pull request #2781 from hashicorp/f-2678-getter-mode
Add support for go-getter modes
2017-07-06 11:06:40 -07:00
Michael Schurter b000bb8598 Merge pull request #2744 from aep/master
Do not fail when no docker registry auth is available
2017-07-06 11:04:11 -07:00
Michael Schurter 0d3bdf7210 Add support for go-getter modes
Fixes #2678
2017-07-06 10:45:44 -07:00
Michael Schurter 644f0cfaa4 Consistently quote alloc ids in client logs 2017-07-06 10:24:52 -07:00
Michael Schurter 4fd9ef6a8c Tiny client race condition fix
Plus some logging improvements that may help with #2563
2017-07-05 16:15:19 -07:00
Michael Schurter 8e2e26c607 rkt: use %s instead of %q when interpolating env
Fixes #2686
2017-07-05 09:36:17 -07:00
Michael Schurter b2382f99f2 0 compute == error 2017-07-03 14:51:02 -07:00
Michael Schurter ecf090e980 Fix cpu_total_compute override 2017-07-03 14:51:02 -07:00
Michael Schurter 2d741c770b Merge pull request #2732 from hashicorp/b-persist-alloc-updates
Persist Alloc when EvalID changes
2017-07-03 14:46:43 -07:00
Michael Schurter 56a6f8ca8a Merge pull request #2763 from hashicorp/f-bad-state-help
Add more logging to restore state errors
2017-07-03 14:45:03 -07:00
Michael Schurter 9d4b0651ef Merge pull request #2753 from hashicorp/b-leader-dies-first
Destroy task group leader first
2017-07-03 14:38:04 -07:00
Michael Schurter 6e7cc3964e Merge pull request #2709 from hashicorp/f-advertise-docker-ips
Advertise driver-specific addresses
2017-07-03 14:04:12 -07:00
Michael Schurter 5ec52ec24a Destroy task group leader first
Before this commit all tasks in a task group were destroyed
concurrently. This meant logging sidecars might be stopped before the
leader task whose logs still need to be shipped.

This commit blocks on the leader shutting down before signalling
followers to shut down.
2017-07-03 13:56:56 -07:00
Michael Schurter 596727230b Suggest wiping out alloc dir too 2017-07-03 12:29:21 -07:00
Michael Schurter 11f68bfca2 Add more logging to restore state errors 2017-07-03 11:58:41 -07:00
Arvid E. Picciani aa4f029f10 Do not fail when no docker registry auth is available
this amends the behaviour introduced with #2651
and allows pulling public images when docker.auth.helper is set
2017-06-27 11:11:18 +02:00
Michael Schurter 8fcf866a7d Fix some tests still expecting reverted behavior 2017-06-23 16:51:38 -07:00
Michael Schurter e81252ba45 Default no_host_uuid to true instead of false
The host UUID isn't unique in many virtualized cases and of dubious
value even when it is universally unique. Default to a random UUID.
2017-06-23 16:23:01 -07:00
Michael Schurter 5a274e6683 Style and comments 2017-06-23 15:20:04 -07:00
Michael Schurter cff8546035 Fix spelling & re-add immutable state struct 2017-06-23 13:01:39 -07:00
Michael Schurter d359d3b554 Rename immutable -> alloc
meh; naming is hard
2017-06-23 10:58:36 -07:00
Michael Schurter af2fc0f1bc Persist Alloc when EvalID changes 2017-06-22 17:33:12 -07:00
Michael Schurter f3a6ddc57d Remove DRIVER env vars
Also make NOMAD_ADDR_* use host ip:port for consistency. NOMAD_PORT_*
varies based on port map and the driver IP isn't exposed as an env var
as the only place it can be used is in script checks anyway.
2017-06-21 17:19:08 -07:00
Michael Schurter 0633d0c286 Have Qemu return PortMap 2017-06-21 17:19:08 -07:00
Michael Schurter 38a0695687 Simplify Docker Networks processing 2017-06-21 17:19:08 -07:00
Michael Schurter fec83b271a Bump error log level 2017-06-21 17:19:08 -07:00
Michael Schurter 8d677bc6b9 Fix lxc tests 2017-06-21 17:19:08 -07:00
Michael Schurter 8d440b1675 Skip DRIVER env vars for labels without a port mapping 2017-06-21 17:19:08 -07:00
Michael Schurter c0eff81383 Fix Service.AddressMode changes during task updates 2017-06-21 17:19:08 -07:00
Michael Schurter 67d154a274 Test driver network advertisement and checks 2017-06-21 17:19:08 -07:00
Michael Schurter b9bfb84b53 Implement DriverNetwork and Service.AddressMode
Ideally DriverNetwork would be fully populated in Driver.Prestart, but
Docker doesn't assign the container's IP until you start the container.

However, it's important to set up the port env vars before calling
Driver.Start, so Prestart should populate that.
2017-06-21 17:19:08 -07:00
Hynek Schlawack 59ab34c264 Fix typos 2017-06-16 16:10:12 +02:00
Michael Schurter b69e060071 Log PID when sending signals 2017-06-12 11:11:36 -07:00
Michael Schurter ffb417a300 Merge pull request #2697 from hashicorp/b-port-map
Fix port map interpolation for docker
2017-06-09 13:29:36 -07:00
Michael Schurter a3827d2cc6 Fix bad merge conflict resolution 2017-06-09 10:40:47 -07:00
Michael Schurter eabd6759c6 Merge branch 'master' into add-no-overlay-option 2017-06-09 09:59:35 -07:00
Alex Dadgar 5ba2662b30 Merge pull request #2687 from mmickan/issue-2685
Include symlinks in snapshots when migrating disks
2017-06-08 13:35:46 -07:00
Michael Schurter 784d69789e Merge branch 'master' into add-no-overlay-option 2017-06-08 13:15:56 -07:00
Alex Dadgar 7695e636d5 Fix port map interpolation for docker
This PR fixes an issue in which the value of the portmap could not be
interpolated.

Fixes https://github.com/hashicorp/nomad/issues/2680
2017-06-08 13:12:32 -07:00
Karel Malec b55f4bf601 Fix backticks in docs; refine --debug comment 2017-06-07 21:11:22 +02:00
Karel Malec a258a803f2 Added insecure_options config list 2017-06-07 09:58:42 +02:00
Karel Malec 1957e9dfa6 Add a no_overlay option for the rkt task config. 2017-06-07 00:17:33 +02:00
Mark Mickan c196d320f8 Add tests for migrating symlinks in alloc and local directories 2017-06-04 15:56:22 +09:30
Mark Mickan 236f24c9a4 Include symlinks in snapshots when migrating disks
Fixes #2685
2017-06-04 00:36:18 +09:30
Michael Schurter d1dd380890 Switch to hashicorp/go-envparse 2017-06-02 15:58:52 -07:00
Michael Schurter a552bcdb55 Move env file parsing to a library 2017-06-02 15:03:27 -07:00
Alex Dadgar 3b46fe136f small cleanup 2017-05-31 15:56:54 -07:00
Alex Dadgar 8d6e28ace8 Merge branch 'master' into feature/2334 2017-05-31 14:27:07 -07:00
Alex Dadgar 044f1da5ff Merge pull request #2681 from hashicorp/b-deadlock
Fix a deadlock relating to blocked allocations
2017-05-31 14:26:54 -07:00
Alex Dadgar ec9cb2c751 Merge pull request #2672 from eyberg/master
don't throw away errors in log rotation
2017-05-31 14:14:22 -07:00
Alex Dadgar b1eea2269a Fix deadlock 2017-05-31 14:05:47 -07:00
Michael Schurter cb568a5cf6 Cleanup lots of leaked alloc runners in tests 2017-05-31 11:39:50 -07:00
Ulrik Mikaelsson 6138564f00 Implement support for docker-credential-helpers
Solves: #2334
2017-05-31 12:45:02 +02:00
Michael Schurter ffc2b36dc7 Merge pull request #2636 from hashicorp/f-gc-alloc-limit
Add new gc_max_allocs tuneable
2017-05-30 16:14:09 -07:00
Michael Schurter dd51aa1cb9 Merge pull request #2654 from hashicorp/f-env-consul
Add envconsul-like support and refactor environment handling
2017-05-30 14:40:14 -07:00
Michael Schurter e1a7c2d6d7 Fix Error -> Errorf 2017-05-30 12:08:59 -07:00
Michael Schurter 53d713bacb Fix getter tests
And use an interface for ReplaceEnv since it's all getter needs.
2017-05-26 16:52:47 -07:00
Michael Schurter 51d8231911 Fix executor tests 2017-05-26 16:46:03 -07:00
Michael Schurter 3184616936 Always use PATH-only env for rkt commands 2017-05-26 15:41:26 -07:00