Alex Dadgar
bec9a72eec
Remove networking from basic resources
2018-01-12 14:33:42 -08:00
Charlie Voiselle
867bb6f7f9
Found more priviledge.
...
priviledge -> privilege
2018-01-12 09:44:53 -05:00
Alex Dadgar
9e1e04c6f1
Merge pull request #3727 from filipochnik/fix-gh-2832
...
Recognize renewing non-renewable Vault lease as fatal
2018-01-10 11:47:10 -08:00
Michael Schurter
189ce7f991
Merge pull request #3723 from hashicorp/b-3702-chown-dirs
...
chown dirs when migrating ephemeral_disk data
2018-01-09 09:27:26 -08:00
Michael Schurter
e6c27256b7
Test streamed directory ownership
2018-01-08 16:00:07 -08:00
Michael Schurter
2c79ffb213
chown dirs when migrating ephemeral_disk data
...
Fixes #3702
Added missing chown call and made it conditional on running as root and
not on Windows as we do with files.
2018-01-08 15:31:12 -08:00
Charlie Voiselle
1bb1ab5069
fix typo
...
Priviledge -> privilege
2018-01-08 15:56:07 -05:00
Chelsea Holland Komlo
214d128eb9
reload raft transport layer
...
fix up linting
2018-01-08 14:52:28 -05:00
Filip Ochnik
d265e11c36
Recognize renewing non-renewable Vault lease as fatal
2018-01-08 20:32:31 +01:00
Chelsea Holland Komlo
0708d34135
call reload on agent, client, and server separately
2018-01-08 09:56:31 -05:00
Chelsea Holland Komlo
9741097406
reloading tls config should be atomic for clients/servers
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo
ae7fc4695e
fixups from code review
...
Revert "close raft long-lived connections"
This reverts commit 3ffda28206fcb3d63ad117fd1d27ae6f832b6625.
reload raft connections on changing tls
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo
acd3d1b162
fix up downgrading client to plaintext
...
add locks around changing server configuration
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo
c0ad9a4627
add ability to upgrade/downgrade nomad agents tls configurations via sighup
2018-01-08 09:21:06 -05:00
Michael Schurter
ef76c65da1
Lookup euid outside of loop
2017-12-13 11:50:12 -08:00
Michael Schurter
5032bf4f5a
Skip tests that require root when not root
...
Also skip Chown on allocdir migration on Windows and when non-root.
Windows doesn't support it, and it will always fail as a non-root user.
2017-12-12 16:58:27 -08:00
Alex Dadgar
f0b0697b57
Keyify struct
2017-12-11 17:23:14 -08:00
Michael Schurter
c4d4ead199
Fix test broken by mock updates
2017-12-08 16:45:25 -08:00
Michael Schurter
4b20441eef
Validate port label for host address mode
...
Also skip getting an address for script checks which don't use them.
Fixed a weird invalid reserved port in a TaskRunner test helper as well
as a problem with our mock Alloc/Job. Hopefully the latter doesn't cause
other tests to fail, but we were referencing an invalid PortLabel and
just not catching it before.
2017-12-08 12:03:43 -08:00
Michael Schurter
30dd570061
Fix interpolation bug with service/check updates
...
Previously if only an interpolated variable used in a service or check
was changed we interpolated the old and new services and checks with the
new variable, so nothing appeared to have changed.
2017-12-08 12:03:00 -08:00
Michael Schurter
4347026f83
Test Consul from TaskRunner thoroughly
...
Rely less on the mockConsulServiceClient because the real
consul.ServiceClient needs all the testing it can get!
2017-12-08 12:03:00 -08:00
Alex Dadgar
a0d6b6a121
Merge pull request #3630 from hashicorp/b-periodic
...
Handle race between fingerprinters and registration
2017-12-07 16:11:13 -08:00
Alex Dadgar
91ffbbb517
Review feedback
2017-12-07 16:10:57 -08:00
Chelsea Komlo
c8e0cb3044
Merge pull request #3591 from hashicorp/b-1755-stop
...
Allow controlling the stop signal for drivers
2017-12-07 17:06:43 -05:00
Alex Dadgar
02baa6c52b
Handle race between fingerprinters and registration
2017-12-07 13:09:37 -08:00
Chelsea Holland Komlo
61fa8ad4ba
code review fixes
2017-12-07 13:46:25 -05:00
Chelsea Holland Komlo
77ab41124b
set default kill signal on executor shutdown
2017-12-07 11:40:15 -05:00
Chelsea Holland Komlo
6cae8fe6e6
extend configurable kill signal to java driver
2017-12-07 11:40:10 -05:00
Alex Dadgar
4409fdacc0
Drop trace logging
2017-12-06 18:02:24 -08:00
Alex Dadgar
cd9a7f14b8
Add logging around heartbeats
2017-12-06 17:57:50 -08:00
Chelsea Holland Komlo
350319239c
change location of default kill signal
2017-12-06 17:48:25 -05:00
Chelsea Holland Komlo
7dfb64f941
extract signal helper into utils
2017-12-06 14:36:44 -05:00
Chelsea Holland Komlo
b08611cfac
move kill_signal to task level, extend to docker
2017-12-06 14:36:39 -05:00
Chelsea Holland Komlo
80de7d5ebd
allow controlling the stop signal in exec/raw_exec
2017-12-06 11:28:45 -05:00
Chelsea Komlo
9ae849e09c
Merge pull request #3612 from hashicorp/docker-rkt-user
...
Set user for rkt tasks
2017-12-05 17:45:08 -05:00
Michael Schurter
b66aa5b7f6
Merge pull request #3563 from hashicorp/b-snapshot-atomic
...
Atomic Snapshotting / Sticky Volume Migration
2017-12-05 09:16:33 -08:00
Chelsea Holland Komlo
4463dc607e
fix up test
2017-12-05 10:12:40 -05:00
Chelsea Holland Komlo
7284f2385a
remove unused user option
2017-12-04 18:01:31 -05:00
Michael Schurter
6ccc4219d3
Merge pull request #3615 from hashicorp/b-rkt-host-ports
...
rkt: Don't require port_map with host networking
2017-12-04 14:49:42 -08:00
Chelsea Holland Komlo
7c74968452
add ability to specify user for rkt
2017-12-04 14:21:48 -05:00
Michael Schurter
2bf1d6d85e
rkt: Don't require port_map with host networking
...
Also don't try to return a DriverNetwork with host networking. None will
ever exist as that's the point of host networking: rkt won't create a
network namespace.
2017-12-01 17:23:25 -08:00
Chelsea Holland Komlo
4ee2122536
get KillTimeout in seconds, not nanoseconds
2017-12-01 10:43:00 -05:00
Michael Schurter
5e975bbd0f
Add comment and normalize err check ordering
...
as per PR comments
2017-11-29 17:26:11 -08:00
Michael Schurter
d996c3a231
Check for error file when receiving snapshots
2017-11-29 17:26:11 -08:00
Michael Schurter
ca946679f6
Destroy partially migrated alloc dirs
...
Test that snapshot errors don't return a valid tar currently fails.
2017-11-29 17:26:11 -08:00
Michael Schurter
23c66e37c5
Handle errors during snapshotting
...
If an alloc dir is being GC'd (removed) during snapshotting the walk
func will be passed an error. Previously we didn't check for an error so
a panic would occur when we'd try to use a nil `fileInfo`.
2017-11-29 17:26:11 -08:00
Chelsea Holland Komlo
2208964948
Support StopTimeout for Docker tasksw
...
Update github.com/fsouza/go-dockerclient
2017-11-29 14:33:05 -05:00
Preetha Appan
6ad65c51e6
Missed assert in one place
2017-11-20 13:04:38 -06:00
Preetha Appan
747bd59daa
Better error validation, and added test case for invalid sysctl inputs
2017-11-20 12:07:18 -06:00
Preetha Appan
c68973747b
Address some review comments
2017-11-20 11:15:09 -06:00
Preetha Appan
39ef9ee76d
Fix gofmt warnings
2017-11-18 09:23:09 -06:00
Preetha Appan
e53dd15f58
Fix test compilation after rebase
2017-11-17 17:46:04 -06:00
Samuel BERTHE
0fca2e19c8
review(docker driver): sysctls -> sysctl + ulimits -> ulimit
2017-11-17 16:30:45 -06:00
Samuel BERTHE
6c93922cb7
Oops
2017-11-17 16:14:14 -06:00
Samuel BERTHE
c8363bc44b
💄
2017-11-17 16:03:22 -06:00
Samuel BERTHE
281ab90484
test(docker driver): testing sysctls and ulimits
2017-11-17 16:03:22 -06:00
Samuel BERTHE
b9a10ff7fa
feat(docker driver): adds sysctls and ulimits configs
2017-11-17 16:03:22 -06:00
Alex Dadgar
69d3bf7392
Merge pull request #3559 from hashicorp/b-metrics
...
Don't emit metrics for non-running tasks
2017-11-17 10:33:23 -08:00
Michael Schurter
3845c8d200
Merge pull request #3562 from hashicorp/b-3561-rkt-rm
...
Remove rkt pods when exiting
2017-11-16 17:30:21 -08:00
Michael Schurter
737fb45640
Merge pull request #3551 from hashicorp/b-3419-docker-409-bug
...
Fix Docker name conflict bug by updating dockerclient
2017-11-16 16:38:54 -08:00
Michael Schurter
437fce9954
Improve rktRemove error message
2017-11-16 15:45:14 -08:00
Michael Schurter
3ceec0caab
Remove rkt pods when exiting
...
Fixes #3561
2017-11-16 14:33:44 -08:00
Charlie Voiselle
7a231897a5
Merge pull request #3556 from angrycub/f-fingerprint-log-level
...
Dropped loglevel for AWS fingerprinter env read misses to DEBUG
2017-11-16 16:27:25 -05:00
Charlie Voiselle
969ddf9c2a
Lowered to DEBUG from AD feedback
2017-11-16 14:13:03 -05:00
Alex Dadgar
05b1588cea
Only publish metric when the task is running and dev mode publishes metrics
2017-11-15 13:21:06 -08:00
Alex Dadgar
07963f0b6d
Merge pull request #3546 from hashicorp/f-heuristic
...
Better interface selection heuristic
2017-11-15 12:51:21 -08:00
Alex Dadgar
97ec3974a9
Use interface attached to default route
2017-11-15 11:32:32 -08:00
Michael Schurter
f86f0bd9ea
Handle leader task being dead in RestoreState
...
Fixes the panic mentioned in
https://github.com/hashicorp/nomad/issues/3420#issuecomment-341666932
While a leader task dying serially stops all follower tasks, the
synchronizing of state is asynchrnous. Nomad can shutdown before all
follower tasks have updated their state to dead thus saving the state
necessary to hit this panic: *have a non-terminal alloc with a dead
leader.*
The actual fix is a simple nil check to not assume non-terminal allocs
leader's have a TaskRunner.
2017-11-15 10:36:13 -08:00
Charlie Voiselle
1197637251
Dropped loglevel for AWS fingerprinter env reads
...
Certain environments use WARN for serious logging; however, it's very
possible to have machines without some of the fingerprinted keys
(public-ipv4 and public-hostname specifcally). Setting log level to
INFO seems more consistent with this possibility.
2017-11-15 18:20:59 +00:00
Chelsea Komlo
2dfda33703
Nomad agent reload TLS configuration on SIGHUP ( #3479 )
...
* Allow server TLS configuration to be reloaded via SIGHUP
* dynamic tls reloading for nomad agents
* code cleanup and refactoring
* ensure keyloader is initialized, add comments
* allow downgrading from TLS
* initalize keyloader if necessary
* integration test for tls reload
* fix up test to assert success on reloaded TLS configuration
* failure in loading a new TLS config should remain at current
Reload only the config if agent is already using TLS
* reload agent configuration before specific server/client
lock keyloader before loading/caching a new certificate
* introduce a get-or-set method for keyloader
* fixups from code review
* fix up linting errors
* fixups from code review
* add lock for config updates; improve copy of tls config
* GetCertificate only reloads certificates dynamically for the server
* config updates/copies should be on agent
* improve http integration test
* simplify agent reloading storing a local copy of config
* reuse the same keyloader when reloading
* Test that server and client get reloaded but keep keyloader
* Keyloader exposes GetClientCertificate as well for outgoing connections
* Fix spelling
* correct changelog style
2017-11-14 17:53:23 -08:00
Michael Schurter
3023336b39
Add a test demonstrating the bug
...
Fails on Docker 17.09, passes on Docker 17.06 and earlier
2017-11-14 15:25:52 -08:00
Alex Dadgar
ee31e15f51
Better interface selection heuristic
...
This PR introduces a better interface selection heuristic such that we
select interfaces with globally routable unicast addresses over link
local addresses.
Fixes https://github.com/hashicorp/nomad/issues/3487
2017-11-13 15:13:43 -08:00
Preetha Appan
926c9ed997
Make device mounting unit test verify configuration via docker inspect
2017-11-13 09:56:54 -06:00
Preetha Appan
dc2d5fb5a4
Unit test (linux only) that tests mounting a device in the docker driver
2017-11-13 09:56:54 -06:00
Preetha Appan
4834710e45
Add default value for cgroup permissions for device if not set
2017-11-13 09:56:54 -06:00
Preetha Appan
9cdee6991c
Remove unnecessary check since validate method already checks this
2017-11-13 09:56:54 -06:00
Preetha Appan
110c1fd4f0
Add support for passing device into docker driver
2017-11-13 09:56:54 -06:00
Alex Dadgar
d1358ec1b6
alway load all templates
2017-11-10 12:35:51 -08:00
Alex Dadgar
a3ea0c17a0
Handle multiple environment templates
...
Fixes https://github.com/hashicorp/nomad/issues/3498
2017-11-10 11:08:19 -08:00
Alex Dadgar
b3edc12dd9
Merge pull request #3411 from cheeseprocedure/f-qemu-graceful-shutdown
...
Qemu driver: graceful shutdown feature
2017-11-03 16:41:34 -07:00
Michael Schurter
690b8f4cfb
Remove noisy log line
...
Didn't mean to commit this
2017-11-03 16:00:30 -07:00
Matt Mercer
11e2870875
Qemu driver: clean up logging; fail unsupported features on Windows
2017-11-03 15:40:20 -07:00
Alex Dadgar
6034916ad1
fix spelling mistake
2017-11-03 15:04:59 -07:00
Alex Dadgar
a23033932a
Merge pull request #3459 from multani/docker-oom-notification
...
docker: log that a container has been killed by the OOM killer
2017-11-03 13:24:03 -07:00
Matt Mercer
cef9ba9770
Qemu driver: tweaks in response to PR feedback
...
Remove attribute for long qemu monitor path; misc cleanup; update tests
2017-11-03 11:28:56 -07:00
Preetha Appan
0eaef09675
Remove event GenericSource, and address other code review comments. Also added deprecation info in comments.
2017-11-03 10:10:06 -05:00
Preetha Appan
5f09c968b3
Move logic for determinic event display message to task_runner, added two new fields DisplayMessage and Details.
2017-11-03 09:13:01 -05:00
Alex Dadgar
b4af10edde
Alloc Runner doesn't panic on restoration.
2017-11-02 16:14:13 -07:00
Alex Dadgar
abd28cbd7d
Merge pull request #3493 from hashicorp/f-remove-atlas
...
Remove Atlas and Scada from codebase
2017-11-02 16:00:44 -07:00
Michael Schurter
eedbe8efbb
Merge pull request #3490 from hashicorp/f-gc-logging
...
Make unable-to-gc log level adaptive
2017-11-02 14:32:40 -07:00
Diptanu Choudhury
cb68889652
Added the node_id as a tag
2017-11-02 13:29:10 -07:00
Alex Dadgar
701f462d33
remove atlas
2017-11-02 11:27:21 -07:00
Michael Schurter
fc33c945be
Make unable-to-gc log level adaptive
...
WARNing when someone has over 50 non-terminal allocs was just too
confusing.
Tested manually with `gc_max_allocs = 10` and bumping a job from `count
= 19` to `count = 21`:
```
2017/11/02 17:54:21.076132 [INFO] client.gc: garbage collection due to number of allocations (19) is over the limit (10) skipped because no terminal allocations
...
2017/11/02 17:54:48.634529 [WARN] client.gc: garbage collection due to number of allocations (21) is over the limit (10) skipped because no terminal allocations
```
2017-11-02 10:57:42 -07:00
Diptanu Choudhury
8a9d0d40b1
Added support for tagged metrics
2017-11-02 10:07:57 -07:00
Diptanu Choudhury
5f522c6de3
Incrementing the start counter when we are actually starting a container
2017-11-02 09:51:20 -07:00
Diptanu Choudhury
44535e5d10
Recording counter for dead allocs properly
2017-11-02 09:51:20 -07:00
Diptanu Choudhury
0b34e811b7
Added metrics to track task/alloc start/restarts/dead events
2017-11-02 09:51:20 -07:00
Matt Mercer
00f90323c2
Qemu driver: defer cleanup sooner
2017-11-01 17:37:43 -07:00
Matt Mercer
43256af5f3
Qemu driver: clean up test logging; retry integration test for longer
2017-11-01 17:21:56 -07:00
Matt Mercer
b1145705d3
Use strings.Replace() instead of custom function
2017-11-01 15:31:35 -07:00
Matt Mercer
d51d174fa0
Qemu driver: basic testing of graceful shutdown feature
2017-11-01 15:31:30 -07:00
Matt Mercer
c26013ea0b
Qemu driver: include PIDs in log output
2017-11-01 15:31:24 -07:00
Matt Mercer
38d9a391aa
Qemu driver: ensure proper cleanup of resources
2017-11-01 15:31:20 -07:00
Matt Mercer
46f7e2fa4c
Qemu driver: minor logging fixes
2017-11-01 15:31:14 -07:00
Matt Mercer
4afb9dfa2d
Standardize driver.qemu logging prefix
2017-11-01 15:30:44 -07:00
Matt Mercer
5127e75569
Qemu driver: add graceful shutdown feature
2017-11-01 15:30:36 -07:00
Michael Schurter
1769db98b7
Fix regression by returning error on unknown alloc
2017-11-01 15:16:38 -05:00
Michael Schurter
9f26b9a403
Fix race in test
2017-11-01 15:16:38 -05:00
Michael Schurter
73e9b57908
Trigger GCs after alloc changes
...
GC much more aggressively by triggering GCs when allocations become
terminal as well as after new allocations are added.
2017-11-01 15:16:38 -05:00
Michael Schurter
2a81160dcd
Fix GC'd alloc tracking
...
The Client.allocs map now contains all AllocRunners again, not just
un-GC'd AllocRunners. Client.allocs is only pruned when the server GCs
allocs.
Also stops logging "marked for GC" twice.
2017-11-01 15:16:38 -05:00
Alex Dadgar
c710550551
fix test
2017-10-30 12:35:31 -07:00
Alex Dadgar
4831380e57
Node access is done using locked Node copy
...
Fixes https://github.com/hashicorp/nomad/issues/3454
Reliably reproduced the data race before by having a fingerprinter
change the nodes attributes every millisecond and syncing at the same
rate. With fix, did not ever panic.
2017-10-27 13:27:24 -07:00
Jonathan Ballet
5429d1c656
docker: changed OOM killed error message
2017-10-27 20:30:52 +02:00
Jonathan Ballet
12615bde9c
docker: log that a container has been killed by the OOM killer
...
Fix : #2203 (at least for Docker tasks)
2017-10-27 18:05:27 +02:00
Alex Dadgar
f117eb28c7
go style vars
2017-10-25 10:49:34 -07:00
Alex Dadgar
3f8495dd0e
fix two flaky tests
2017-10-23 18:15:52 -07:00
Alex Dadgar
cb0d0ef009
move to consul freeport implementation
2017-10-23 16:51:40 -07:00
Alex Dadgar
dbc014b360
Standardize retrieving a free port into a helper package
2017-10-23 16:48:20 -07:00
Alex Dadgar
4a69e1ad15
don't double parallel
2017-10-23 16:48:06 -07:00
Alex Dadgar
96ca2bbe4c
respond to comments
2017-10-23 15:50:27 -07:00
Alex Dadgar
99c81b5848
Skip if no docker
2017-10-19 16:55:10 -07:00
Alex Dadgar
593536664e
fix flaky java tests
2017-10-19 16:49:57 -07:00
Alex Dadgar
4bc452b479
Undo darwin user setting
2017-10-19 16:49:57 -07:00
Alex Dadgar
c7c6964313
Run as user on mac
2017-10-19 16:49:57 -07:00
Alex Dadgar
55a1dffa2f
sudo docker works
2017-10-19 16:49:57 -07:00
Alex Dadgar
805e7b3b62
docker tests
2017-10-19 16:49:57 -07:00
Michael Schurter
797f49702e
Add logging around moby/moby#32648 bug
2017-10-18 10:44:03 -07:00
Michael Schurter
22ac450b2f
Properly fail rkt fingerprinting on old vesions
2017-10-16 13:58:58 -07:00
Michael Schurter
d7732c1a58
Squelch repeated rkt version warnings
2017-10-16 12:09:47 -07:00
Michael Schurter
b5fd075d74
Test fixes from #3383
2017-10-13 15:45:35 -07:00
Michael Schurter
b63eee17e9
Merge pull request #3383 from hashicorp/b-migrate-token
...
base64 migrate token
2017-10-13 13:46:54 -07:00
Michael Schurter
dfd2967cdb
Merge pull request #3376 from hashicorp/f-node-acls
...
Allow Node.SecretID for Node.GetNode and Allocs.GetAlloc
2017-10-13 11:51:48 -07:00
Michael Schurter
15b991e039
base64 migrate token
...
HTTP header values must be ASCII.
Also constant time compare tokens and test the generate and compare
helper functions.
2017-10-13 10:59:13 -07:00
Alex Dadgar
85178d6048
rkt remove allocid
2017-10-13 10:07:50 -07:00
Adam Stankiewicz
cefbc72b49
Remove AllocID from ExecutorContext
2017-10-13 17:07:49 +02:00
Michael Schurter
4a70d4356a
Alloc watcher must send Node.SecretID as AuthToken
...
An auth token is required if ACLs are enabled
2017-10-12 16:38:02 -07:00
Michael Schurter
84d8a51be1
SecretID -> AuthToken
2017-10-12 15:16:33 -07:00
Michael Schurter
59ff94cd71
Don't panic on unexpeced Consul response
...
Fixes #3326
2017-10-11 18:25:54 -07:00
Chelsea Holland Komlo
e1c4701a43
fix up build warnings
2017-10-11 17:11:57 -07:00
Chelsea Holland Komlo
b018ca4d46
fixing up code review comments
2017-10-11 17:09:20 -07:00
Chelsea Holland Komlo
a77e462465
add tests for functionality
2017-10-11 17:09:20 -07:00
Chelsea Holland Komlo
410adaf726
Add functionality for authenticated volumes
2017-10-11 17:09:20 -07:00
Alex Dadgar
6d3d0a9391
Nomad UI Command
2017-10-09 23:01:55 -07:00
Michael Schurter
f788974f8a
Merge pull request #3288 from simar7/qemu-improvements
...
qemu: Add bound checks for memory assignment
2017-10-02 14:47:05 -07:00
Simarpreet Singh
d801584c46
qemu: Fix lower memory bound to 128M
...
Signed-off-by: Simarpreet Singh <simar@linux.com>
2017-10-02 14:29:44 -07:00
Simarpreet Singh
10d7d6dab0
gofmt: format qemu.go and qemu_test.go
...
Signed-off-by: Simarpreet Singh <simar@linux.com>
2017-10-02 13:16:48 -07:00
Michael Schurter
a66c53d45a
Remove `structs` import from `api`
...
Goes a step further and removes structs import from api's tests as well
by moving GenerateUUID to its own package.
2017-09-29 10:36:08 -07:00
Michael Schurter
77f1fe40e7
Properly autodetect Docker IP in Windows
...
Our Docker network plugin autodetection code was erroneously treating
Window's default network `nat` as a plugin and defaulting to it instead
of the host.
Fixes #3218
2017-09-27 16:49:23 -07:00
Michael Schurter
a8a87af7ed
Only build rkt driver on linux
...
Build stub for non-linux targets
2017-09-27 14:21:45 -07:00
Simarpreet Singh
3d99e71de8
qemu: Add bound checks for memory assignment
...
Signed-off-by: Simarpreet Singh <simar@linux.com>
2017-09-26 21:07:48 -07:00
Michael Schurter
d7229ce6c5
Merge pull request #3256 from dalegaard/master
...
Enable rkt driver to use address_mode = 'driver'
2017-09-26 18:04:37 -05:00
Alex Dadgar
4173834231
Enable more linters
2017-09-26 15:26:33 -07:00
Lasse Dalegaard
9f584d1114
Ignore rkt network failure if container died early
...
If the container dies before the network can be read, we now ignore the
error coming out of the network information polling loop. Nomad will
restart the task regardless, so we might be masking the actual error.
The polling loop for the rkt network information, inside the `Start`
method, was getting a bit unwieldy. It's been refactored out so it's not
a seperate function.
2017-09-27 00:15:27 +02:00
Lasse Dalegaard
b43ec57c02
Make rkt port mapping test not exit immediately
...
The rkt port mapping test currently starts redis with --version, which
obviously makes redis exit again almost immediately. This means that the
container exists before the network status can be queried, and so the
test fails.
2017-09-26 23:10:24 +02:00
Lasse Dalegaard
17d155d316
Improve rkt driver network status poll loop
...
The network status poll loop will now report any networks it ignored, as
well as a no-networks situations.
2017-09-26 21:49:45 +02:00
Lasse Dalegaard
bafd32fda0
Refactor rkt network status loop
...
The network status poll loop for the rkt drivers `Start` method was a
bit messy, and could not display the last encountered error. Here we
clean it up.
2017-09-26 21:27:12 +02:00
Lasse Dalegaard
5e9e2b07bd
Small logging fix in rkt/driver
2017-09-26 19:36:13 +02:00
Lasse Dalegaard
3d25fd3b00
Bump minimum rkt version to 1.27.0.
...
The changes introduces in #3256 require at least rkt 1.27.0 because of
a bug in the JSON output of `rkt status` in previous versions.
Here we upgrade all references to rkt's minimum version, and also make
travis and vagrant use this version when running tests.
Finally we add a CHANGELOG notice.
2017-09-26 19:15:43 +02:00
Lasse Dalegaard
f55f2b8f24
Turn rkt network status failure into Start failure
...
If the rkt driver cannot get the network status, for a task with a
configured port mapping, it will now fail the Start() call and kill the
task instead of simply logging. This matches the Docker behavior.
If no port map is specified, the warnings will be logged but the task
will be allowed to start.
2017-09-26 10:20:57 +02:00
Lasse Dalegaard
55a2e60e1a
Test for rkt driver setting DriverNetwork
...
To test that the rkt driver correctly sets a DriverNetwork, at least
when a port mapping is requested, we amend the
TestRktDriver_PortsMapping test with a small check.
2017-09-26 09:10:50 +02:00
Lasse Dalegaard
2d307d5beb
Discard errors from rkt status and cat-manifest
...
Since we don't actually show these errors anywhere, just discard them
right away.
2017-09-26 09:05:47 +02:00
Chelsea Holland Komlo
b26454cf99
Move setGaugeForAllocationStats to emitClientMetrics
2017-09-25 16:05:49 +00:00
Lasse Dalegaard
cbcbe0da2e
Expose rkt DriverNetwork
...
Currently the rkt driver does not expose a DriverNetwork instance after
starting the container, which means that address_mode = 'driver' does
not work.
To get the container network information, we can call `rkt status` on
the UUID of the container and grab the container IP from there.
For the port map, we need to grab the pod manifest as it will tell us
which ports the container exposes. We then cross-reference the
configured port name with the container port names, and use that to
create a correct port mapping.
To avoid doing a (bad) reimplementation of the appc schema(which rkt
uses for its manifest) and rkt apis, we pull those in as vendored
dependencies. The versions used are the same ones that rkt use in their
glide dependency configuration for version 1.28.0.
2017-09-21 00:34:22 +02:00
Lasse Dalegaard
7ac599d509
Use rkt prepare + run-prepared instead of run.
...
The rkt driver currently executes run and asks that the pod UUID is
written to a file that is then polled for changes for up to five
seconds. Many container fetches will take longer than this, so this
method will often not be able to track the pod UUID reliably.
To avoid this problem, rkt allows pods to be first prepared, which will
return their UUID, and then run as a second invocation.
Here we convert the rkt driver's Start method to use this method
instead. This way, the UUID will always be tracked correctly.
2017-09-21 00:17:31 +02:00
Michael Schurter
f92ffe5af5
Merge pull request #3105 from hashicorp/f-876-restart-unhealthy
...
Restart unhealthy tasks
2017-09-17 19:38:32 -07:00
epipho
a16c97394f
Fix incorrect docker stats
2017-09-16 00:43:03 -04:00
Michael Schurter
67a4a169a9
Name const after what it represents
2017-09-15 14:57:18 -07:00
Michael Schurter
79a7bf3d7c
Cleanup and test restart failure code
2017-09-15 14:54:37 -07:00
Michael Schurter
06ca379da0
Add comments
2017-09-15 14:34:36 -07:00
Michael Schurter
4dbaa52aba
Fold SetFailure into SetRestartTriggered
2017-09-14 16:48:39 -07:00
Michael Schurter
ed77c0944b
DRY up restart handling a bit.
...
All 3 error/failure cases share restart logic, but 2 of them have
special cased conditions.
2017-09-14 16:48:39 -07:00
Michael Schurter
73fb71ca10
RestartDelay isn't needed as checks are re-added on restarts
...
@dadgar made the excellent observation in #3105 that TaskRunner removes
and re-registers checks on restarts. This means checkWatcher doesn't
need to do *any* internal restart tracking. Individual checks can just
remove themselves and be re-added when the task restarts.
2017-09-14 16:48:39 -07:00
Michael Schurter
06dd86adbd
Remove unused lastStart field
2017-09-14 16:47:41 -07:00
Michael Schurter
0447f79288
Removed partially implemented allocLock
2017-09-14 16:47:41 -07:00
Michael Schurter
ade29ecbed
Improve check watcher logging and add tests
...
Also expose a mock Consul Agent to allow testing ServiceClient and
checkWatcher from TaskRunner without actually talking to a real Consul.
2017-09-14 16:47:41 -07:00
Michael Schurter
a137676358
Add comments and move delay calc to TaskRunner
2017-09-14 16:46:54 -07:00
Michael Schurter
8a87475498
Use existing restart policy infrastructure
2017-09-14 16:46:54 -07:00
Michael Schurter
22690c5f4c
Add check watcher for restarting unhealthy tasks
2017-09-14 16:46:54 -07:00
Alex Dadgar
d306da846c
changelog and feedback
2017-09-14 14:08:58 -07:00
Alex Dadgar
07ed83fdd5
Non-locked accessors to common Node fields
...
This PR removes locking around commonly accessed node attributes that do
not need to be locked. The locking could cause nodes to TTL as the
heartbeat code path was acquiring a lock that could be held for an
excessively long time. An example of this is when Vault is inaccessible,
since the fingerprint is run with a lock held but the Vault
fingerprinter makes the API calls with a large timeout.
Fixes https://github.com/hashicorp/nomad/issues/2689
2017-09-14 14:08:26 -07:00
Chelsea Komlo
536d38454b
Merge pull request #3191 from hashicorp/b-tagged-metrics-panic
...
Fix panic in emitting tagged allocation metrics
2017-09-11 14:28:50 -04:00
Armon Dadgar
d4aed839d2
Merge pull request #3185 from hashicorp/f-acl-reset
...
Add ability to reset ACL bootstrap process
2017-09-11 10:47:17 -07:00
Armon Dadgar
3d5ecaafff
Address @dadgar feedback
2017-09-11 10:30:59 -07:00
Alex Dadgar
b3958faa14
Merge pull request #3187 from hashicorp/b-windows-docker
...
Fix MemorySwappiness on Windows Docker
2017-09-11 09:56:49 -07:00
Alex Dadgar
1cd8f7523f
Merge pull request #3184 from hashicorp/b-docker-logging
...
Fix docker user specified syslogging
2017-09-11 09:31:33 -07:00
Chelsea Holland Komlo
848af92183
fix panic in emitting tagged metrics
2017-09-11 15:32:37 +00:00
Alex Dadgar
d3a9463358
Fix MemorySwappiness on Windows Docker
...
Fixes https://github.com/hashicorp/nomad/issues/3181
2017-09-10 17:46:45 -07:00
Alex Dadgar
3ec7946b3e
Fix invalid CPU stats on Windows
...
This PR fixes an issue introduced in Nomad 0.6.0 due to
https://github.com/shirou/gopsutil/issues/420 . The issue arised from the
fact that the Windows stats from gopsutil reports CPUs in
percentages where we expected ticks.
2017-09-10 15:30:48 -07:00
Alex Dadgar
637ae9580a
Fix docker user specified syslogging
2017-09-10 14:57:48 -07:00
James Nugent
448145872f
client: Guard against "NaN" values from floats
...
This commit protects against finding `0.NaN` tokens in JSON streams
because of infinity representation on serialization.
2017-09-08 16:21:07 -05:00
Alex Dadgar
31f9e099d9
Merge pull request #3148 from clinta/purge-stopped
...
Always purge stopped containers
2017-09-05 17:18:05 -07:00
Alex Dadgar
6fdaf38389
Fix repo name passed to docker credential helpers
...
This PR fixes the server url passed to docker credential helpers and
fixes stderr capture.
Fixes https://github.com/hashicorp/nomad/issues/2957
2017-09-05 16:43:21 -07:00
Alex Dadgar
21564c7c04
Parse Docker mounts correctly ( #3163 )
...
* Parse Docker mounts correctly
This PR fixes the parsing of Docker mounts and adds testing to ensure no
regressions.
Fixes https://github.com/hashicorp/nomad/issues/3156
* Review feedback
2017-09-05 14:02:57 -07:00
Chelsea Holland Komlo
0ef43c3c5f
final code review fixups
2017-09-05 18:47:44 +00:00
Chelsea Holland Komlo
dea1fa089b
fix up travis test failure via race condition
2017-09-05 15:04:59 +00:00
Chelsea Holland Komlo
a8cbd0b559
fixups from code review
2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
f72e4aad13
labels depend on full setup of client beforehand
2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
87a814397d
refactor to use baseLabels
2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
b2953d905a
pass in commonly used values
2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
c634043069
create base labels to be used in every metric
2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
f5ea83da8d
emit metrics using labels, add option for backwards compatibility
2017-09-05 14:12:57 +00:00
Chelsea Holland Komlo
0175f80775
add metrics options to client config
2017-09-05 14:12:57 +00:00
Armon Dadgar
b8bf35f087
ACL RPCs allow stale reads for scalability
2017-09-04 13:07:44 -07:00
Armon Dadgar
f31cd6a618
client: fixing policy resolution after ACL endpoint enforcement
2017-09-04 13:05:53 -07:00
Armon Dadgar
ddcc5f89bc
Add ErrPermissionDenied, rename TokenNotFound
2017-09-04 13:05:53 -07:00
Armon Dadgar
76a03f2d8e
Address @dadgar feedback
2017-09-04 13:05:53 -07:00
Armon Dadgar
e3f32ca6f1
client: adding token resolution logic
2017-09-04 13:05:36 -07:00
Armon Dadgar
688897561b
client: adding token cache for ACL resolution
2017-09-04 13:05:36 -07:00
Armon Dadgar
c2e72e8a9c
client: create ACL and Policy cache
2017-09-04 13:05:35 -07:00
Armon Dadgar
792f176a44
agent: thread ACL config to client
2017-09-04 13:04:45 -07:00
Clint Armstrong
b5c2636313
Always purge stopped containers
2017-08-31 14:28:48 -04:00
Clint Armstrong
7e35ab6abb
fix logging re-init
2017-08-30 12:36:31 -04:00
Michael Schurter
78823d559b
Squelch logspam when unable to get disk usage stats
...
To reproduce logspam:
```
$ docker plugin install --grant-all-permissions vieux/sshfs
$ nomad agent -dev
...
2017/08/25 17:09:03.282868 [WARN] client: error fetching host disk usage stats for /var/lib/docker/plugins/a8b4a69b07e5180f828d19e1e9e102ccc0e26f9c9939eaef85357260c30b20a7/rootfs/mnt/volumes: permission denied
... repeats every collection period ...
```
2017-08-28 12:04:32 -07:00
Alex Dadgar
876732833f
Merge pull request #3073 from clinta/docker-500
...
Allow retry of 500 API errors to be handled by restart policies
2017-08-24 16:57:36 -07:00
Alex Dadgar
fd7d614ae4
Handle interfaces that only have link-local addrs
...
This PR changes the fingerprint handling of network interfaces that only
contain link local addresses. The new behavior is to prefer globally
routable addresses and if none are detected, to fall back to link local
addresses if the operator hasn't disallowed it. This gives us pre 0.6
behavior for interfaces with only link local addresses but 0.6+ behavior
for IPv6 interfaces that will always have a link-local address.
Fixes https://github.com/hashicorp/nomad/issues/3005
/cc diptanuc
2017-08-23 15:32:22 -07:00
Alex Dadgar
211a793530
resolve feedback
2017-08-23 14:17:00 -07:00
Alex Dadgar
653733e093
Clean up docker mounts
2017-08-22 14:12:44 -07:00
Clint Armstrong
ae230395ba
Allow retry of 500 API errors to be handled by restart policies
2017-08-22 14:04:46 -04:00
Michael Schurter
51a27cc83d
Merge pull request #3031 from hashicorp/f-2924-consul-headers
...
Add Header and Method support for HTTP checks
2017-08-18 13:35:08 -07:00
Michael Schurter
7ebd429a86
Merge mistake made go fmt fail
2017-08-18 13:19:44 -07:00
Michael Schurter
5c015da3cb
Merge pull request #3021 from clinta/docker-mount2
...
Expose docker mount options
2017-08-17 16:57:09 -07:00
Michael Schurter
ff3944a981
Update and test service/check interpolation
2017-08-17 16:49:14 -07:00
Michael Schurter
b4813747d0
Merge pull request #3043 from hashicorp/f-2441-shutdown-delay
...
Add optional shutdown delay to tasks
2017-08-17 14:37:48 -07:00
Michael Schurter
c709251ed6
Lower ShutdownDelay for non-Travis testing
2017-08-17 14:23:42 -07:00
Michael Schurter
b33b2fb4c0
Lower shutdown delay in test
2017-08-17 13:57:22 -07:00
Michael Schurter
0726ca75e3
Make shutdown delay log DEBUG, not INFO
2017-08-17 11:28:33 -07:00
Clint Armstrong
f0460156ae
restrict mount to volume type
2017-08-17 09:52:13 -04:00
Michael Schurter
d529b422b2
Add optional shutdown delay to tasks
...
Fixes #2441
Defaults to 0 (no delay) for backward compat and because this feature
should be opt-in.
2017-08-16 17:59:46 -07:00
Alex Dadgar
d6187cd3e8
Fix tests
2017-08-16 16:26:52 -07:00
Alex Dadgar
1a86aecf55
Add version package
...
This PR adds a version package and consolidates version strings into a
Version struct.
2017-08-16 15:44:21 -07:00
Alex Dadgar
3d69961c3a
Must be root for TestAllocDir_CreateDir
2017-08-16 10:46:14 -07:00
Alex Dadgar
7dd86b5dfe
Merge pull request #3025 from hashicorp/f-health-events
...
Emit task events explaining alloc health
2017-08-15 12:23:46 -07:00
Alex Dadgar
bb165b97ef
comments
2017-08-15 12:23:29 -07:00
Michael Schurter
1126268a81
Fix formatting
2017-08-15 10:37:02 -07:00
Michael Schurter
74d5c272c6
Cleanup comments and return val
2017-08-14 16:59:03 -07:00
Michael Schurter
46b7fd45d7
spelling
2017-08-14 16:55:59 -07:00
Michael Schurter
de8ea243b6
Return move errors from local Migrate like remote
...
Since alloc runner just logs these errors and continues there's no
reason not to return it.
2017-08-14 16:48:56 -07:00
Michael Schurter
7342e23669
Move migrating state into prevAllocWatcher
2017-08-14 16:02:28 -07:00
Alex Dadgar
fdc0115427
test
2017-08-12 14:42:53 -07:00
Alex Dadgar
56801349eb
Refactor health watcher and emit events
2017-08-12 14:23:36 -07:00
Michael Schurter
4601419d63
Soft fail on migration errors
2017-08-11 16:50:30 -07:00
Michael Schurter
3dbd764969
Exit if alloc listener closes
...
Add test for that case, add comments, remove debug logging
2017-08-11 16:22:02 -07:00
Michael Schurter
b7915bdac7
Update tests for new blocking/migrating code
2017-08-11 16:21:57 -07:00
Michael Schurter
ad6cec9e82
Set failed status instead of panic'ing
...
Fixup some TODOs and formatting left from new prevAllocWatcher code.
2017-08-11 16:21:35 -07:00
Michael Schurter
e41a654917
switch from alloc blocker to new interface
...
interface has 3 implementations:
1. local for blocking and moving data locally
2. remote for blocking and moving data from another node
3. noop for allocs that don't need to block
2017-08-11 16:21:35 -07:00
Michael Schurter
ee04717a0b
initial attempt at refactoring blocked/migrating
2017-08-11 16:21:35 -07:00
Michael Schurter
ec6e6e6c66
Only set alloc status if it's not already terminal
2017-08-11 16:21:35 -07:00
Alex Dadgar
0d5127d5fc
Merge pull request #3011 from hashicorp/b-cv-fix-TestEnvAWSFingerprint_aws
...
Updated AWS fingerprint test for ami-id
2017-08-11 10:58:22 -07:00
Alex Dadgar
2fdfd9af4a
Merge pull request #2992 from decoomanj/master
...
Added dnsoptions to the docker driver
2017-08-11 10:12:36 -07:00
Charlie Voiselle
507c75bd16
Updated AWS fingerprint test for ami-id
...
In https://github.com/hashicorp/nomad/pull/2999 , I changed ami-id
to non-unique. This updates the test to reflect that.
2017-08-11 12:54:27 -04:00
Jan De Cooman
8b88d56c01
updated message in test
2017-08-11 09:24:15 +02:00
Alex Dadgar
1b061b8f47
Unmount task directories when alloc is terminal
...
This PR unmounts directories from tasks when the alloc is terminal
rather than when it is garbage collected.
/cc @angrycub
2017-08-10 13:28:17 -07:00
Alex Dadgar
6e20acb503
Merge pull request #2984 from hashicorp/b-tags
...
Fix alloc health with checks using interpolation
2017-08-10 13:07:25 -07:00
Alex Dadgar
6b238edc22
Merge pull request #3001 from hashicorp/f-template-events
...
Template emits events explaining why it is blocked
2017-08-10 13:00:58 -07:00
Alex Dadgar
bd9f63d20e
address comments
2017-08-10 13:00:06 -07:00
Clint Armstrong
9063b500e0
expose mount options to nomad
2017-08-10 12:37:17 -04:00
Alex Dadgar
83ba2f1814
Template emits events explaining why it is blocked
...
This PR does the following:
* Adds a mechanism to emit events in the TaskRunner
* Vendors a new version of Consul-Template that allows extraction of
missing dependencies
* Adds logic to our consul_template.go to determine missing events and
emit them in a batched fashion.
* Refactors the consul_template code to split the run method and take in
a config struct rather than many parameters.
Fixes https://github.com/hashicorp/nomad/issues/2578
2017-08-09 18:01:27 -07:00
Charlie Voiselle
ae466eaaa7
AMI ID is potentally non-unique
...
Changed the keys map to reflect that.
2017-08-09 12:53:54 -04:00
Jan De Cooman
633bcee661
fixed typo
2017-08-09 14:44:38 +02:00
Jan De Cooman
804fc0d06f
added dnsoptions to the docker driver
2017-08-09 13:30:06 +02:00
Alex Dadgar
aba107be99
Merge pull request #2979 from lfarnell/cleanup
...
Code cleanup
2017-08-08 10:21:15 -07:00
Alex Dadgar
4f6f6a13c8
Emit generic task events
2017-08-07 21:26:04 -07:00
Alex Dadgar
79d25b7db9
Merge pull request #2947 from hashicorp/f-vault-grace
...
Allow template to set Vault grace
2017-08-07 16:29:53 -07:00
Alex Dadgar
93b9a1bf20
Rename runnerConfig
2017-08-07 16:29:42 -07:00
Alex Dadgar
d86b3977b9
Fix alloc health with checks using interpolation
...
Fixes an issue in which the allocation health watcher was checking for
allocations health based on un-interpolated services and checks. Change
the interface for retrieving check information from Consul to retrieving
all registered services and checks by allocation. In the future this
will allow us to output nicer messages.
Fixes https://github.com/hashicorp/nomad/issues/2969
2017-08-07 16:27:08 -07:00
Luke Farnell
f0ced87b95
fixed all spelling mistakes for goreport
2017-08-07 17:13:05 -04:00
Michael Schurter
c76b3b54b9
Merge branch 'master' into fix-pending-state
2017-08-03 17:27:03 -07:00
Alex Dadgar
067a638478
Allow template to set Vault grace
...
This PR allows a template to specify the Vault grace duration.
Fixes https://github.com/hashicorp/nomad/issues/2922
2017-08-01 14:14:08 -07:00
Alex Dadgar
562ea52c8e
vendor vault api
2017-08-01 09:30:55 -07:00
Michael Schurter
6243c9eb86
Merge pull request #2883 from kmalec/add-support-for-readonly-mount
...
rkt driver support for read-only volumes mounts
2017-07-31 10:58:22 -07:00
Alex Dadgar
010567dba8
Fix leaked plugin files for syslog server
...
This PR fixes a leaking of the unix socket used when launching a syslog
server for the Docker driver.
Fixes https://github.com/hashicorp/nomad/issues/2844
2017-07-30 17:51:38 -07:00
Alex Dadgar
a9c786a4fe
Make test Vault pick random ports
2017-07-25 17:40:59 -07:00
Michael Schurter
b01dd31f26
Don't attempt to restore tasks that never sync'd
2017-07-24 15:58:46 -07:00
Alex Dadgar
031da7a21c
fix vet
2017-07-22 22:43:33 -07:00
Alex Dadgar
0f3f1ea68b
travis check fixes
2017-07-22 21:01:22 -07:00
Alex Dadgar
c1a72d24e6
fingerprinters
2017-07-22 20:38:03 -07:00
Alex Dadgar
62c55c8fc9
fix slow resolve on mac
2017-07-22 19:58:30 -07:00
Alex Dadgar
72d055aa9c
drop rkt deadline
2017-07-22 19:54:06 -07:00
Alex Dadgar
219fecc705
Merge branch 'master' of github.com:hashicorp/nomad
2017-07-22 19:48:54 -07:00
Alex Dadgar
d760e68774
darwin test fixes
2017-07-22 19:48:47 -07:00
Alex Dadgar
553bc91725
Parallel client tests ( #2890 )
...
* alloc_runner
* Random tests
* parallel task_runner and no exec compatible check
* Parallel client
* Fail fast and use random ports
* Fix docker port mapping
* Make concurrent pull less timing dependant
* up parallel
* Fixes
* don't build chroots in parallel on travis
* Reduce parallelism on travis with lxc/rkt
* make java test app not run forever
* drop parallelism a little
* use docker ports that are out of the os's ephemeral port range
* Limit even more on travis
* rkt deadline
2017-07-22 19:04:36 -07:00
Alex Dadgar
b6f0782732
typo
2017-07-22 12:55:30 -07:00
Alex Dadgar
8cf9d15b01
typo
2017-07-22 12:33:07 -07:00
Alex Dadgar
9e9c20ca77
small fixes
2017-07-22 12:25:02 -07:00
Alex Dadgar
5a3df2ed89
Merge pull request #2888 from hashicorp/b-fix-allocrunner-test
...
Fix TestAllocRunner_TaskLeader_StopTG and unrelated races
2017-07-22 11:44:04 -07:00
Alex Dadgar
46c8bec9b0
faster vaultclient
2017-07-21 19:38:37 -07:00
Michael Schurter
d840fc8c95
Fix tr race by not sharing alloc/task
...
prestart only needs the original alloc/task so pass their pointers in.
Task updates may concurrently replace the pointer on tr.
2017-07-21 16:17:42 -07:00
Michael Schurter
a22cfa8387
Minor test race fix
2017-07-21 16:17:23 -07:00
Michael Schurter
9a7a1d8c13
Fix race by not accessing tr.task from ar
2017-07-21 16:16:53 -07:00
Michael Schurter
2e9a1e3fa6
Remove unneeded saveTaskRunnerState method
...
Collapse it into the one place it's called
2017-07-21 16:16:02 -07:00
Michael Schurter
996ce9286e
Fix test race by locking around ar.tasks access
2017-07-21 14:25:51 -07:00
Michael Schurter
8d1d8eac46
Fix handle race
2017-07-21 14:00:32 -07:00
Michael Schurter
5f40901422
Fix more test races
2017-07-21 14:00:21 -07:00
Michael Schurter
b9ba447399
Fixup a few more even rarer test races
2017-07-21 13:43:32 -07:00
Michael Schurter
38cb2021dd
Always interpolate task before calling with Consul
...
Also switch to returning a copy of the task to avoid races between
altering the Task and persitence.
2017-07-21 13:37:16 -07:00
Michael Schurter
6e80a8ee39
Fix TestAllocRunner_TaskLeader_StopTG
...
Also make alloc runner tests less racy. Basically every alloc runner
test used to have races with `upd.{Count,Allocs}`
2017-07-21 13:37:16 -07:00
Alex Dadgar
e509661cf9
executor and logging pkg
2017-07-21 12:14:54 -07:00
Alex Dadgar
7c433a1767
Parallel
2017-07-21 12:06:39 -07:00
Karel Malec
4b98f94a88
Allow rkt driver to mount volumes read-only
2017-07-21 13:05:15 +02:00
Alex Dadgar
56f9cf86df
Speed up client startup
2017-07-20 22:34:24 -07:00
Michael Schurter
0d7f7e2b9d
Merge pull request #2878 from hashicorp/b-save-state
...
Fix state handling on restart
2017-07-20 17:16:46 -07:00
Karel Malec
cf985f011c
Pass task group name as NOMAD_GROUP_NAME environment variable
2017-07-21 01:22:54 +02:00
Alex Dadgar
09c8ee621b
Destroy tasks that are part of terminal alloc
2017-07-20 12:02:04 -07:00
Michael Schurter
9a7f649e56
Don't save task runner state if it is destroyed
2017-07-20 10:17:41 -07:00
Alex Dadgar
64776b1370
Should not persist state after alloc_runner is garbage collected
2017-07-19 17:31:30 -07:00
Michael Schurter
c1b8bef813
Use broadcast send retry logic everywhere
2017-07-18 14:36:32 -07:00
Alex Dadgar
d2381c9263
Merge pull request #2853 from hashicorp/b-watcher
...
Improve alloc health watcher
2017-07-18 14:12:28 -07:00
Alex Dadgar
bd43bd509c
Save deployment status
2017-07-18 12:37:52 -07:00
Alex Dadgar
41f67e3535
Small fixes
2017-07-18 12:19:57 -07:00
Michael Schurter
c24e73ede7
Fix deadlock caused by syncing during destroy
...
When replacing an alloc the new alloc is blocked until the old alloc is
destroyed. This could cause a deadlock:
1. Destroying the old alloc includes a final sync of its status
2. Syncing status causes a GC
3. A GC looks for terminal allocs to cleanup
4. The GC waits for an alloc to stop completely before GC'ing
If the GC chooses the currently-being-destroyed-alloc to GC, the GC
deadlocks. If `client.max_parallel` deadlocks happen the GC is wedged
until the Nomad process is restarted.
Performing the final sync asynchronously is an ugly hack but prevents
the deadlock by allowing the final sync to occur after the alloc runner
has shutdown and been destroyed.
2017-07-18 11:12:56 -07:00
Michael Schurter
420be86e39
Test AllocDir.Copy
2017-07-17 15:46:54 -07:00
Michael Schurter
cdb2e96d99
Add AllocRunner.allocID for ease-of-use
...
Since the AllocRunner.alloc struct can be mutated, most of AllocRunner
needs to acquire a lock to get the alloc's ID. Log lines always need to
include the alloc ID, so we often skipped acquiring a lock just to grab
the ID and accepted the race.
Let's make the race detector a little happier by storing the ID in a
single assignment field.
2017-07-17 15:46:54 -07:00
Michael Schurter
181fda825a
Fix log level
2017-07-17 15:46:54 -07:00
Michael Schurter
98f6e7f10f
Don't fail if task dirs don't exist on creation
...
Task dir metadata is created in AllocRunner.Run which may not run before
an alloc is sync'd and Nomad exits. There's no reason not to just create
task dir metadata on restore if it doesn't exist.
2017-07-17 15:46:54 -07:00
Michael Schurter
51515cbe0c
Ensure allocDir is never nil and persisted safely
...
Fixes #2834
2017-07-17 15:46:54 -07:00
Alex Dadgar
0821ee67f5
Fix alloc broadcaster panic on double close
2017-07-17 14:09:05 -07:00
Michael Schurter
0a6bf87365
Fix nil panic in Docker error condition
...
Fixes #2835
Yet another bug caused by overwriting container and then trying to
reference container.ID in the err handling block. Did a quick audit of
docker.go and it seems to be the last offender. See #2804 for previous
bug.
2017-07-14 10:48:19 -07:00
Michael Schurter
e9a416b731
Merge branch 'master' into fix-pending-state
2017-07-10 10:43:23 -07:00
unknown
26b16fa3ce
#2563 fixed pending state for allocations with terminal status
2017-07-09 16:18:06 +03:00
Alex Dadgar
05894f4611
Small fixes
2017-07-07 17:34:50 -07:00
Michael Schurter
fecb16cfb2
Merge pull request #2793 from hashicorp/b-2776-ct-vault-servername
...
Propagate vault.tls_server_name to consul-template
2017-07-07 16:44:19 -07:00
Michael Schurter
95a9a5da71
Merge pull request #2787 from hashicorp/f-docker-test-mac
...
Test #2652 - Docker MAC Address option
2017-07-07 16:22:10 -07:00
Michael Schurter
4be4df21c9
Merge pull request #2797 from hashicorp/f-2785-docker-bridge-ip
...
Add driver.docker.bridge_ip node attribute
2017-07-07 16:20:20 -07:00
Michael Schurter
94389c3ecc
Remove debug logging
2017-07-07 16:19:42 -07:00
Michael Schurter
5e3e3818db
Merge pull request #2804 from hashicorp/b-2802-docker-panic
...
Don't panic in container list/remove/inspect race
2017-07-07 15:35:51 -07:00
Michael Schurter
67a7b0eac9
Don't panic in container list/remove/inspect race
...
Fixes #2802
While it's hard to reproduce the theoretical race is:
1. This goroutine calls ListContainers()
2. Another goroutine removes a container X
3. This goroutine attempts to InspectContainer(X)
However, this bug could be hit in the much simpler case of
InspectContainer() timing out.
In those cases an error is returned and the old code attempted to wrap
the error with the now-nil container.ID. Storing the container ID fixes
that panic.
2017-07-07 15:10:59 -07:00
Alex Dadgar
bf97a2455c
Vet and small improvement on watcher failure detection
2017-07-07 14:53:01 -07:00
Alex Dadgar
45712c6ca3
test fixes
2017-07-07 14:11:27 -07:00
Alex Dadgar
ade9a7c768
@jippi Changed my mind! Good suggestion
2017-07-07 12:12:48 -07:00
Alex Dadgar
c063eba836
Warn log
2017-07-07 12:10:04 -07:00
Alex Dadgar
067ed86a47
Client watches for allocation health using task state and Consul checks
...
This PR adds watching of allocation health at the client. The client can
watch for health based on the tasks running on time and also based on
the consul checks passing.
2017-07-07 12:10:04 -07:00
Alex Dadgar
001058227e
watcher per alloc
2017-07-07 12:07:08 -07:00
Alex Dadgar
2e2fd26bed
Update index
2017-07-07 12:07:08 -07:00
Alex Dadgar
ecee5e370e
initial watcher
2017-07-07 12:07:08 -07:00
Alex Dadgar
c77944ed29
assign names
2017-07-07 12:03:11 -07:00
Michael Schurter
084dd384c1
Add driver.docker.bridge_ip node attribute
...
Fixes #2785
2017-07-07 10:14:10 -07:00
Michael Schurter
d38d48151a
Propagate vault.tls_server_name to consul-template
...
Fixes #2776
2017-07-06 16:56:50 -07:00
Michael Schurter
39edf23fd5
Merge pull request #2786 from hashicorp/f-docker-auth-soft-fail
...
Default to auth hard fail but optionally soft fail
2017-07-06 13:25:56 -07:00
Michael Schurter
bae1b7db2d
Test #2652
...
Also cleanup docker config opts docs
2017-07-06 12:46:25 -07:00
Michael Schurter
8f4353779a
Merge branch 'master' into master
2017-07-06 12:09:36 -07:00
Michael Schurter
2900f941b5
Default to auth hard fail but optionally soft fail
2017-07-06 11:35:34 -07:00
Michael Schurter
08b452adf5
Merge pull request #2781 from hashicorp/f-2678-getter-mode
...
Add support for go-getter modes
2017-07-06 11:06:40 -07:00
Michael Schurter
b000bb8598
Merge pull request #2744 from aep/master
...
Do not fail when no docker registry auth is available
2017-07-06 11:04:11 -07:00
Michael Schurter
0d3bdf7210
Add support for go-getter modes
...
Fixes #2678
2017-07-06 10:45:44 -07:00
Michael Schurter
644f0cfaa4
Consistently quote alloc ids in client logs
2017-07-06 10:24:52 -07:00
Michael Schurter
4fd9ef6a8c
Tiny client race condition fix
...
Plus some logging improvements that may help with #2563
2017-07-05 16:15:19 -07:00
Michael Schurter
8e2e26c607
rkt: use %s instead of %q when interpolating env
...
Fixes #2686
2017-07-05 09:36:17 -07:00
Michael Schurter
b2382f99f2
0 compute == error
2017-07-03 14:51:02 -07:00
Michael Schurter
ecf090e980
Fix cpu_total_compute override
2017-07-03 14:51:02 -07:00
Michael Schurter
2d741c770b
Merge pull request #2732 from hashicorp/b-persist-alloc-updates
...
Persist Alloc when EvalID changes
2017-07-03 14:46:43 -07:00
Michael Schurter
56a6f8ca8a
Merge pull request #2763 from hashicorp/f-bad-state-help
...
Add more logging to restore state errors
2017-07-03 14:45:03 -07:00
Michael Schurter
9d4b0651ef
Merge pull request #2753 from hashicorp/b-leader-dies-first
...
Destroy task group leader first
2017-07-03 14:38:04 -07:00
Michael Schurter
6e7cc3964e
Merge pull request #2709 from hashicorp/f-advertise-docker-ips
...
Advertise driver-specific addresses
2017-07-03 14:04:12 -07:00
Michael Schurter
5ec52ec24a
Destroy task group leader first
...
Before this commit all tasks in a task group were destroyed
concurrently. This meant logging sidecars might be stopped before the
leader task whose logs still need to be shipped.
This commit blocks on the leader shutting down before signalling to
followers to shutdown.
2017-07-03 13:56:56 -07:00
Michael Schurter
596727230b
Suggest wiping out alloc dir too
2017-07-03 12:29:21 -07:00
Michael Schurter
11f68bfca2
Add more logging to restore state errors
2017-07-03 11:58:41 -07:00
Arvid E. Picciani
aa4f029f10
Do not fail when no docker registry auth is available
...
this amends the behaviour introduced with #2651
and allows pulling public images when docker.auth.helper is set
2017-06-27 11:11:18 +02:00
Michael Schurter
8fcf866a7d
Fix some tests still expecting reverted behavior
2017-06-23 16:51:38 -07:00
Michael Schurter
e81252ba45
Default no_host_uuid to true instead of false
...
The host UUID isn't unique in many virtualized cases and of dubious
value even when it is univerally unique. Default to a random UUID.
2017-06-23 16:23:01 -07:00
Michael Schurter
5a274e6683
Style and comments
2017-06-23 15:20:04 -07:00
Michael Schurter
cff8546035
Fix spelling & re-add immutable state struct
2017-06-23 13:01:39 -07:00
Michael Schurter
d359d3b554
Rename immutable -> alloc
...
meh; naming is hard
2017-06-23 10:58:36 -07:00
Michael Schurter
af2fc0f1bc
Persist Alloc when EvalID changes
2017-06-22 17:33:12 -07:00
Michael Schurter
f3a6ddc57d
Remove DRIVER env vars
...
Also make NOMAD_ADDR_* use host ip:port for consistency. NOMAD_PORT_*
varies based on port map and the driver IP isn't exposed as an env var
as the only place it can be used is in script checks anyway.
2017-06-21 17:19:08 -07:00
Michael Schurter
0633d0c286
Have Qemu return PortMap
2017-06-21 17:19:08 -07:00
Michael Schurter
38a0695687
Simplify Docker Networks processing
2017-06-21 17:19:08 -07:00
Michael Schurter
fec83b271a
Bump error log level
2017-06-21 17:19:08 -07:00
Michael Schurter
8d677bc6b9
Fix lxc tests
2017-06-21 17:19:08 -07:00
Michael Schurter
8d440b1675
Skip DRIVER env vars for labels without a port mapping
2017-06-21 17:19:08 -07:00
Michael Schurter
c0eff81383
Fix Service.AddressMode changes during task updates
2017-06-21 17:19:08 -07:00
Michael Schurter
67d154a274
Test driver network advertisement and checks
2017-06-21 17:19:08 -07:00
Michael Schurter
b9bfb84b53
Implement DriverNetwork and Service.AddressMode
...
Ideally DriverNetwork would be fully populated in Driver.Prestart, but
Docker doesn't assign the container's IP until you start the container.
However, it's important to setup the port env vars before calling
Driver.Start, so Prestart should populate that.
2017-06-21 17:19:08 -07:00
Hynek Schlawack
59ab34c264
Fix typos
2017-06-16 16:10:12 +02:00
Michael Schurter
b69e060071
Log PID when sending signals
2017-06-12 11:11:36 -07:00
Michael Schurter
ffb417a300
Merge pull request #2697 from hashicorp/b-port-map
...
Fix port map interpolation for docker
2017-06-09 13:29:36 -07:00
Michael Schurter
a3827d2cc6
Fix bad merge conflict resolution
2017-06-09 10:40:47 -07:00
Michael Schurter
eabd6759c6
Merge branch 'master' into add-no-overlay-option
2017-06-09 09:59:35 -07:00
Alex Dadgar
5ba2662b30
Merge pull request #2687 from mmickan/issue-2685
...
Include symlinks in snapshots when migrating disks
2017-06-08 13:35:46 -07:00
Michael Schurter
784d69789e
Merge branch 'master' into add-no-overlay-option
2017-06-08 13:15:56 -07:00
Alex Dadgar
7695e636d5
Fix port map interpolation for docker
...
This PR fixes an issue in which the value of the portmap could not be
interpolated.
Fixes https://github.com/hashicorp/nomad/issues/2680
2017-06-08 13:12:32 -07:00
Karel Malec
b55f4bf601
Fix backticks in docs; refine --debug comment
2017-06-07 21:11:22 +02:00
Karel Malec
a258a803f2
Added insecure_options config list
2017-06-07 09:58:42 +02:00
Karel Malec
1957e9dfa6
Add a no_overlay option for the rkt task config.
2017-06-07 00:17:33 +02:00
Mark Mickan
c196d320f8
Add tests for migrating symlinks in alloc and local directories
2017-06-04 15:56:22 +09:30
Mark Mickan
236f24c9a4
Include symlinks in snapshots when migrating disks
...
Fixes #2685
2017-06-04 00:36:18 +09:30
Michael Schurter
d1dd380890
Switch to hashicorp/go-envparse
2017-06-02 15:58:52 -07:00
Michael Schurter
a552bcdb55
Move env file parsing to a library
2017-06-02 15:03:27 -07:00
Alex Dadgar
3b46fe136f
small cleanup
2017-05-31 15:56:54 -07:00
Alex Dadgar
8d6e28ace8
Merge branch 'master' into feature/2334
2017-05-31 14:27:07 -07:00
Alex Dadgar
044f1da5ff
Merge pull request #2681 from hashicorp/b-deadlock
...
Fix a deadlock relating to blocked allocations
2017-05-31 14:26:54 -07:00
Alex Dadgar
ec9cb2c751
Merge pull request #2672 from eyberg/master
...
dont throw away errors in log rotation
2017-05-31 14:14:22 -07:00
Alex Dadgar
b1eea2269a
Fix deadlock
2017-05-31 14:05:47 -07:00
Michael Schurter
cb568a5cf6
Cleanup lots of leaked alloc runners in tests
2017-05-31 11:39:50 -07:00
Ulrik Mikaelsson
6138564f00
Implement support for docker-credential-helpers
...
Solves: #2334
2017-05-31 12:45:02 +02:00
Michael Schurter
ffc2b36dc7
Merge pull request #2636 from hashicorp/f-gc-alloc-limit
...
Add new gc_max_allocs tuneable
2017-05-30 16:14:09 -07:00
Michael Schurter
dd51aa1cb9
Merge pull request #2654 from hashicorp/f-env-consul
...
Add envconsul-like support and refactor environment handling
2017-05-30 14:40:14 -07:00
Michael Schurter
e1a7c2d6d7
Fix Error -> Errorf
2017-05-30 12:08:59 -07:00
Michael Schurter
53d713bacb
Fix getter tests
...
And use an interface for ReplaceEnv since its all getter needs.
2017-05-26 16:52:47 -07:00
Michael Schurter
51d8231911
Fix executor tests
2017-05-26 16:46:03 -07:00
Michael Schurter
3184616936
Always use PATH-only env for rkt commands
2017-05-26 15:41:26 -07:00