Samuel BERTHE
6c93922cb7
Oops
2017-11-17 16:14:14 -06:00
Samuel BERTHE
c8363bc44b
💄
2017-11-17 16:03:22 -06:00
Samuel BERTHE
281ab90484
test(docker driver): testing sysctls and ulimits
2017-11-17 16:03:22 -06:00
Samuel BERTHE
b9a10ff7fa
feat(docker driver): adds sysctls and ulimits configs
2017-11-17 16:03:22 -06:00
Alex Dadgar
69d3bf7392
Merge pull request #3559 from hashicorp/b-metrics
...
Don't emit metrics for non-running tasks
2017-11-17 10:33:23 -08:00
Michael Schurter
3845c8d200
Merge pull request #3562 from hashicorp/b-3561-rkt-rm
...
Remove rkt pods when exiting
2017-11-16 17:30:21 -08:00
Michael Schurter
737fb45640
Merge pull request #3551 from hashicorp/b-3419-docker-409-bug
...
Fix Docker name conflict bug by updating dockerclient
2017-11-16 16:38:54 -08:00
Michael Schurter
437fce9954
Improve rktRemove error message
2017-11-16 15:45:14 -08:00
Michael Schurter
3ceec0caab
Remove rkt pods when exiting
...
Fixes #3561
2017-11-16 14:33:44 -08:00
Charlie Voiselle
7a231897a5
Merge pull request #3556 from angrycub/f-fingerprint-log-level
...
Dropped loglevel for AWS fingerprinter env read misses to DEBUG
2017-11-16 16:27:25 -05:00
Charlie Voiselle
969ddf9c2a
Lowered to DEBUG from AD feedback
2017-11-16 14:13:03 -05:00
Alex Dadgar
05b1588cea
Only publish metric when the task is running and dev mode publishes metrics
2017-11-15 13:21:06 -08:00
Alex Dadgar
07963f0b6d
Merge pull request #3546 from hashicorp/f-heuristic
...
Better interface selection heuristic
2017-11-15 12:51:21 -08:00
Alex Dadgar
97ec3974a9
Use interface attached to default route
2017-11-15 11:32:32 -08:00
Michael Schurter
f86f0bd9ea
Handle leader task being dead in RestoreState
...
Fixes the panic mentioned in
https://github.com/hashicorp/nomad/issues/3420#issuecomment-341666932
While a leader task dying serially stops all follower tasks, the
synchronizing of state is asynchronous. Nomad can shut down before all
follower tasks have updated their state to dead thus saving the state
necessary to hit this panic: *have a non-terminal alloc with a dead
leader.*
The actual fix is a simple nil check so we no longer assume that a
non-terminal alloc's leader has a TaskRunner.
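A minimal sketch of that kind of nil check, assuming a simplified model of the restore path. The types `taskRunner` and `allocRunner`, the method names, and the returned messages are hypothetical stand-ins for illustration, not Nomad's actual code:

```go
package main

import "fmt"

// taskRunner and allocRunner are hypothetical, simplified stand-ins.
type taskRunner struct{ name string }

type allocRunner struct {
	leader string                 // name of the leader task
	tasks  map[string]*taskRunner // runners restored for non-dead tasks only
}

// leaderRunner returns the leader's runner, or nil when no runner was
// restored for it (for example because the leader was already dead).
func (a *allocRunner) leaderRunner() *taskRunner {
	return a.tasks[a.leader]
}

// restoreMessage guards against the nil runner instead of dereferencing it.
func (a *allocRunner) restoreMessage() string {
	tr := a.leaderRunner()
	if tr == nil {
		return "leader " + a.leader + " has no TaskRunner"
	}
	return "leader " + tr.name + " restored"
}

func main() {
	dead := &allocRunner{leader: "web", tasks: map[string]*taskRunner{}}
	fmt.Println(dead.restoreMessage())

	live := &allocRunner{leader: "web", tasks: map[string]*taskRunner{"web": {name: "web"}}}
	fmt.Println(live.restoreMessage())
}
```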
2017-11-15 10:36:13 -08:00
Charlie Voiselle
1197637251
Dropped loglevel for AWS fingerprinter env reads
...
Certain environments use WARN for serious logging; however, it's very
possible to have machines without some of the fingerprinted keys
(public-ipv4 and public-hostname specifically). Setting log level to
INFO seems more consistent with this possibility.
2017-11-15 18:20:59 +00:00
Chelsea Komlo
2dfda33703
Nomad agent reload TLS configuration on SIGHUP ( #3479 )
...
* Allow server TLS configuration to be reloaded via SIGHUP
* dynamic tls reloading for nomad agents
* code cleanup and refactoring
* ensure keyloader is initialized, add comments
* allow downgrading from TLS
* initialize keyloader if necessary
* integration test for tls reload
* fix up test to assert success on reloaded TLS configuration
* failure in loading a new TLS config should remain at current
Reload only the config if agent is already using TLS
* reload agent configuration before specific server/client
lock keyloader before loading/caching a new certificate
* introduce a get-or-set method for keyloader
* fixups from code review
* fix up linting errors
* fixups from code review
* add lock for config updates; improve copy of tls config
* GetCertificate only reloads certificates dynamically for the server
* config updates/copies should be on agent
* improve http integration test
* simplify agent reloading storing a local copy of config
* reuse the same keyloader when reloading
* Test that server and client get reloaded but keep keyloader
* Keyloader exposes GetClientCertificate as well for outgoing connections
* Fix spelling
* correct changelog style
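A rough sketch of the keyloader idea described in the bullets above: a certificate held behind a lock, swapped only when a reload succeeds, and served through a `GetCertificate` callback. The `keyloader` type here, its `LoadKeyPair` method, and the file names are assumptions for illustration; only the `crypto/tls` calls are real APIs:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"sync"
)

// keyloader is a hypothetical holder for the currently active certificate.
type keyloader struct {
	mu   sync.RWMutex
	cert *tls.Certificate
}

// LoadKeyPair loads a new key pair and swaps it in only on success, so a
// failed reload leaves the current certificate in place.
func (k *keyloader) LoadKeyPair(certFile, keyFile string) error {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return err
	}
	k.mu.Lock()
	k.cert = &cert
	k.mu.Unlock()
	return nil
}

// GetCertificate matches the tls.Config.GetCertificate callback signature.
func (k *keyloader) GetCertificate(_ *tls.ClientHelloInfo) (*tls.Certificate, error) {
	k.mu.RLock()
	defer k.mu.RUnlock()
	return k.cert, nil
}

func main() {
	k := &keyloader{}
	// The same keyloader can be reused across reloads; listeners keep
	// calling GetCertificate and pick up whatever was loaded last.
	cfg := &tls.Config{GetCertificate: k.GetCertificate}
	_ = cfg

	if err := k.LoadKeyPair("missing-cert.pem", "missing-key.pem"); err != nil {
		fmt.Println("reload failed, keeping current certificate:", err)
	}
}
```

In a real agent the reload would be triggered from a SIGHUP handler; that wiring is omitted here.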
2017-11-14 17:53:23 -08:00
Michael Schurter
3023336b39
Add a test demonstrating the bug
...
Fails on Docker 17.09, passes on Docker 17.06 and earlier
2017-11-14 15:25:52 -08:00
Alex Dadgar
ee31e15f51
Better interface selection heuristic
...
This PR introduces a better interface selection heuristic such that we
select interfaces with globally routable unicast addresses over link
local addresses.
Fixes https://github.com/hashicorp/nomad/issues/3487
2017-11-13 15:13:43 -08:00
Preetha Appan
926c9ed997
Make device mounting unit test verify configuration via docker inspect
2017-11-13 09:56:54 -06:00
Preetha Appan
dc2d5fb5a4
Unit test (linux only) that tests mounting a device in the docker driver
2017-11-13 09:56:54 -06:00
Preetha Appan
4834710e45
Add default value for cgroup permissions for device if not set
2017-11-13 09:56:54 -06:00
Preetha Appan
9cdee6991c
Remove unnecessary check since validate method already checks this
2017-11-13 09:56:54 -06:00
Preetha Appan
110c1fd4f0
Add support for passing device into docker driver
2017-11-13 09:56:54 -06:00
Alex Dadgar
d1358ec1b6
always load all templates
2017-11-10 12:35:51 -08:00
Alex Dadgar
a3ea0c17a0
Handle multiple environment templates
...
Fixes https://github.com/hashicorp/nomad/issues/3498
2017-11-10 11:08:19 -08:00
Alex Dadgar
b3edc12dd9
Merge pull request #3411 from cheeseprocedure/f-qemu-graceful-shutdown
...
Qemu driver: graceful shutdown feature
2017-11-03 16:41:34 -07:00
Michael Schurter
690b8f4cfb
Remove noisy log line
...
Didn't mean to commit this
2017-11-03 16:00:30 -07:00
Matt Mercer
11e2870875
Qemu driver: clean up logging; fail unsupported features on Windows
2017-11-03 15:40:20 -07:00
Alex Dadgar
6034916ad1
fix spelling mistake
2017-11-03 15:04:59 -07:00
Alex Dadgar
a23033932a
Merge pull request #3459 from multani/docker-oom-notification
...
docker: log that a container has been killed by the OOM killer
2017-11-03 13:24:03 -07:00
Matt Mercer
cef9ba9770
Qemu driver: tweaks in response to PR feedback
...
Remove attribute for long qemu monitor path; misc cleanup; update tests
2017-11-03 11:28:56 -07:00
Preetha Appan
0eaef09675
Remove event GenericSource, and address other code review comments. Also added deprecation info in comments.
2017-11-03 10:10:06 -05:00
Preetha Appan
5f09c968b3
Move logic for determining event display message to task_runner, added two new fields DisplayMessage and Details.
2017-11-03 09:13:01 -05:00
Alex Dadgar
b4af10edde
Alloc Runner doesn't panic on restoration.
2017-11-02 16:14:13 -07:00
Alex Dadgar
abd28cbd7d
Merge pull request #3493 from hashicorp/f-remove-atlas
...
Remove Atlas and Scada from codebase
2017-11-02 16:00:44 -07:00
Michael Schurter
eedbe8efbb
Merge pull request #3490 from hashicorp/f-gc-logging
...
Make unable-to-gc log level adaptive
2017-11-02 14:32:40 -07:00
Diptanu Choudhury
cb68889652
Added the node_id as a tag
2017-11-02 13:29:10 -07:00
Alex Dadgar
701f462d33
remove atlas
2017-11-02 11:27:21 -07:00
Michael Schurter
fc33c945be
Make unable-to-gc log level adaptive
...
WARNing when someone has over 50 non-terminal allocs was just too
confusing.
Tested manually with `gc_max_allocs = 10` and bumping a job from `count
= 19` to `count = 21`:
```
2017/11/02 17:54:21.076132 [INFO] client.gc: garbage collection due to number of allocations (19) is over the limit (10) skipped because no terminal allocations
...
2017/11/02 17:54:48.634529 [WARN] client.gc: garbage collection due to number of allocations (21) is over the limit (10) skipped because no terminal allocations
```
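The commit does not state the exact rule. The sketch below assumes the level switches to WARN once the allocation count exceeds twice the limit, which is consistent with the two log lines above (19 logs INFO, 21 logs WARN with a limit of 10) but is a guess, including the behavior at exactly twice the limit. `gcSkipLevel` is a hypothetical name:

```go
package main

import "fmt"

// gcSkipLevel picks the log level for the "GC skipped" message.
// Assumption: WARN only once allocations exceed twice the configured limit.
func gcSkipLevel(numAllocs, maxAllocs int) string {
	if numAllocs > 2*maxAllocs {
		return "WARN"
	}
	return "INFO"
}

func main() {
	const limit = 10
	for _, n := range []int{19, 21} {
		fmt.Printf("[%s] allocations (%d) over the limit (%d), no terminal allocations\n",
			gcSkipLevel(n, limit), n, limit)
	}
}
```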
2017-11-02 10:57:42 -07:00
Diptanu Choudhury
8a9d0d40b1
Added support for tagged metrics
2017-11-02 10:07:57 -07:00
Diptanu Choudhury
5f522c6de3
Incrementing the start counter when we are actually starting a container
2017-11-02 09:51:20 -07:00
Diptanu Choudhury
44535e5d10
Recording counter for dead allocs properly
2017-11-02 09:51:20 -07:00
Diptanu Choudhury
0b34e811b7
Added metrics to track task/alloc start/restarts/dead events
2017-11-02 09:51:20 -07:00
Matt Mercer
00f90323c2
Qemu driver: defer cleanup sooner
2017-11-01 17:37:43 -07:00
Matt Mercer
43256af5f3
Qemu driver: clean up test logging; retry integration test for longer
2017-11-01 17:21:56 -07:00
Matt Mercer
b1145705d3
Use strings.Replace() instead of custom function
2017-11-01 15:31:35 -07:00
Matt Mercer
d51d174fa0
Qemu driver: basic testing of graceful shutdown feature
2017-11-01 15:31:30 -07:00
Matt Mercer
c26013ea0b
Qemu driver: include PIDs in log output
2017-11-01 15:31:24 -07:00
Matt Mercer
38d9a391aa
Qemu driver: ensure proper cleanup of resources
2017-11-01 15:31:20 -07:00
Matt Mercer
46f7e2fa4c
Qemu driver: minor logging fixes
2017-11-01 15:31:14 -07:00
Matt Mercer
4afb9dfa2d
Standardize driver.qemu logging prefix
2017-11-01 15:30:44 -07:00
Matt Mercer
5127e75569
Qemu driver: add graceful shutdown feature
2017-11-01 15:30:36 -07:00
Michael Schurter
1769db98b7
Fix regression by returning error on unknown alloc
2017-11-01 15:16:38 -05:00
Michael Schurter
9f26b9a403
Fix race in test
2017-11-01 15:16:38 -05:00
Michael Schurter
73e9b57908
Trigger GCs after alloc changes
...
GC much more aggressively by triggering GCs when allocations become
terminal as well as after new allocations are added.
2017-11-01 15:16:38 -05:00
Michael Schurter
2a81160dcd
Fix GC'd alloc tracking
...
The Client.allocs map now contains all AllocRunners again, not just
un-GC'd AllocRunners. Client.allocs is only pruned when the server GCs
allocs.
Also stops logging "marked for GC" twice.
2017-11-01 15:16:38 -05:00
Alex Dadgar
c710550551
fix test
2017-10-30 12:35:31 -07:00
Alex Dadgar
4831380e57
Node access is done using locked Node copy
...
Fixes https://github.com/hashicorp/nomad/issues/3454
Reliably reproduced the data race before by having a fingerprinter
change the node's attributes every millisecond and syncing at the same
rate. With fix, did not ever panic.
2017-10-27 13:27:24 -07:00
Jonathan Ballet
5429d1c656
docker: changed OOM killed error message
2017-10-27 20:30:52 +02:00
Jonathan Ballet
12615bde9c
docker: log that a container has been killed by the OOM killer
...
Fix : #2203 (at least for Docker tasks)
2017-10-27 18:05:27 +02:00
Alex Dadgar
f117eb28c7
go style vars
2017-10-25 10:49:34 -07:00
Alex Dadgar
3f8495dd0e
fix two flaky tests
2017-10-23 18:15:52 -07:00
Alex Dadgar
cb0d0ef009
move to consul freeport implementation
2017-10-23 16:51:40 -07:00
Alex Dadgar
dbc014b360
Standardize retrieving a free port into a helper package
2017-10-23 16:48:20 -07:00
Alex Dadgar
4a69e1ad15
don't double parallel
2017-10-23 16:48:06 -07:00
Alex Dadgar
96ca2bbe4c
respond to comments
2017-10-23 15:50:27 -07:00
Alex Dadgar
99c81b5848
Skip if no docker
2017-10-19 16:55:10 -07:00
Alex Dadgar
593536664e
fix flaky java tests
2017-10-19 16:49:57 -07:00
Alex Dadgar
4bc452b479
Undo darwin user setting
2017-10-19 16:49:57 -07:00
Alex Dadgar
c7c6964313
Run as user on mac
2017-10-19 16:49:57 -07:00
Alex Dadgar
55a1dffa2f
sudo docker works
2017-10-19 16:49:57 -07:00
Alex Dadgar
805e7b3b62
docker tests
2017-10-19 16:49:57 -07:00
Michael Schurter
797f49702e
Add logging around moby/moby#32648 bug
2017-10-18 10:44:03 -07:00
Michael Schurter
22ac450b2f
Properly fail rkt fingerprinting on old versions
2017-10-16 13:58:58 -07:00
Michael Schurter
d7732c1a58
Squelch repeated rkt version warnings
2017-10-16 12:09:47 -07:00
Michael Schurter
b5fd075d74
Test fixes from #3383
2017-10-13 15:45:35 -07:00
Michael Schurter
b63eee17e9
Merge pull request #3383 from hashicorp/b-migrate-token
...
base64 migrate token
2017-10-13 13:46:54 -07:00
Michael Schurter
dfd2967cdb
Merge pull request #3376 from hashicorp/f-node-acls
...
Allow Node.SecretID for Node.GetNode and Allocs.GetAlloc
2017-10-13 11:51:48 -07:00
Michael Schurter
15b991e039
base64 migrate token
...
HTTP header values must be ASCII.
Also constant time compare tokens and test the generate and compare
helper functions.
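A sketch of a generate/compare helper pair in this spirit. The HMAC-SHA256 construction, the function names, and the inputs are assumptions chosen for illustration and are not taken from this commit; the points being shown are the base64 encoding (so the value is ASCII and safe in an HTTP header) and the constant-time comparison:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
)

// generateToken derives a token from an alloc ID and a secret, then
// base64-encodes the raw bytes so the result is plain ASCII.
// The HMAC-SHA256 derivation is an assumption for this sketch.
func generateToken(allocID, secret string) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(allocID))
	return base64.URLEncoding.EncodeToString(mac.Sum(nil))
}

// compareToken recomputes the expected token and compares in constant time
// to avoid leaking how many leading bytes matched.
func compareToken(allocID, secret, token string) bool {
	expected := generateToken(allocID, secret)
	return subtle.ConstantTimeCompare([]byte(expected), []byte(token)) == 1
}

func main() {
	token := generateToken("alloc-1234", "node-secret")
	fmt.Println(compareToken("alloc-1234", "node-secret", token))  // true
	fmt.Println(compareToken("alloc-1234", "other-secret", token)) // false
}
```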
2017-10-13 10:59:13 -07:00
Alex Dadgar
85178d6048
rkt remove allocid
2017-10-13 10:07:50 -07:00
Adam Stankiewicz
cefbc72b49
Remove AllocID from ExecutorContext
2017-10-13 17:07:49 +02:00
Michael Schurter
4a70d4356a
Alloc watcher must send Node.SecretID as AuthToken
...
An auth token is required if ACLs are enabled
2017-10-12 16:38:02 -07:00
Michael Schurter
84d8a51be1
SecretID -> AuthToken
2017-10-12 15:16:33 -07:00
Michael Schurter
59ff94cd71
Don't panic on unexpected Consul response
...
Fixes #3326
2017-10-11 18:25:54 -07:00
Chelsea Holland Komlo
e1c4701a43
fix up build warnings
2017-10-11 17:11:57 -07:00
Chelsea Holland Komlo
b018ca4d46
fixing up code review comments
2017-10-11 17:09:20 -07:00
Chelsea Holland Komlo
a77e462465
add tests for functionality
2017-10-11 17:09:20 -07:00
Chelsea Holland Komlo
410adaf726
Add functionality for authenticated volumes
2017-10-11 17:09:20 -07:00
Alex Dadgar
6d3d0a9391
Nomad UI Command
2017-10-09 23:01:55 -07:00
Michael Schurter
f788974f8a
Merge pull request #3288 from simar7/qemu-improvements
...
qemu: Add bound checks for memory assignment
2017-10-02 14:47:05 -07:00
Simarpreet Singh
d801584c46
qemu: Fix lower memory bound to 128M
...
Signed-off-by: Simarpreet Singh <simar@linux.com>
2017-10-02 14:29:44 -07:00
Simarpreet Singh
10d7d6dab0
gofmt: format qemu.go and qemu_test.go
...
Signed-off-by: Simarpreet Singh <simar@linux.com>
2017-10-02 13:16:48 -07:00
Michael Schurter
a66c53d45a
Remove structs import from api
...
Goes a step further and removes structs import from api's tests as well
by moving GenerateUUID to its own package.
2017-09-29 10:36:08 -07:00
Michael Schurter
77f1fe40e7
Properly autodetect Docker IP in Windows
...
Our Docker network plugin autodetection code was erroneously treating
Windows' default network `nat` as a plugin and defaulting to it instead
of the host.
Fixes #3218
2017-09-27 16:49:23 -07:00
Michael Schurter
a8a87af7ed
Only build rkt driver on linux
...
Build stub for non-linux targets
2017-09-27 14:21:45 -07:00
Simarpreet Singh
3d99e71de8
qemu: Add bound checks for memory assignment
...
Signed-off-by: Simarpreet Singh <simar@linux.com>
2017-09-26 21:07:48 -07:00
Michael Schurter
d7229ce6c5
Merge pull request #3256 from dalegaard/master
...
Enable rkt driver to use address_mode = 'driver'
2017-09-26 18:04:37 -05:00
Alex Dadgar
4173834231
Enable more linters
2017-09-26 15:26:33 -07:00
Lasse Dalegaard
9f584d1114
Ignore rkt network failure if container died early
...
If the container dies before the network can be read, we now ignore the
error coming out of the network information polling loop. Nomad will
restart the task regardless, so we might be masking the actual error.
The polling loop for the rkt network information, inside the `Start`
method, was getting a bit unwieldy. It's been refactored out so it's now
a separate function.
2017-09-27 00:15:27 +02:00