Nick Ethier
3598925ca4
client/driver: use correct repo address when using docker-credential helper
2018-05-08 15:17:28 -04:00
Nick Ethier
54c86a0292
client/driver/env: interpolate empty optional meta params as empty strings
2018-05-07 20:19:51 -04:00
Nick Ethier
016ab7a105
client/driver: remove unused const 'dockerPullProgressEmitInterval'
2018-05-07 16:24:48 -04:00
Michael Schurter
f1d13683e6
consul: remove services with/without canary tags
...
Guard against Canary being set to false at the same time as an
allocation is being stopped: this could cause RemoveTask to be called
with the wrong Canary value and leak a service.
Deleting both Canary values is the safest route.
2018-05-07 14:55:01 -05:00
Michael Schurter
50e04c976e
consul: support canary tags for services
...
Also refactor Consul ServiceClient to take a struct instead of a massive
set of arguments. Meant updating a lot of code but it should be far
easier to extend in the future as you will only need to update a single
struct instead of every single call site.
Adds an e2e test for canary tags.
2018-05-07 14:55:01 -05:00
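The refactor described in the commit above (passing a struct instead of a long argument list) can be sketched as follows. This is only an illustration of the pattern: `ServiceConfig`, its fields, and `effectiveTags` are hypothetical names, not taken from the Nomad source, and the rule that canary tags replace the regular tags for a canary allocation is an assumption made for the example.

```go
package main

import "fmt"

// ServiceConfig is a hypothetical parameter struct. Adding a field here
// does not require touching every call site, unlike adding a positional
// argument to a function signature.
type ServiceConfig struct {
	Name       string
	Tags       []string
	CanaryTags []string
	Canary     bool
}

// effectiveTags returns the tags to register for the service. Assumption:
// canary tags are used only when the allocation is a canary and canary
// tags were provided; otherwise the regular tags apply.
func effectiveTags(cfg ServiceConfig) []string {
	if cfg.Canary && len(cfg.CanaryTags) > 0 {
		return cfg.CanaryTags
	}
	return cfg.Tags
}

func main() {
	cfg := ServiceConfig{
		Name:       "web",
		Tags:       []string{"stable"},
		CanaryTags: []string{"canary"},
		Canary:     true,
	}
	fmt.Println(cfg.Name, effectiveTags(cfg))
}
```

Callers that do not care about canaries simply leave the new fields at their zero values.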
Alex Dadgar
df8fce4347
Ensure canary tags are interpolated
2018-05-07 14:50:01 -05:00
Alex Dadgar
552604451c
rework where time gets set
2018-05-07 14:50:01 -05:00
Alex Dadgar
ee50789c22
Initial implementation
2018-05-07 14:50:01 -05:00
Nick Ethier
d8de354dbf
client/driver: add waiting layer status count to pull progress status msg
2018-05-07 12:18:20 -04:00
Nick Ethier
77af17efbc
client/driver: add separate handler for emitting pull progress
2018-05-07 12:17:34 -04:00
Nick Ethier
0bdd976b7d
client/driver: remove pull timeout due to race condition that can lead to unexpected timeouts
...
If two jobs are pulling the same image simultaneously, whichever starts the pull first will set the pull timeout.
This can lead to a poor UX where the first job requested a short timeout while the second job requested a longer timeout,
causing the pull to potentially time out much sooner than expected by the second job.
2018-05-07 12:18:11 -04:00
Nick Ethier
7c5821d7c6
client/driver: do accounting on layer pull progress
2018-05-07 12:17:53 -04:00
Nick Ethier
8efda7dc6c
client/driver: emit progress to all allocs pulling same image
2018-05-07 12:17:34 -04:00
Nick Ethier
e35948ab91
client/driver: add image pull progress monitoring
2018-05-07 12:17:38 -04:00
Michael Schurter
0d534d30d6
Merge pull request #4251 from hashicorp/f-grpc-checks
...
Support Consul gRPC Health Checks
2018-05-04 14:55:16 -07:00
Michael Schurter
f6a4713141
consul: make grpc checks more like http checks
2018-05-04 11:08:11 -07:00
Michael Schurter
382caec1e1
consul: initial grpc implementation
...
Needs to be more like http.
2018-05-04 11:08:11 -07:00
Jesus Vazquez
08a390448b
Update counter driver.docker.oom labels
2018-05-04 14:02:34 +08:00
Jesus Vazquez
4f6db56283
Initialize dockerhandle with jobname, taskgroupname, taskname and allocid
2018-05-04 14:02:19 +08:00
Jesus Vazquez
127b764dfb
Add Job, taskgroupname, taskname, and allocid to the DockerHandle struct
2018-05-04 14:01:26 +08:00
Jesus Vazquez
fd1ff1a0cf
Run goimports
2018-05-04 13:46:36 +08:00
Jesus Vazquez
5dd4059527
Add driver.docker counter metric for OOM Killer events
2018-05-04 13:46:36 +08:00
Michael Schurter
526af6a246
framer: fix early exit/truncation in framer
2018-05-02 10:46:16 -07:00
Michael Schurter
f1a6aa103a
framer: fix race and remove unused error var
...
In the old code `sending` in the `send()` method shared the Data slice's
underlying backing array with its caller. Clearing StreamFrame.Data
didn't break the reference from the sent frame to the StreamFramer's
data slice.
2018-05-02 10:46:16 -07:00
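The aliasing problem described in the framer commit above is a general property of Go slices and can be reproduced in a few lines. The sketch below is not the Nomad framer: `frame`, `sendShared`, and `sendCopied` are hypothetical names chosen for illustration. It shows that a frame holding a slice of a reused buffer sees later writes, while a frame holding a copy does not.

```go
package main

import "fmt"

// frame is a hypothetical stand-in for a stream frame with a Data payload.
type frame struct {
	Data []byte
}

// sendShared hands out a frame whose Data aliases buf's backing array.
func sendShared(buf []byte) frame {
	return frame{Data: buf}
}

// sendCopied hands out a frame that owns a private copy of the bytes.
func sendCopied(buf []byte) frame {
	data := make([]byte, len(buf))
	copy(data, buf)
	return frame{Data: data}
}

func main() {
	buf := []byte("abc")
	shared := sendShared(buf)
	copied := sendCopied(buf)

	// Reuse the buffer for the next frame: truncate to length zero and
	// append new bytes into the same backing array.
	buf = append(buf[:0], "xyz"...)

	fmt.Println(string(shared.Data)) // prints "xyz": the sent frame was overwritten
	fmt.Println(string(copied.Data)) // prints "abc": the copy is unaffected
	fmt.Println(string(buf))         // prints "xyz"
}
```

Setting a struct's slice field to nil or to a zero-length slice only changes that one slice header; any other slice header that points at the same backing array still observes subsequent writes.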
Michael Schurter
7360fe3a6d
client: squelch errors on cleanly closed pipes
2018-05-02 10:46:16 -07:00
Michael Schurter
ffff97e25f
client: don't spin on read errors
2018-05-02 10:46:16 -07:00
Michael Schurter
5ef0a82e6e
client: reset encoders between uses
...
According to go/codec's docs, Reset(...) should be called on
Decoders/Encoders before reuse:
https://godoc.org/github.com/ugorji/go/codec
I could find no evidence that *not* calling Reset() caused bugs, but
might as well do what the docs say?
2018-05-02 10:46:16 -07:00
Alex Dadgar
de4af37249
version bump and remove generated
2018-04-27 11:10:00 -07:00
Alex Dadgar
845a43864a
generated files
2018-04-27 10:45:40 -07:00
Alex Dadgar
35e06ddb31
Remove generated and version bump
2018-04-26 16:49:19 -07:00
Alex Dadgar
43192cefae
generated files
2018-04-26 16:28:58 -07:00
Michael Schurter
0e602d4779
Merge pull request #4188 from hashicorp/f-rkt-stats
...
rkt: create parent cgroup to enable stats
2018-04-24 14:54:36 -07:00
Michael Schurter
d687761ebf
rkt: test Stats() and always run tests
...
Remove the NOMAD_TEST_RKT flag as a guard for rkt tests. Still require
Linux, root, and rkt to be installed. Only check for rkt installation
once in hopes of speeding up rkt tests a bit.
2018-04-24 11:05:42 -07:00
Javier Palomo Almena
3e6c01ffa1
docker tests: Fix usage of NewDriverContext
2018-04-23 22:51:06 +02:00
Javier Palomo Almena
74d3c5df07
DriverContext: Add the TaskGroup and the Job name
...
Adding these fields to the DriverContext object will allow us to pass
them to the drivers.
A use case for this is to emit tagged metrics in the drivers,
which contain all relevant information:
- Job
- TaskGroup
- Task
- ...
Ref: https://github.com/hashicorp/nomad/pull/4185
2018-04-23 00:15:29 +02:00
Michael Schurter
4cee6cca6c
rkt: create parent cgroup to enable stats
...
Having the Nomad executor create parent cgroups that rkt is launched
within allows the stats collection code used for the exec driver to Just
Work. The only downside is that now the Nomad executor's resource
utilization counts against the cgroups resource limits just as it does
for the exec driver.
2018-04-19 15:14:56 -07:00
Michael Schurter
1a85d0c990
run goimports
2018-04-19 11:16:28 -07:00
Michael Schurter
d77c265d1f
Merge pull request #4168 from ninoles/b-2117-windows-group-process
...
B 2117 windows group process
2018-04-19 11:10:51 -07:00
Michael Schurter
fdbcbd4e5b
Merge pull request #4058 from hashicorp/f-mock-by-default
...
[Post-0.8] test: build with mock_driver by default
2018-04-18 15:57:00 -07:00
Michael Schurter
d3650fb2cd
test: build with mock_driver by default
...
`make release` and `make prerelease` set a `release` tag to disable
the `mock_driver`
2018-04-18 14:45:33 -07:00
Michael Schurter
a991923389
tests: fix race in alloc_runner_test.go
...
I could not reproduce the failure locally even with `stress -cpu ...`
eating all the cpu it could on my machine.
But I think the race was in one of two places:
* The task could restart which could create new events
* I think there could be a race between the updater's version of events
and alloc runners as updates are async
I fixed both. Here's hoping that fixes this flaky test.
2018-04-17 17:14:59 -07:00
Fabien Ninoles
c81bec48c9
Merge branch 'master' into b-2117-windows-group-process
2018-04-17 13:47:25 -04:00
Fabien Ninoles
35cf641416
Update based on PR request.
2018-04-17 13:43:04 -04:00
Alex Dadgar
c4ad76091d
Merge pull request #4166 from hashicorp/b-panic-fix-update
...
Fixes races accessing node and updating it during fingerprinting
2018-04-17 10:02:19 -07:00
Chelsea Holland Komlo
9b8a079558
fix up comments
2018-04-17 11:53:08 -04:00
Alex Dadgar
9d612c8cb0
Cleanup
2018-04-16 15:48:34 -07:00
Alex Dadgar
32adaf9dfc
Copy the config given to the alloc runner
2018-04-16 15:45:52 -07:00
Alex Dadgar
3ff2d4d795
fix race node access
2018-04-16 15:45:51 -07:00
Alex Dadgar
4f2a7b6949
Fix copying drivers
2018-04-16 15:45:51 -07:00
Alex Dadgar
0b799822ff
Operate on copy
2018-04-16 15:45:49 -07:00
Fabien Ninoles
27cf4995ce
- Clean up for windows compilation.
...
- Set CREATE_NEW_PROCESS_GROUP for Windows subprocess.
- Ensure we only kill the processes that actually need to be killed.
2018-04-14 13:58:42 -04:00
Michael Schurter
3836b8a335
Merge pull request #3572 from emate/master
...
Create new process group on process startup.
2018-04-13 11:56:38 -07:00
Alex Dadgar
adaf4fa7e0
Remove generated structs
2018-04-12 16:35:31 -07:00
Alex Dadgar
663c4d0433
Version bump and generated files
2018-04-12 16:21:50 -07:00
Alex Dadgar
ff1a1a63e8
Move where attribute for driver detection is set
2018-04-12 15:50:25 -07:00
Chelsea Holland Komlo
5291788b40
delete driver name from only health check attributes
2018-04-12 18:24:41 -04:00
Alex Dadgar
3d53d380f7
Fix tests
2018-04-12 14:29:30 -07:00
Alex Dadgar
f24ce2c50c
Driver health detection cleanups
...
This PR does:
1. Health message based on detection has format "Driver XXX detected"
and "Driver XXX not detected"
2. Set initial health description based on detection status and don't
wait for the first health check.
3. Combine updating attributes on the node, fingerprint and health
checking update for drivers into a single call back.
4. Condensed driver info in `node status` only shows detected drivers
and make the output less wide by removing spaces.
2018-04-12 12:46:40 -07:00
Charlie Voiselle
ba88f00ccb
Changed "til" to "until"
...
Should be "till" or "until"; chose "until" because it is unambiguous as to meaning.
2018-04-11 12:36:28 -05:00
Andrei Burd
502d17fa90
Added node class to tagged metrics
2018-04-11 12:20:59 +03:00
Chelsea Komlo
eb5aac16e6
Merge pull request #4111 from hashicorp/b-undetected-set-health-to-false
...
Immediately set driver health status to false when driver moves to undetected
2018-04-10 18:30:31 -04:00
Chelsea Holland Komlo
d58b3e473c
update comment for when the fingerprinter sets health status
2018-04-10 16:53:00 -04:00
Chelsea Holland Komlo
f7ef13cc64
fingerprinter should set health check status if health check is not periodic
2018-04-10 15:29:51 -04:00
Chelsea Holland Komlo
ede4f518bd
add setters for access to the fingerprint manager's node
...
refactor extracting driver info
2018-04-10 15:29:51 -04:00
Chelsea Holland Komlo
f479da19f5
guard against overwriting health status
2018-04-10 15:29:51 -04:00
Chelsea Holland Komlo
ece1618815
immediately set healthy to false when driver moves to undetected
2018-04-10 15:29:51 -04:00
Alex Dadgar
3d367d6fd7
Fix client uptime metric missing client prefix
2018-04-10 10:39:36 -07:00
Seth Vargo
df4fe7e76c
Set user-agent when talking to GCE metadata
2018-04-10 10:36:46 -04:00
Chelsea Komlo
d3bd8fb96e
Merge pull request #4109 from hashicorp/f-shorten-docker-health-timeout
...
Shorten docker health timeout
2018-04-09 15:38:39 -04:00
Chelsea Holland Komlo
ea4b65dd41
only initialize docker clients if they are nil
2018-04-09 14:13:07 -04:00
Chelsea Holland Komlo
288c7a33a1
refactoring simplification from code review
2018-04-09 10:34:17 -04:00
Chelsea Holland Komlo
6e3b056c37
only run health check if driver moves from undetected to detected
2018-04-09 10:10:43 -04:00
Alex Dadgar
ae1f76477e
Start rebalance after discovering new servers
2018-04-05 15:41:59 -07:00
Alex Dadgar
929b6823a3
Merge pull request #4106 from hashicorp/b-servers
...
Improved Client handling of failed RPCs
2018-04-05 13:48:50 -07:00
Alex Dadgar
be2513e0f9
more jitter
2018-04-05 13:48:33 -07:00
Chelsea Holland Komlo
d3637825ef
group similar functions; update comments
...
health check timeout should be 1 minute
2018-04-05 16:19:02 -04:00
Chelsea Holland Komlo
e8743f1f7b
remove do once block when creating a new docker client
...
only set cached connections upon no error
2018-04-05 16:19:02 -04:00
Chelsea Holland Komlo
d0d793fc23
use client with shorter timeouts for health checks
2018-04-05 16:19:02 -04:00
Chelsea Holland Komlo
5d1b2b77cb
refactor docker clients method to be able to extend to creating new clients
2018-04-05 16:19:02 -04:00
Alex Dadgar
bd3345942c
Handle no leader and faster retries near limit
...
Handle the ErrNoLeader case and apply slower retries. Also, when we have
missed the heartbeat, retry aggressively, backing off after we have
missed for more than 30 seconds.
2018-04-05 11:22:47 -07:00
Alex Dadgar
279b5c22e5
Scale heartbeat retrying based on remaining heartbeat time
2018-04-05 10:58:13 -07:00
Alex Dadgar
7941f4eb2d
Fire retry only when consul discovers new servers
2018-04-05 10:40:17 -07:00
Preetha
6254d75eee
Merge pull request #4101 from hashicorp/b-rescheduling-edge-fixes
...
Fixes edge cases around timing/ task finish time being set more than once
2018-04-04 16:18:21 -05:00
Preetha Appan
12ba4c45da
remove outdated commented out test code
2018-04-04 15:03:24 -05:00
Preetha Appan
6363a6fb4d
Remove old comment
2018-04-04 15:01:48 -05:00
Preetha Appan
5e4525bd30
Moves setting finishedAt to the right place and adds two unit tests.
2018-04-04 14:38:15 -05:00
Alex Dadgar
86c32358d4
Spelling error
2018-04-03 18:30:01 -07:00
Alex Dadgar
01a6beafbf
RPC Retry Watcher
2018-04-03 18:05:28 -07:00
Preetha Appan
e6bbce3fa0
Add comment
2018-04-03 19:49:03 -05:00
Alex Dadgar
ec844f19d9
randomize servers
2018-04-03 17:46:13 -07:00
Preetha Appan
00537c739b
Fixes edge cases around timing and task finish time being set more than once
2018-04-03 16:34:59 -05:00
Alex Dadgar
58a3ec3fb2
Improve Vault error handling
2018-04-03 14:29:22 -07:00
Alex Dadgar
86f9044676
remove generated files
2018-03-30 16:52:49 -07:00
Alex Dadgar
af81349dbe
Generated files
2018-03-30 16:14:40 -07:00
Michael Schurter
257ba5937d
test: don't rely on alloc runner update count
...
We were incorrectly relying on the count of alloc updates in a number of
tests. Since alloc updates are async, their number is non-deterministic
and largely meaningless.
This should fix quite a few flaky tests in Travis and prevent future
mistaken assumptions in tests.
2018-03-30 09:34:33 -07:00
Michael Schurter
62e9553333
Merge pull request #4069 from hashicorp/f-hashealth
...
add HasHealth helper for nil checks
2018-03-29 17:03:20 -07:00
Alex Dadgar
beee130a6e
Always capture the finish time
2018-03-29 11:27:22 -07:00
Michael Schurter
91b5bb58d9
add HasHealth helper for nil checks
...
We performed the DeploymentStatus nil checks a couple of different ways, so
hopefully this helper will consolidate them and make it more clear what
the code is doing.
2018-03-29 09:29:19 -07:00
Chelsea Komlo
4338360da9
Merge pull request #4065 from hashicorp/emit-node-event-on-first-health-change
...
Emit first node event after initialization on health status change
2018-03-29 11:23:25 -04:00
Chelsea Holland Komlo
2174ede6b9
add clarifying comment
2018-03-29 10:58:39 -04:00
Michael Schurter
3a79c32677
Merge pull request #4059 from hashicorp/b-drain-health-svc-only
...
only service allocs should have health watched
2018-03-28 16:49:22 -07:00
Michael Schurter
5eb0cb7176
only service allocs should have health watched
2018-03-28 16:20:11 -07:00
Chelsea Holland Komlo
e3319afee1
emit first node event
2018-03-28 17:26:53 -04:00
Chelsea Komlo
7812ac5abf
Merge pull request #4057 from hashicorp/specify-docker-msg
...
Specify docker name in driver health messages
2018-03-28 13:32:36 -04:00
Preetha
177d2d6010
Merge pull request #4052 from hashicorp/f-specify-total-memory
...
Allow specifying total memory in agent configuration
2018-03-28 12:28:41 -05:00
Chelsea Holland Komlo
efc03e252c
specify driver health messages
2018-03-28 11:35:21 -04:00
Preetha Appan
329428b49f
Code review feedback and unit test
2018-03-28 10:07:15 -05:00
Charlie Voiselle
ea10588227
rkt: logging enhancements ( #4044 )
...
* Added extra debug logging; extended timeout; added jitter.
* small log changes
* increase timeout
* remove unnecessary uuid
2018-03-27 17:30:06 -07:00
Michael Schurter
fcaee471a0
client: always mark exited sys/svc allocs as failed
...
When restarts.attempts=0 was set in a jobspec a system or service alloc
that exited with 0 status would be marked as `completed` instead of
`failed`. Since system and service jobs are intended to run until
stopped or updated, they should always be marked as failed when they
exit even in cases where the exit code is 0.
2018-03-27 14:30:19 -07:00
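The rule stated in the commit above can be written as a small decision function. This is a sketch under assumptions: `terminalStatus` is a hypothetical helper, the status strings follow the wording of the commit message rather than any constant in the Nomad source, and treating a non-zero exit of a batch task as failed is an added assumption.

```go
package main

import "fmt"

// terminalStatus returns the status for an allocation whose task exited
// and will not be restarted. The function and its string values are
// illustrative only.
func terminalStatus(jobType string, exitCode int) string {
	switch jobType {
	case "service", "system":
		// These job types are meant to run until stopped or updated,
		// so any exit counts as a failure, even with exit code 0.
		return "failed"
	}
	if exitCode == 0 {
		return "completed"
	}
	return "failed"
}

func main() {
	fmt.Println(terminalStatus("service", 0)) // prints "failed"
	fmt.Println(terminalStatus("batch", 0))   // prints "completed"
}
```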
Mildred Ki'Lya
1017cbe8ab
Allow specifying total memory in agent configuration
...
Allow setting the total memory of an agent in its configuration file. This
can be used in case the automatic detection doesn't work or in specific
environments when memory overcommit (using swap for example) can be
desirable.
2018-03-27 15:46:18 -05:00
Chelsea Holland Komlo
003bc209b9
use time.Time for node events for compatibility
2018-03-27 15:43:57 -04:00
Alex Dadgar
432784dae3
Fix alloc watcher snapshot streaming
2018-03-27 11:14:53 -07:00
Alex Dadgar
05449fea09
drop stats fetching log
2018-03-23 12:01:50 -07:00
Chelsea Komlo
5f0c382021
Merge pull request #4030 from hashicorp/health-check-ux
...
UX improvements to driver health checks
2018-03-23 09:46:50 -04:00
Alex Dadgar
da27fc3880
Driver Info output
2018-03-22 17:18:32 -07:00
Chelsea Holland Komlo
e9005d8cfb
ux improvements to driver health checks
2018-03-22 18:38:29 -04:00
Michael Schurter
a318684738
Merge pull request #4022 from hashicorp/f-more-executor-logging
...
executor: increase level for helpful log lines
2018-03-22 15:21:20 -07:00
Michael Schurter
a4f346abeb
remove spurious TODOs and FIXMEs
2018-03-21 16:55:22 -07:00
Michael Schurter
8b346c6176
test: try to prevent flakiness on travis
2018-03-21 16:51:45 -07:00
Michael Schurter
1b7ac447e9
alloc_runner: watch health for deployed batch jobs
2018-03-21 16:51:45 -07:00
Michael Schurter
62960ed7bd
client: don't monitor health of non-service jobs
...
Also fix system job draining; won't work without deadline fixes
2018-03-21 16:51:44 -07:00
Alex Dadgar
a37329189a
Improve DeadlineTime helper
2018-03-21 16:51:44 -07:00
Alex Dadgar
db4a634072
RPC, FSM, State Store for marking DesiredTransition
...
fix build tag
2018-03-21 16:49:48 -07:00
Michael Schurter
bb0ff44fb4
mock_driver: improve Kill() logging
2018-03-21 16:49:48 -07:00
Michael Schurter
c0542474db
drain: initial drainv2 structs and impl
2018-03-21 16:49:48 -07:00
Chelsea Holland Komlo
f329e45e03
always set initial health status for every driver
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
bbaffe3eca
set driver to unhealthy once if it cannot be detected in periodic check
2018-03-21 15:15:26 -04:00
Alex Dadgar
5df4b3728d
Docker driver doesn't return errors but injects into the DriverInfo
2018-03-21 15:15:26 -04:00
Alex Dadgar
4365bb7f59
Only run health check if driver is detected
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
f801709a0a
fix issue when updating node events
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
285729aee2
function rename and re-arrange functions in fingerprint_manager
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
60f12d206f
improve comments; update watchDriver
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
739784736a
remove unused function
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
d92703617c
simplify logic
...
bump log level
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
86b7b3d2d9
fix up health check logic comparison; add node events to client driver checks
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
53a5bc2bb3
Code review feedback
2018-03-21 15:15:26 -04:00
Alex Dadgar
34dc58421c
notes from walk through
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
44b6951dda
improve tests
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
d740a6a46e
refresh driver information for non-health checking drivers periodically
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
d8f68e5ef8
fix up codereview feedback
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
d5f6c940c4
fix up racy tests
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
0425be8f48
updating comments; locking concurrent node access
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
c50d02ae93
go style; update comments
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo
3aa726baab
fix scheduler driver name; create node structs file
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo
3cba95e8a7
allow nomad to schedule based on the status of a client driver health check
...
Slight updates for go style
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo
0bde357731
add concept of health checks to fingerprinters and nodes
...
fix up feedback from code review
add driver info for all drivers to node
2018-03-21 15:15:25 -04:00
Michael Schurter
1022170bf3
executor: increase level for helpful log lines
...
Should help with debugging issues like #3971
2018-03-21 11:53:58 -07:00
Marcin Matlaszek
6019a88824
Make raw_exec processes cleanup function more precise.
2018-03-20 13:40:21 +01:00
Marcin Matlaszek
bb36c122e2
Fix errors when trying to kill whole process group.
2018-03-20 13:40:21 +01:00
Marcin Matlaszek
86d650d7b0
Make starting & cleaning process group Windows compatible.
2018-03-20 13:40:21 +01:00
Marcin Matlaszek
79c139f2ef
Create new process group on process startup.
...
Clean up by sending SIGKILL to the whole process group.
2018-03-20 13:40:21 +01:00
Michael Schurter
1044bc0feb
Merge pull request #3984 from hashicorp/f-loosen-consul-skipverify
...
Replace Consul TLSSkipVerify handling
2018-03-16 11:21:28 -07:00
Michael Schurter
32ee5e0d53
Merge pull request #3990 from hashicorp/f-rkt-groups
...
rkt: allow specifying --group
2018-03-16 11:19:53 -07:00
Michael Schurter
bd78cfb039
rkt: allow specifying --group
2018-03-16 11:08:22 -07:00
Michael Schurter
fb10ec9c01
docker: make volume errors recoverable
...
The interface+mock just to test this one little error handling may seem
like overkill but there was just no other way to write an automated test
around this logic as there's no way to simulate this error with stock
Docker.
2018-03-15 17:52:43 -07:00
Michael Schurter
0971114f0c
Replace Consul TLSSkipVerify handling
...
Instead of checking Consul's version on startup to see if it supports
TLSSkipVerify, assume that it does and only log in the job service
handler if we discover Consul does not support TLSSkipVerify.
The old code would break TLSSkipVerify support if Nomad started before
Consul (such as on system boot) as TLSSkipVerify would default to false
if Consul wasn't running. Since TLSSkipVerify has been supported since
Consul 0.7.2, it's safe to relax our handling.
2018-03-14 17:43:06 -07:00
Preetha Appan
3c38eededd
Fix spelling in comment
2018-03-14 15:54:25 -05:00
Alex Dadgar
bef4a8ee09
fix clearing node events
2018-03-14 09:48:59 -07:00
Chelsea Komlo
810eedfa2a
Merge pull request #3945 from hashicorp/f-add-node-events
...
Add node events
2018-03-14 08:42:55 -04:00
Preetha
360d6e5a92
Merge pull request #3968 from hashicorp/f-nicer-vault-error
...
Make server side error messages from vault clearer
2018-03-13 20:49:39 -05:00
Alex Dadgar
de6ebb6e6c
small cleanup
2018-03-13 18:08:22 -07:00
Chelsea Holland Komlo
b41501e442
code review feedback
2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo
1488b076d1
code review feedback
2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo
a8655320fd
fix up go check warnings
2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo
0934769b04
add client side emitting of node events
...
Changelog
2018-03-13 18:08:21 -07:00
Preetha Appan
914eaed64f
Address some code review comments
2018-03-13 18:19:16 -05:00
Preetha Appan
09c231ce43
Return the err from server correctly
2018-03-13 18:10:14 -05:00
Preetha Appan
9618f52746
Remove error wrapping and make vault connection server side errors clearer.
2018-03-13 17:09:03 -05:00
Michael Schurter
79df90acb0
Merge pull request #3958 from simplesurance/swappiness
...
fix: disable swap for executor_linux allocations
2018-03-13 10:10:22 -07:00
Fabian Holler
e6af051c93
fix: disable swap for executor_linux allocations
...
A comment in the nomad source code states that swapping for
executor_linux allocations is disabled but it wasn't.
Nomad wrote -1 to the memsw.limit_in_bytes cgroup file to disable
swapping.
This has the following problems:
1.) Writing -1 to the file does not disable swapping. It sets
the limit for memory and swap to unlimited.
2.) On common Linux distributions like Ubuntu 16.04 LTS the
memsw.limit_in_bytes cgroup file does not exist by default.
The memsw.limit_in_bytes file only exists if the Linux kernel is
built with CONFIG_MEMCG_SWAP=yes and either
CONFIG_MEMCG_SWAP_ENABLED=yes or when the kernel parameter
swapaccount=1 is passed during boot.
Most Linux distributions disable swap accounting by default because
of higher memory usage.
Nomad silently ignores a failed write to the memsw.limit_in_bytes
file. The allocation succeeds, and no message is logged to notify the
user.
To ensure that disabling swap works on common Linux kernels, disable
swapping by writing 0 to the memory.swappiness file.
Using the memory.swappiness file only requires that the kernel is
compiled with CONFIG_MEMCG=yes. This is the default in common Linux
kernels.
2018-03-13 10:52:50 +01:00
Alex Dadgar
4844317cc2
Merge pull request #3890 from hashicorp/b-heartbeat
...
Heartbeat improvements and handling failures during establishing leadership
2018-03-12 14:41:59 -07:00
Michael Schurter
7dd7fbcda2
non-Existent -> nonexistent
...
Reverting from #3963
https://www.merriam-webster.com/dictionary/existent
2018-03-12 11:59:33 -07:00
Josh Soref
18c5659474
spelling: version
2018-03-11 19:13:25 +00:00
Josh Soref
6222bd564e
spelling: verify
2018-03-11 19:13:32 +00:00
Josh Soref
1359fd2c3d
spelling: unexpected
2018-03-11 19:08:07 +00:00
Josh Soref
173ce63fe9
spelling: transition
2018-03-11 19:06:05 +00:00
Josh Soref
782c704de6
spelling: thresholds
2018-03-11 19:03:47 +00:00
Josh Soref
ac6d3767da
spelling: terminated
2018-03-11 19:01:49 +00:00
Josh Soref
2dda6abab9
spelling: templates
2018-03-11 19:01:39 +00:00
Josh Soref
8978caea28
spelling: shutdown
2018-03-11 18:55:49 +00:00
Josh Soref
8d191c9273
spelling: severity
2018-03-11 18:53:52 +00:00
Josh Soref
a79eccaa58
spelling: service
2018-03-11 18:53:47 +00:00
Josh Soref
8149694f3a
spelling: server
2018-03-11 18:55:30 +00:00
Josh Soref
3787d8141e
spelling: serialize
2018-03-11 18:53:39 +00:00
Josh Soref
e37626561c
spelling: semantics
2018-03-11 19:00:26 +00:00
Josh Soref
e4639ac62f
spelling: secrets
2018-03-11 18:53:26 +00:00
Josh Soref
cec45c6bc8
spelling: safety
2018-03-11 18:52:54 +00:00
Josh Soref
de9d0c7180
spelling: retrieved
2018-03-11 18:51:40 +00:00
Josh Soref
e949d23e1b
spelling: resource
2018-03-11 18:51:03 +00:00
Josh Soref
82221f9a2b
spelling: represents
2018-03-11 18:42:29 +00:00
Josh Soref
1c3b60ae70
spelling: replace
2018-03-11 18:41:53 +00:00
Josh Soref
b47ab9ab8c
spelling: removes
2018-03-11 18:41:43 +00:00
Josh Soref
db166c6cf6
spelling: remnants
2018-03-11 18:41:26 +00:00
Josh Soref
258d76ec13
spelling: registry
2018-03-11 18:41:13 +00:00
Josh Soref
7ad77f568b
spelling: purposes
2018-03-11 18:39:35 +00:00
Josh Soref
6fa892a463
spelling: propagated
2018-03-11 18:39:26 +00:00
Josh Soref
1a8204fa11
spelling: previous
2018-03-11 18:38:23 +00:00
Josh Soref
f764e5552a
spelling: periodically
2018-03-11 18:36:59 +00:00
Josh Soref
96e47bd4c1
spelling: parallelism
2018-03-11 18:35:54 +00:00
Josh Soref
3c1ce6d16d
spelling: otherwise
2018-03-11 18:34:27 +00:00
Josh Soref
96dba3e267
spelling: mount
2018-03-11 18:27:18 +00:00
Josh Soref
13e5fb8221
spelling: malicious
2018-03-11 18:26:25 +00:00
Josh Soref
1ef6d6319e
spelling: labels
2018-03-11 18:21:44 +00:00
Josh Soref
b6ec60fb5f
spelling: isolation
2018-03-11 18:19:02 +00:00
Josh Soref
337ac13f0a
spelling: interpolation
2018-03-11 18:16:36 +00:00
Josh Soref
75d1240446
spelling: interface
2018-03-11 18:15:37 +00:00
Josh Soref
c1a0ae3161
spelling: inspect
2018-03-11 18:15:27 +00:00
Josh Soref
2a1cf2f216
spelling: initialization
2018-03-11 18:18:37 +00:00
Josh Soref
b293b48287
spelling: idempotent
2018-03-11 18:14:50 +00:00
Josh Soref
52b83328fc
spelling: heartbeating
2018-03-11 18:12:19 +00:00
Josh Soref
3ad579930e
spelling: fingerprint
2018-03-11 18:07:37 +00:00
Josh Soref
7f6e4012a0
spelling: existent
2018-03-11 18:30:37 +00:00
Josh Soref
7cd95f6eb3
spelling: executor
2018-03-11 18:05:31 +00:00
Josh Soref
b9ce8b9e37
spelling: each
2018-03-11 17:56:19 +00:00
Josh Soref
0fc23b0ba3
spelling: down
2018-03-11 17:55:47 +00:00
Josh Soref
e8478c4065
spelling: documentation
2018-03-11 17:55:21 +00:00
Josh Soref
4241ffc5ab
spelling: disable
2018-03-11 17:55:12 +00:00
Josh Soref
858b9e809f
spelling: directory
2018-03-11 17:55:06 +00:00
Josh Soref
09970343b5
spelling: destruction
2018-03-11 17:54:39 +00:00
Josh Soref
2f135f0ed7
spelling: destroy
2018-03-11 17:54:13 +00:00
Josh Soref
97dc9a00c0
spelling: default
2018-03-11 17:52:58 +00:00
Josh Soref
aaa6e104ed
spelling: could
2018-03-11 17:51:47 +00:00
Josh Soref
c9b86bbc2f
spelling: controls
2018-03-11 17:50:39 +00:00
Josh Soref
f2a7c95379
spelling: constraints
2018-03-11 17:50:28 +00:00
Josh Soref
cb1303e47a
spelling: conjunction
2018-03-11 17:48:37 +00:00
Josh Soref
42fa13bbc6
spelling: cancelled
2018-03-11 17:45:47 +00:00
Josh Soref
7077386916
spelling: cancelable
2018-03-11 17:45:34 +00:00
Josh Soref
a70fe97556
spelling: assert
2018-03-11 17:41:33 +00:00
Josh Soref
58b794875f
spelling: artifact
2018-03-11 17:41:02 +00:00
Josh Soref
e78cf9c81a
spelling: already
2018-03-11 17:39:04 +00:00
Josh Soref
b8b46d3f74
spelling: allocation
2018-03-11 17:37:22 +00:00
Josh Soref
e87b0a4d86
spelling: alloc
2018-03-11 17:36:34 +00:00
Josh Soref
b67449796a
spelling: added
2018-03-11 17:34:28 +00:00
Chelsea Komlo
bd88877249
Merge pull request #3909 from hashicorp/b-node-attributes-concurrent-access
...
Fingerprinters accessing node information should be thread safe
2018-03-06 11:57:46 -05:00
Chelsea Komlo
7c7e2f4d0b
Merge pull request #3873 from hashicorp/r-edge-trigger-node-watcher
...
Edge trigger node updates
2018-03-01 15:18:59 -05:00
Chelsea Holland Komlo
122d1c4e4a
simplify retry logic
2018-03-01 09:48:26 -05:00
Michael Schurter
557a70f78d
Merge pull request #3917 from jaininshah9/master
...
changing the formula to correctly pass the CPUQuota to docker
2018-02-28 20:00:37 -08:00
Jainin Shah
39e1fc06e5
adding comments to the change
2018-02-28 16:19:51 -08:00
Preetha Appan
eaedffc7f7
Fix go vet errors
2018-02-28 12:21:27 -06:00
Chelsea Holland Komlo
355805db56
reset timer after updating node copy
2018-02-27 17:18:10 -05:00
Jainin Shah
6eb7da002f
changing the formula to correctly pass the CPUQuota to docker
2018-02-27 12:32:23 -08:00
Chelsea Holland Komlo
a72aaaf47f
add network resources equal method, use time ticker
...
remove impossible test case
2018-02-27 12:42:53 -05:00
Chelsea Holland Komlo
e736e31820
use time ticker, update how network resources are compared
2018-02-26 18:47:11 -05:00
Chelsea Holland Komlo
5059065b52
improved testing; node networks comparison
2018-02-26 15:55:38 -05:00
Chelsea Holland Komlo
b7bcd0b59f
fingerprinters accessing node information should be thread safe
2018-02-26 15:25:54 -05:00
Chelsea Holland Komlo
1f31b39fe8
code review fixups
2018-02-26 12:36:30 -05:00
Chelsea Holland Komlo
ed8c8afbcd
edge trigger node update
...
test update config copy trigger
2018-02-26 12:36:04 -05:00
Alex Dadgar
49a47483d1
Registering back to initializing
...
Fix a bug in which if the node attributes/meta changed, we would
re-register the node in status initializing. This would incorrectly
trigger the client to log that it missed its heartbeat.
It would change the status of the Node to initializing until the next
heartbeat occurred.
2018-02-16 17:49:31 -08:00
Alex Dadgar
eff4455c68
Fix original client server list behavior
2018-02-15 16:04:53 -08:00
Alex Dadgar
0ebf7f3b7f
remove tmp file
2018-02-15 15:51:27 -08:00
Alex Dadgar
f9cf642436
Client tls
2018-02-15 15:22:57 -08:00
Alex Dadgar
0e85ae77b4
fix flaky gc tests
2018-02-15 13:59:03 -08:00
Alex Dadgar
38b695b69c
feedback and rebasing
2018-02-15 13:59:03 -08:00
Alex Dadgar
9117ef4650
HTTP agent
2018-02-15 13:59:03 -08:00
Alex Dadgar
d7029965ca
Server side impl + touch ups
2018-02-15 13:59:02 -08:00
Alex Dadgar
ce0caccad2
client implementation of alloc gc and stats
2018-02-15 13:59:02 -08:00
Alex Dadgar
e685211892
Code review feedback
2018-02-15 13:59:02 -08:00
Alex Dadgar
a9c4f8a4c8
clarify force
2018-02-15 13:59:02 -08:00
Alex Dadgar
dc75501c69
Respond to comments
2018-02-15 13:59:02 -08:00
Alex Dadgar
cea77df6a7
Add Streaming RPC ack
...
This PR introduces an ack allowing the receiving end of the streaming
RPC to return any error that may have occurred during the establishment
of the streaming RPC.
2018-02-15 13:59:02 -08:00
Alex Dadgar
2f9d33f479
vet
2018-02-15 13:59:02 -08:00
Alex Dadgar
f5f43218f5
HTTP and tests
2018-02-15 13:59:02 -08:00
Alex Dadgar
6546b43a17
Client implementation of stream
2018-02-15 13:59:02 -08:00
Alex Dadgar
9a5569678c
Client Stat/List impl
2018-02-15 13:59:02 -08:00
Alex Dadgar
8854b35b34
Agent logs
2018-02-15 13:59:02 -08:00
Alex Dadgar
857b0ab6c7
client tests
2018-02-15 13:59:02 -08:00
Alex Dadgar
69def2ff22
Server tests of logs
2018-02-15 13:59:02 -08:00
Alex Dadgar
9479cb7f25
Remove logging
2018-02-15 13:59:01 -08:00
Alex Dadgar
14f57024b7
test stream framer
2018-02-15 13:59:01 -08:00
Alex Dadgar
ddd67f5f11
Server streaming
2018-02-15 13:59:01 -08:00
Alex Dadgar
ca9379be09
Logs over RPC w/ lots to touch up
2018-02-15 13:59:01 -08:00
Alex Dadgar
2c0ad26374
New RPC Modes and basic setup for streaming RPC handlers
2018-02-15 13:59:01 -08:00
Alex Dadgar
fea0e69d4f
wip fs endpoint
2018-02-15 13:59:01 -08:00
Alex Dadgar
b5037f20db
Remove circular dependency
2018-02-15 13:59:01 -08:00
Alex Dadgar
9bc75f0ad4
Fix manager tests and make testagent recover from port conflicts
2018-02-15 13:59:01 -08:00
Alex Dadgar
feb943c873
Fix lint/comments
2018-02-15 13:59:01 -08:00
Alex Dadgar
ac67da3b06
Unjankify the pkg
2018-02-15 13:59:01 -08:00
Alex Dadgar
3f1f8604bb
initial round of comment review
2018-02-15 13:59:01 -08:00
Alex Dadgar
e03b074650
Plumb config
2018-02-15 13:59:01 -08:00
Alex Dadgar
05c4fe8675
Change defaults for min use duration
2018-02-15 13:59:01 -08:00
Alex Dadgar
c8c1284bc3
SetServer command actually returns an error if given an invalid server
2018-02-15 13:59:01 -08:00
Alex Dadgar
3f786b904b
use server manager
2018-02-15 13:59:01 -08:00
Alex Dadgar
b24b05e025
Remove testing
2018-02-15 13:59:01 -08:00
Alex Dadgar
4e1cb1d96e
Test RPC from server
2018-02-15 13:59:00 -08:00
Alex Dadgar
6dd1c9f49d
Refactor
2018-02-15 13:59:00 -08:00
Alex Dadgar
a6dfffa4fa
Add testing interfaces
2018-02-15 13:59:00 -08:00
Alex Dadgar
d918f9bd5c
RPC Listener
2018-02-15 13:59:00 -08:00
Alex Dadgar
1472b943d6
Stats Endpoint
2018-02-15 13:59:00 -08:00
Chelsea Komlo
0c0b56a1a4
Merge pull request #3807 from hashicorp/f-client-add-fingerprint-manager
...
Add fingerprint manager to manage fingerprinting node
2018-02-13 11:22:50 -05:00
Chelsea Holland Komlo
b321287712
extract test helper
...
lock concurrent accesses to node
comment exported method
2018-02-12 18:30:10 -05:00
Michael Schurter
101e85f078
Merge pull request #3819 from schmichael/qemu-graceful-shutdown-alpine
...
Test QEMU graceful shutdown
2018-02-12 12:32:14 -08:00
Michael Schurter
ed6bce2ccf
Improve test logging
2018-02-12 11:25:52 -08:00
Michael Schurter
06397ba59d
Merge pull request #3825 from jaininshah9/master
...
add a flag for cpu_hard_limit
2018-02-08 20:40:38 -08:00
Michael Schurter
6e6915e7f5
Merge branch 'master' into f-cpu_hard_limit
2018-02-08 20:14:29 -08:00
Alan Scherger
eee7144643
drivers: use ctx.TaskEnv for mount points
2018-02-08 12:59:20 -06:00
Jainin Shah
a4516aa71a
removing underscore in variable name
2018-02-07 16:28:43 -08:00
Chelsea Holland Komlo
4a26959825
code review feedback
2018-02-07 18:10:55 -05:00
Chelsea Holland Komlo
d626d24488
remove dependency on client for fingerprint manager
2018-02-07 18:10:45 -05:00
Chelsea Holland Komlo
e012e5ab8a
add fingerprint manager
2018-02-07 18:10:33 -05:00
Jainin Shah
8149587abe
clearing the confusion between microsecond, nanosecond and millisecond
2018-02-06 19:11:39 -08:00
Jainin Shah
d3087d6069
using d.node.Resources.CPU as suggested
2018-02-06 14:52:15 -08:00
Michael Schurter
279a3b3f28
Merge pull request #3790 from 42wim/dockerv6
...
Service registration for IPv6 docker addresses (Fixes #3785)
2018-02-05 17:07:53 -08:00
Michael Schurter
25f0ad050f
docker: Skip IPv6 test if IPv6 disabled
2018-02-05 16:24:30 -08:00
Chelsea Komlo
42d20234a3
Merge pull request #3781 from hashicorp/f-client-fingerprint-refactor
...
Refactor client fingerprinters to return a diff of node attributes
2018-02-01 20:13:44 -05:00
Chelsea Holland Komlo
b21233fe23
update log message
2018-02-01 19:46:57 -05:00
Chelsea Holland Komlo
6f9c0ab361
req/resp should be within config locks; rename for detected fingerprints
...
changelog
2018-02-01 19:00:39 -05:00
Wim
a1a2ca8e33
Add AdvertiseIPv6Address test
2018-02-01 23:21:47 +01:00
Jainin Shah
94d0ce6006
wrapping the line to less than 80 characters
2018-02-01 14:16:38 -08:00
Jainin Shah
0d99f256de
changes after running go fmt
2018-02-01 12:07:05 -08:00
Jainin Shah
04c14b3cb2
add a flag for cpu_hard_limit
2018-02-01 10:09:12 -08:00
Chelsea Holland Komlo
d889e471a2
fix up linting
2018-02-01 12:26:38 -05:00
Chelsea Holland Komlo
b54203eddc
add detected to more drivers where the driver is found but unusable
2018-02-01 11:28:17 -05:00
Michael Schurter
0ac43a7622
Skip QEMU graceful shutdown test except on Travis
...
Hopefully we can reuse the SkipSlow helper elsewhere.
2018-01-31 15:47:26 -08:00
Chelsea Holland Komlo
b8e8064835
code review fixup
2018-01-31 18:34:03 -05:00
Michael Schurter
24d060bbb4
Test graceful shutdown
...
Uses an Alpine image which supports ACPI poweroff signal handling.
Handling is only enabled after the VM has booted, so this test blocks
until sshd starts before issuing the command.
2018-01-31 15:05:02 -08:00
Wim
db3bdfe898
* Change use_ipv6_address to advertise_ipv6_address.
...
* Set autoadvertise to true.
* Update documentation.
2018-02-01 00:01:25 +01:00
Chelsea Holland Komlo
7b53474a6e
add applicable boolean to fingerprint response
...
public fields and remove getter functions
2018-01-31 13:21:45 -05:00
Michael Schurter
cc54e36f91
Merge pull request #3798 from simar7/qemu-graceful-shutdown-bug
...
[QEMU] Fixing an unintentional variable shadowing
2018-01-30 17:43:44 -08:00
Michael Schurter
c662cc0172
Merge pull request #3773 from mikemccracken/2018-01-18/destroy-container-on-err
...
lxc: cleanup partially configured containers after errors in Start
2018-01-30 14:52:29 -08:00
Chelsea Holland Komlo
9482c322b7
locks for fingerprint reads/writes
2018-01-30 11:32:45 -05:00
Wim
76f09db067
Service registration for IPv6 docker addresses
2018-01-30 17:07:47 +01:00
Alex Dadgar
3ad5916f72
Merge pull request #3799 from mikemccracken/2018-01-25/lxc-log-outside-container
...
lxc: move lxc log file out of container-visible alloc dir
2018-01-29 14:32:22 -08:00
Chelsea Holland Komlo
14147c8496
remove attributes from periodic fingerprints when state changes
...
write test for client periodic fingerprinters
2018-01-29 13:48:54 -05:00
Alex Dadgar
3d28774f74
Merge pull request #3802 from filipochnik/docker-readonly-rootfs
...
Add ReadonlyRootfs option to the Docker driver
2018-01-29 09:47:27 -08:00
Indradhanush Gupta
7db4ee1122
rkt_test.go: Remove underscore from variable names
2018-01-29 11:39:50 +01:00
Filip Ochnik
80a17ee8dd
Add ReadonlyRootfs option to the Docker driver
2018-01-27 14:38:29 +01:00
Chelsea Holland Komlo
7c19de797c
create safe getters and setters for fingerprint response
2018-01-26 11:22:05 -05:00
Chelsea Holland Komlo
896d6f8058
fixups from code review
2018-01-26 07:04:32 -05:00
Simarpreet Singh
ac720b84f0
qemu: Make the driver debugging output more indicative
...
Signed-off-by: Simarpreet Singh <simar@linux.com>
2018-01-25 16:40:16 -08:00
Simarpreet Singh
8b058f7570
qemu: Fix unintentional shadowing of monitorPath variable
...
Signed-off-by: Simarpreet Singh <simar@linux.com>
2018-01-25 16:24:10 -08:00
Michael McCracken
09c9ca23f5
lxc: move lxc log file out of container-visible alloc dir
...
The LXC runtime's log file is currently written to TaskDir.LogDir,
which is mounted as alloc/logs inside the containers in the task
group.
This file is not intended to be visible to containers, and depending
on the log level, may have information about the host that a container
should not be allowed to see.
Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-25 14:41:37 -08:00
Michael McCracken
88e3063717
fix speling in log
...
Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-25 13:56:14 -08:00
Chelsea Holland Komlo
3d38868b88
add test case for available cgroups
2018-01-25 06:08:07 -05:00
Chelsea Holland Komlo
9a8344333b
refactor Fingerprint to request/response construct
2018-01-24 11:54:02 -05:00
Michael McCracken
f8fe2ea8cb
review cleanup
...
don't export an internal function, and simplify some code
Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-23 15:03:09 -08:00
Alex Dadgar
a43e0a7b08
Allow overriding an image's entrypoint in Docker
...
Fixes https://github.com/hashicorp/nomad/issues/2219
2018-01-23 14:05:00 -08:00
Alex Dadgar
98a03ad689
Merge pull request #3754 from filipochnik/docker-caps
...
Add an option to add and drop capabilities in the Docker driver
2018-01-23 12:02:50 -08:00
Chelsea Komlo
d09cc2a69f
Merge pull request #3492 from hashicorp/f-client-tls-reload
...
Client/Server TLS dynamic reload
2018-01-23 05:51:32 -05:00
Filip Ochnik
4abd269a68
Merge branch 'master' into docker-caps
2018-01-21 12:18:22 +01:00
Filip Ochnik
558812350d
Finish implementation of the capabilities whitelist
2018-01-21 12:14:24 +01:00
Michael McCracken
00dcfa6db9
lxc: cleanup partially configured containers after errors in Start
...
If there are any errors in container setup after c.Create() in
Start(), the container will be left around, with no way to clean it up
because the handle will not be created or returned from Start.
Added a wrapper that checks for errors and performs appropriate
cleanup. Returning a cleanup function from a wrapped function instead
of just doing the cleanup before returning the error helps to ensure
that future changes that might add or change error exits can't forget
to consider a cleanup function.
Adds a check to the invalid config test case to check that a container
created with an invalid config doesn't get left behind.
Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 16:03:03 -08:00
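A minimal Go sketch of the cleanup pattern described in 00dcfa6db9, not the driver's actual code. The `container` type, `configure`, `startInner`, and `start` are hypothetical names invented for illustration. The inner function returns a cleanup function with its error, and the wrapper is the single place that runs it.

```go
package main

import (
	"errors"
	"fmt"
)

// container is a hypothetical stand-in for an LXC container handle.
type container struct {
	name      string
	destroyed bool
}

// Destroy marks the container as removed.
func (c *container) Destroy() { c.destroyed = true }

// configure is a hypothetical setup step that fails on an invalid config.
func configure(c *container, valid bool) error {
	if !valid {
		return errors.New("invalid config")
	}
	return nil
}

// startInner does the setup work. Every error exit after the container
// exists returns a cleanup function together with the error.
func startInner(c *container, valid bool) (func(), error) {
	// From here on the container is assumed to have been created.
	cleanup := c.Destroy
	if err := configure(c, valid); err != nil {
		return cleanup, err
	}
	return nil, nil
}

// start wraps startInner and runs the returned cleanup on any error.
func start(c *container, valid bool) error {
	cleanup, err := startInner(c, valid)
	if err != nil {
		if cleanup != nil {
			cleanup()
		}
		return fmt.Errorf("start failed: %v", err)
	}
	return nil
}

func main() {
	c := &container{name: "demo"}
	err := start(c, false)
	fmt.Println(err, c.destroyed)
}
```

Because the wrapper owns the cleanup call, a new error exit added to the inner function only has to return the cleanup function to be covered.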
Michael Schurter
38182bebea
Drop log level to TRACE
...
For people not using driver networks these log lines would just be
confusing.
2018-01-18 15:35:24 -08:00
Michael Schurter
9d410c88a7
Improve driver network logging
2018-01-18 15:35:24 -08:00
Michael Schurter
583e17fad5
Always advertise driver IP when in driver mode
...
Fixes #3681
When in driver address mode Nomad should always advertise the driver's IP
in Consul even when no network exists. This matches the 0.6 behavior.
When in host address mode Nomad advertises the alloc's network's IP if
one exists. Otherwise it lets Consul determine the IP.
I also added some much needed logging around Docker's network discovery.
2018-01-18 15:35:24 -08:00
Michael McCracken
70817f728c
lxc_test: add test for contents of file in bind-mounted dir
...
Ensure that bind mounting via the volumes config really did work.
Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 05:36:45 -08:00
Michael McCracken
fd44bdee37
Simplify with gofmt -s
...
Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 04:17:42 -08:00
Michael McCracken
f176e02a64
lxc: add tests for volume support
...
Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 04:17:42 -08:00
Michael McCracken
c78c00a2d2
lxc: Add config flag to disable volume support
...
Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 04:17:42 -08:00
Michael McCracken
d694a8921f
Add volumes config to LXC driver
...
Allow lxc driver to accept bind mount config similarly to the docker
driver.
Includes some static sanity checks in Validate step
Signed-off-by: Michael McCracken <mikmccra@cisco.com>
2018-01-18 04:17:42 -08:00
Chelsea Holland Komlo
649f86f094
refactor creating a new tls configuration
2018-01-16 08:02:39 -05:00
Chelsea Holland Komlo
6c9f9c8ac3
adding additional test assertions; differentiate reloading agent and http server
2018-01-16 07:34:39 -05:00
Filip Ochnik
4eeb552a4f
Add a sketch of capabilities whitelist logic for the Docker driver
2018-01-14 20:01:47 +01:00
Filip Ochnik
8ee3ce7a26
Add an option to add and drop capabilities in the Docker driver
2018-01-14 19:56:57 +01:00
Alex Dadgar
bec9a72eec
Remove networking from basic resources
2018-01-12 14:33:42 -08:00
Charlie Voiselle
867bb6f7f9
Found more priviledge.
...
priviledge -> privilege
2018-01-12 09:44:53 -05:00
Alex Dadgar
9e1e04c6f1
Merge pull request #3727 from filipochnik/fix-gh-2832
...
Recognize renewing non-renewable Vault lease as fatal
2018-01-10 11:47:10 -08:00
Michael Schurter
189ce7f991
Merge pull request #3723 from hashicorp/b-3702-chown-dirs
...
chown dirs when migrating ephemeral_disk data
2018-01-09 09:27:26 -08:00
Michael Schurter
e6c27256b7
Test streamed directory ownership
2018-01-08 16:00:07 -08:00
Michael Schurter
2c79ffb213
chown dirs when migrating ephemeral_disk data
...
Fixes #3702
Added missing chown call and made it conditional on running as root and
not on Windows as we do with files.
2018-01-08 15:31:12 -08:00
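A small Go sketch of the condition described in 2c79ffb213, assuming the check reduces to "running as root and not on Windows". The helper name `shouldChown` is hypothetical and is not taken from the Nomad source.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// shouldChown reports whether ownership should be restored on migrated
// files and directories: only when running as root on a non-Windows OS.
func shouldChown(goos string, euid int) bool {
	return goos != "windows" && euid == 0
}

func main() {
	// Look up the effective UID once, before any per-entry loop.
	// On Windows os.Geteuid returns -1.
	euid := os.Geteuid()
	fmt.Println(shouldChown(runtime.GOOS, euid))
}
```

Passing the OS name and UID as parameters keeps the decision testable without actually running as root.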
Charlie Voiselle
1bb1ab5069
fix typo
...
Priviledge -> privilege
2018-01-08 15:56:07 -05:00
Chelsea Holland Komlo
214d128eb9
reload raft transport layer
...
fix up linting
2018-01-08 14:52:28 -05:00
Filip Ochnik
d265e11c36
Recognize renewing non-renewable Vault lease as fatal
2018-01-08 20:32:31 +01:00
Chelsea Holland Komlo
0708d34135
call reload on agent, client, and server separately
2018-01-08 09:56:31 -05:00
Chelsea Holland Komlo
9741097406
reloading tls config should be atomic for clients/servers
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo
ae7fc4695e
fixups from code review
...
Revert "close raft long-lived connections"
This reverts commit 3ffda28206fcb3d63ad117fd1d27ae6f832b6625.
reload raft connections on changing tls
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo
acd3d1b162
fix up downgrading client to plaintext
...
add locks around changing server configuration
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo
c0ad9a4627
add ability to upgrade/downgrade nomad agents tls configurations via sighup
2018-01-08 09:21:06 -05:00
Michael Schurter
ef76c65da1
Lookup euid outside of loop
2017-12-13 11:50:12 -08:00
Michael Schurter
5032bf4f5a
Skip tests that require root when not root
...
Also skip Chown on allocdir migration on Windows and when non-root.
Windows doesn't support it, and it will always fail as a non-root user.
2017-12-12 16:58:27 -08:00
Alex Dadgar
f0b0697b57
Keyify struct
2017-12-11 17:23:14 -08:00
Michael Schurter
c4d4ead199
Fix test broken by mock updates
2017-12-08 16:45:25 -08:00
Michael Schurter
4b20441eef
Validate port label for host address mode
...
Also skip getting an address for script checks which don't use them.
Fixed a weird invalid reserved port in a TaskRunner test helper as well
as a problem with our mock Alloc/Job. Hopefully the latter doesn't cause
other tests to fail, but we were referencing an invalid PortLabel and
just not catching it before.
2017-12-08 12:03:43 -08:00
Michael Schurter
30dd570061
Fix interpolation bug with service/check updates
...
Previously if only an interpolated variable used in a service or check
was changed we interpolated the old and new services and checks with the
new variable, so nothing appeared to have changed.
2017-12-08 12:03:00 -08:00
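A Go sketch of the bug described in 30dd570061, under the assumption that change detection compares rendered strings. `interpolate` and `serviceChanged` are hypothetical helpers, and the `${key}` syntax is only illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// interpolate is a hypothetical stand-in that replaces ${KEY} with env[KEY].
func interpolate(s string, env map[string]string) string {
	for k, v := range env {
		s = strings.ReplaceAll(s, "${"+k+"}", v)
	}
	return s
}

// serviceChanged renders the old definition with the old environment and the
// new definition with the new environment, then compares the results.
func serviceChanged(oldName string, oldEnv map[string]string, newName string, newEnv map[string]string) bool {
	return interpolate(oldName, oldEnv) != interpolate(newName, newEnv)
}

func main() {
	tmpl := "web-${version}"
	oldEnv := map[string]string{"version": "1"}
	newEnv := map[string]string{"version": "2"}

	// Buggy comparison: both sides rendered with the new environment.
	fmt.Println(interpolate(tmpl, newEnv) != interpolate(tmpl, newEnv)) // false

	// Fixed comparison: each side rendered with its own environment.
	fmt.Println(serviceChanged(tmpl, oldEnv, tmpl, newEnv)) // true
}
```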
Michael Schurter
4347026f83
Test Consul from TaskRunner thoroughly
...
Rely less on the mockConsulServiceClient because the real
consul.ServiceClient needs all the testing it can get!
2017-12-08 12:03:00 -08:00
Alex Dadgar
a0d6b6a121
Merge pull request #3630 from hashicorp/b-periodic
...
Handle race between fingerprinters and registration
2017-12-07 16:11:13 -08:00
Alex Dadgar
91ffbbb517
Review feedback
2017-12-07 16:10:57 -08:00
Chelsea Komlo
c8e0cb3044
Merge pull request #3591 from hashicorp/b-1755-stop
...
Allow controlling the stop signal for drivers
2017-12-07 17:06:43 -05:00
Alex Dadgar
02baa6c52b
Handle race between fingerprinters and registration
2017-12-07 13:09:37 -08:00
Chelsea Holland Komlo
61fa8ad4ba
code review fixes
2017-12-07 13:46:25 -05:00
Chelsea Holland Komlo
77ab41124b
set default kill signal on executor shutdown
2017-12-07 11:40:15 -05:00
Chelsea Holland Komlo
6cae8fe6e6
extend configurable kill signal to java driver
2017-12-07 11:40:10 -05:00
Alex Dadgar
4409fdacc0
Drop trace logging
2017-12-06 18:02:24 -08:00
Alex Dadgar
cd9a7f14b8
Add logging around heartbeats
2017-12-06 17:57:50 -08:00
Chelsea Holland Komlo
350319239c
change location of default kill signal
2017-12-06 17:48:25 -05:00
Chelsea Holland Komlo
7dfb64f941
extract signal helper into utils
2017-12-06 14:36:44 -05:00
Chelsea Holland Komlo
b08611cfac
move kill_signal to task level, extend to docker
2017-12-06 14:36:39 -05:00
Chelsea Holland Komlo
80de7d5ebd
allow controlling the stop signal in exec/raw_exec
2017-12-06 11:28:45 -05:00
Chelsea Komlo
9ae849e09c
Merge pull request #3612 from hashicorp/docker-rkt-user
...
Set user for rkt tasks
2017-12-05 17:45:08 -05:00
Michael Schurter
b66aa5b7f6
Merge pull request #3563 from hashicorp/b-snapshot-atomic
...
Atomic Snapshotting / Sticky Volume Migration
2017-12-05 09:16:33 -08:00
Chelsea Holland Komlo
4463dc607e
fix up test
2017-12-05 10:12:40 -05:00
Chelsea Holland Komlo
7284f2385a
remove unused user option
2017-12-04 18:01:31 -05:00
Michael Schurter
6ccc4219d3
Merge pull request #3615 from hashicorp/b-rkt-host-ports
...
rkt: Don't require port_map with host networking
2017-12-04 14:49:42 -08:00
Chelsea Holland Komlo
7c74968452
add ability to specify user for rkt
2017-12-04 14:21:48 -05:00
Michael Schurter
2bf1d6d85e
rkt: Don't require port_map with host networking
...
Also don't try to return a DriverNetwork with host networking. None will
ever exist as that's the point of host networking: rkt won't create a
network namespace.
2017-12-01 17:23:25 -08:00
Chelsea Holland Komlo
4ee2122536
get KillTimeout in seconds, not nanoseconds
2017-12-01 10:43:00 -05:00
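A short Go illustration of the unit mix-up named in 4ee2122536. A `time.Duration` is an integer count of nanoseconds, so converting it directly to an integer does not give seconds. The helper `timeoutSeconds` is hypothetical and assumes a non-negative duration.

```go
package main

import (
	"fmt"
	"time"
)

// timeoutSeconds converts a non-negative duration to whole seconds,
// truncating any fractional part.
func timeoutSeconds(d time.Duration) uint {
	return uint(d.Seconds())
}

func main() {
	killTimeout := 5 * time.Second
	fmt.Println(int64(killTimeout))          // 5000000000 (nanoseconds)
	fmt.Println(timeoutSeconds(killTimeout)) // 5
}
```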
Michael Schurter
5e975bbd0f
Add comment and normalize err check ordering
...
as per PR comments
2017-11-29 17:26:11 -08:00
Michael Schurter
d996c3a231
Check for error file when receiving snapshots
2017-11-29 17:26:11 -08:00
Michael Schurter
ca946679f6
Destroy partially migrated alloc dirs
...
Test that snapshot errors don't return a valid tar currently fails.
2017-11-29 17:26:11 -08:00
Michael Schurter
23c66e37c5
Handle errors during snapshotting
...
If an alloc dir is being GC'd (removed) during snapshotting the walk
func will be passed an error. Previously we didn't check for an error so
a panic would occur when we'd try to use a nil `fileInfo`.
2017-11-29 17:26:11 -08:00
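A minimal Go sketch of the check described in 23c66e37c5, using the standard `filepath.Walk`. The function `snapshotPaths` is a hypothetical simplification that only collects file paths instead of writing a tar stream.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// snapshotPaths walks root and collects the regular entries it would archive.
func snapshotPaths(root string) ([]string, error) {
	var paths []string
	walkFn := func(path string, fileInfo os.FileInfo, err error) error {
		// Walk passes a non-nil err when it could not stat or read path,
		// for example because the directory was removed mid-walk.
		// fileInfo may be nil in that case, so check err first.
		if err != nil {
			return fmt.Errorf("error walking %q: %v", path, err)
		}
		if !fileInfo.IsDir() {
			paths = append(paths, path)
		}
		return nil
	}
	if err := filepath.Walk(root, walkFn); err != nil {
		return nil, err
	}
	return paths, nil
}

func main() {
	_, err := snapshotPaths(filepath.Join(os.TempDir(), "nomad-sketch-does-not-exist"))
	fmt.Println(err != nil)
}
```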
Chelsea Holland Komlo
2208964948
Support StopTimeout for Docker tasks
...
Update github.com/fsouza/go-dockerclient
2017-11-29 14:33:05 -05:00
Preetha Appan
6ad65c51e6
Missed assert in one place
2017-11-20 13:04:38 -06:00
Preetha Appan
747bd59daa
Better error validation, and added test case for invalid sysctl inputs
2017-11-20 12:07:18 -06:00
Preetha Appan
c68973747b
Address some review comments
2017-11-20 11:15:09 -06:00
Preetha Appan
39ef9ee76d
Fix gofmt warnings
2017-11-18 09:23:09 -06:00
Preetha Appan
e53dd15f58
Fix test compilation after rebase
2017-11-17 17:46:04 -06:00
Samuel BERTHE
0fca2e19c8
review(docker driver): sysctls -> sysctl + ulimits -> ulimit
2017-11-17 16:30:45 -06:00
Samuel BERTHE
6c93922cb7
Oops
2017-11-17 16:14:14 -06:00
Samuel BERTHE
c8363bc44b
💄
2017-11-17 16:03:22 -06:00
Samuel BERTHE
281ab90484
test(docker driver): testing sysctls and ulimits
2017-11-17 16:03:22 -06:00
Samuel BERTHE
b9a10ff7fa
feat(docker driver): adds sysctls and ulimits configs
2017-11-17 16:03:22 -06:00
Alex Dadgar
69d3bf7392
Merge pull request #3559 from hashicorp/b-metrics
...
Don't emit metrics for non-running tasks
2017-11-17 10:33:23 -08:00
Michael Schurter
3845c8d200
Merge pull request #3562 from hashicorp/b-3561-rkt-rm
...
Remove rkt pods when exiting
2017-11-16 17:30:21 -08:00
Michael Schurter
737fb45640
Merge pull request #3551 from hashicorp/b-3419-docker-409-bug
...
Fix Docker name conflict bug by updating dockerclient
2017-11-16 16:38:54 -08:00
Michael Schurter
437fce9954
Improve rktRemove error message
2017-11-16 15:45:14 -08:00
Michael Schurter
3ceec0caab
Remove rkt pods when exiting
...
Fixes #3561
2017-11-16 14:33:44 -08:00
Charlie Voiselle
7a231897a5
Merge pull request #3556 from angrycub/f-fingerprint-log-level
...
Dropped loglevel for AWS fingerprinter env read misses to DEBUG
2017-11-16 16:27:25 -05:00
Charlie Voiselle
969ddf9c2a
Lowered to DEBUG from AD feedback
2017-11-16 14:13:03 -05:00
Alex Dadgar
05b1588cea
Only publish metric when the task is running and dev mode publishes metrics
2017-11-15 13:21:06 -08:00
Alex Dadgar
07963f0b6d
Merge pull request #3546 from hashicorp/f-heuristic
...
Better interface selection heuristic
2017-11-15 12:51:21 -08:00
Alex Dadgar
97ec3974a9
Use interface attached to default route
2017-11-15 11:32:32 -08:00
Michael Schurter
f86f0bd9ea
Handle leader task being dead in RestoreState
...
Fixes the panic mentioned in
https://github.com/hashicorp/nomad/issues/3420#issuecomment-341666932
While a leader task dying serially stops all follower tasks, the
synchronizing of state is asynchronous. Nomad can shut down before all
follower tasks have updated their state to dead, thus saving the state
necessary to hit this panic: *have a non-terminal alloc with a dead
leader.*
The actual fix is a simple nil check to not assume that a non-terminal
alloc's leader has a TaskRunner.
2017-11-15 10:36:13 -08:00
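A simplified Go sketch of the nil check described in f86f0bd9ea. The `taskRunner` and `allocRunner` types here are hypothetical stand-ins, and the sketch assumes that only live tasks get a runner on restore.

```go
package main

import "fmt"

// taskRunner is a hypothetical, heavily simplified stand-in.
type taskRunner struct {
	name    string
	stopped bool
}

// allocRunner holds runners only for tasks that were restored as alive.
type allocRunner struct {
	leaderName string
	tasks      map[string]*taskRunner
}

// stopLeader stops the leader if it has a runner and reports whether it did.
func (a *allocRunner) stopLeader() bool {
	tr := a.tasks[a.leaderName]
	if tr == nil {
		// The leader is already dead: the alloc is non-terminal only
		// because follower state had not been synced before shutdown.
		return false
	}
	tr.stopped = true
	return true
}

func main() {
	a := &allocRunner{
		leaderName: "leader",
		tasks:      map[string]*taskRunner{"follower": {name: "follower"}},
	}
	fmt.Println(a.stopLeader()) // false, and no nil pointer dereference
}
```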
Charlie Voiselle
1197637251
Dropped loglevel for AWS fingerprinter env reads
...
Certain environments use WARN for serious logging; however, it's very
possible to have machines without some of the fingerprinted keys
(public-ipv4 and public-hostname specifically). Setting log level to
INFO seems more consistent with this possibility.
2017-11-15 18:20:59 +00:00
Chelsea Komlo
2dfda33703
Nomad agent reload TLS configuration on SIGHUP (#3479)
...
* Allow server TLS configuration to be reloaded via SIGHUP
* dynamic tls reloading for nomad agents
* code cleanup and refactoring
* ensure keyloader is initialized, add comments
* allow downgrading from TLS
* initialize keyloader if necessary
* integration test for tls reload
* fix up test to assert success on reloaded TLS configuration
* failure in loading a new TLS config should remain at current
Reload only the config if agent is already using TLS
* reload agent configuration before specific server/client
lock keyloader before loading/caching a new certificate
* introduce a get-or-set method for keyloader
* fixups from code review
* fix up linting errors
* fixups from code review
* add lock for config updates; improve copy of tls config
* GetCertificate only reloads certificates dynamically for the server
* config updates/copies should be on agent
* improve http integration test
* simplify agent reloading storing a local copy of config
* reuse the same keyloader when reloading
* Test that server and client get reloaded but keep keyloader
* Keyloader exposes GetClientCertificate as well for outgoing connections
* Fix spelling
* correct changelog style
2017-11-14 17:53:23 -08:00
Michael Schurter
3023336b39
Add a test demonstrating the bug
...
Fails on Docker 17.09, passes on Docker 17.06 and earlier
2017-11-14 15:25:52 -08:00
Alex Dadgar
ee31e15f51
Better interface selection heuristic
...
This PR introduces a better interface selection heuristic such that we
select interfaces with globally routable unicast addresses over link
local addresses.
Fixes https://github.com/hashicorp/nomad/issues/3487
2017-11-13 15:13:43 -08:00
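A rough Go sketch of the preference described in ee31e15f51, using the standard library's `net.IP` classification methods. `pickAddress` is a hypothetical helper and the real heuristic may weigh more factors. Note that `IsGlobalUnicast` also returns true for private ranges such as 10.0.0.0/8.

```go
package main

import (
	"fmt"
	"net"
)

// pickAddress returns the first globally routable unicast address in addrs,
// falling back to the first link-local unicast address, or "" if none parse.
func pickAddress(addrs []string) string {
	var linkLocal net.IP
	for _, a := range addrs {
		ip := net.ParseIP(a)
		if ip == nil {
			continue // skip unparseable entries
		}
		if ip.IsGlobalUnicast() {
			return ip.String()
		}
		if ip.IsLinkLocalUnicast() && linkLocal == nil {
			linkLocal = ip
		}
	}
	if linkLocal == nil {
		return ""
	}
	return linkLocal.String()
}

func main() {
	fmt.Println(pickAddress([]string{"169.254.10.1", "10.0.0.5"})) // 10.0.0.5
	fmt.Println(pickAddress([]string{"fe80::1"}))                  // fe80::1
}
```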
Preetha Appan
926c9ed997
Make device mounting unit test verify configuration via docker inspect
2017-11-13 09:56:54 -06:00
Preetha Appan
dc2d5fb5a4
Unit test (linux only) that tests mounting a device in the docker driver
2017-11-13 09:56:54 -06:00
Preetha Appan
4834710e45
Add default value for cgroup permissions for device if not set
2017-11-13 09:56:54 -06:00
Preetha Appan
9cdee6991c
Remove unnecessary check since validate method already checks this
2017-11-13 09:56:54 -06:00
Preetha Appan
110c1fd4f0
Add support for passing device into docker driver
2017-11-13 09:56:54 -06:00
Alex Dadgar
d1358ec1b6
always load all templates
2017-11-10 12:35:51 -08:00
Alex Dadgar
a3ea0c17a0
Handle multiple environment templates
...
Fixes https://github.com/hashicorp/nomad/issues/3498
2017-11-10 11:08:19 -08:00
Alex Dadgar
b3edc12dd9
Merge pull request #3411 from cheeseprocedure/f-qemu-graceful-shutdown
...
Qemu driver: graceful shutdown feature
2017-11-03 16:41:34 -07:00
Michael Schurter
690b8f4cfb
Remove noisy log line
...
Didn't mean to commit this
2017-11-03 16:00:30 -07:00
Matt Mercer
11e2870875
Qemu driver: clean up logging; fail unsupported features on Windows
2017-11-03 15:40:20 -07:00
Alex Dadgar
6034916ad1
fix spelling mistake
2017-11-03 15:04:59 -07:00
Alex Dadgar
a23033932a
Merge pull request #3459 from multani/docker-oom-notification
...
docker: log that a container has been killed by the OOM killer
2017-11-03 13:24:03 -07:00
Matt Mercer
cef9ba9770
Qemu driver: tweaks in response to PR feedback
...
Remove attribute for long qemu monitor path; misc cleanup; update tests
2017-11-03 11:28:56 -07:00
Preetha Appan
0eaef09675
Remove event GenericSource, and address other code review comments. Also added deprecation info in comments.
2017-11-03 10:10:06 -05:00
Preetha Appan
5f09c968b3
Move logic for determining event display message to task_runner, added two new fields DisplayMessage and Details.
2017-11-03 09:13:01 -05:00
Alex Dadgar
b4af10edde
Alloc Runner doesn't panic on restoration.
2017-11-02 16:14:13 -07:00
Alex Dadgar
abd28cbd7d
Merge pull request #3493 from hashicorp/f-remove-atlas
...
Remove Atlas and Scada from codebase
2017-11-02 16:00:44 -07:00
Michael Schurter
eedbe8efbb
Merge pull request #3490 from hashicorp/f-gc-logging
...
Make unable-to-gc log level adaptive
2017-11-02 14:32:40 -07:00
Diptanu Choudhury
cb68889652
Added the node_id as a tag
2017-11-02 13:29:10 -07:00
Alex Dadgar
701f462d33
remove atlas
2017-11-02 11:27:21 -07:00
Michael Schurter
fc33c945be
Make unable-to-gc log level adaptive
...
WARNing when someone has over 50 non-terminal allocs was just too
confusing.
Tested manually with `gc_max_allocs = 10` and bumping a job from `count
= 19` to `count = 21`:
```
2017/11/02 17:54:21.076132 [INFO] client.gc: garbage collection due to number of allocations (19) is over the limit (10) skipped because no terminal allocations
...
2017/11/02 17:54:48.634529 [WARN] client.gc: garbage collection due to number of allocations (21) is over the limit (10) skipped because no terminal allocations
```
2017-11-02 10:57:42 -07:00
Diptanu Choudhury
8a9d0d40b1
Added support for tagged metrics
2017-11-02 10:07:57 -07:00
Diptanu Choudhury
5f522c6de3
Incrementing the start counter when we are actually starting a container
2017-11-02 09:51:20 -07:00
Diptanu Choudhury
44535e5d10
Recording counter for dead allocs properly
2017-11-02 09:51:20 -07:00
Diptanu Choudhury
0b34e811b7
Added metrics to track task/alloc start/restarts/dead events
2017-11-02 09:51:20 -07:00