Alex Dadgar
f5ff509fa5
Refactor - wip
2018-06-12 10:23:45 -07:00
Chelsea Holland Komlo
f74e74b22d
add client logic to determine whether TLS RPC connections should reload
2018-06-08 14:38:58 -04:00
Chelsea Holland Komlo
064b5481e0
add server join info to server and client
2018-05-31 10:50:03 -07:00
Chelsea Holland Komlo
38f611a7f2
refactor NewTLSConfiguration to pass in verifyIncoming/verifyOutgoing
...
add missing fields to TLS merge method
2018-05-23 18:35:30 -04:00
Chelsea Holland Komlo
796bae6f1b
allow configurable cipher suites
...
disallow 3DES and RC4 ciphers
add documentation for tls_cipher_suites
2018-05-09 17:15:31 -04:00
Chelsea Holland Komlo
9b8a079558
fix up comments
2018-04-17 11:53:08 -04:00
Alex Dadgar
9d612c8cb0
Cleanup
2018-04-16 15:48:34 -07:00
Alex Dadgar
32adaf9dfc
Copy the config given to the alloc runner
2018-04-16 15:45:52 -07:00
Alex Dadgar
4f2a7b6949
Fix copying drivers
2018-04-16 15:45:51 -07:00
Alex Dadgar
0b799822ff
Operate on copy
2018-04-16 15:45:49 -07:00
Alex Dadgar
ff1a1a63e8
Move where attribute for driver detection is set
2018-04-12 15:50:25 -07:00
Alex Dadgar
f24ce2c50c
Driver health detection cleanups
...
This PR does:
1. Health message based on detection has format "Driver XXX detected"
and "Driver XXX not detected"
2. Set initial health description based on detection status and don't
wait for the first health check.
3. Combine updating attributes on the node, fingerprint and health
checking update for drivers into a single call back.
4. Condensed driver info in `node status` only shows detected drivers
and makes the output less wide by removing spaces.
2018-04-12 12:46:40 -07:00
Andrei Burd
502d17fa90
Added node class to tagged metrics
2018-04-11 12:20:59 +03:00
Alex Dadgar
3d367d6fd7
Fix client uptime metric missing client prefix
2018-04-10 10:39:36 -07:00
Alex Dadgar
ae1f76477e
Start rebalance after discovering new servers
2018-04-05 15:41:59 -07:00
Alex Dadgar
be2513e0f9
more jitter
2018-04-05 13:48:33 -07:00
Alex Dadgar
bd3345942c
Handle no leader and faster retries near limit
...
Handle the ErrNoLeader case and apply slower retries. Also, when we have
missed the heartbeat, retry aggressively, backing off after we have
missed for more than 30 seconds.
2018-04-05 11:22:47 -07:00
Alex Dadgar
279b5c22e5
Scale heartbeat retrying based on remaining heartbeat time
2018-04-05 10:58:13 -07:00
Alex Dadgar
7941f4eb2d
Fire retry only when consul discovers new servers
2018-04-05 10:40:17 -07:00
Alex Dadgar
86c32358d4
Spelling error
2018-04-03 18:30:01 -07:00
Alex Dadgar
01a6beafbf
RPC Retry Watcher
2018-04-03 18:05:28 -07:00
Alex Dadgar
58a3ec3fb2
Improve Vault error handling
2018-04-03 14:29:22 -07:00
Chelsea Holland Komlo
2174ede6b9
add clarifying comment
2018-03-29 10:58:39 -04:00
Chelsea Holland Komlo
e3319afee1
emit first node event
2018-03-28 17:26:53 -04:00
Chelsea Holland Komlo
efc03e252c
specify driver health messages
2018-03-28 11:35:21 -04:00
Chelsea Holland Komlo
003bc209b9
use time.Time for node events for compatibility
2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo
f801709a0a
fix issue when updating node events
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
60f12d206f
improve comments; update watchDriver
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
739784736a
remove unused function
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
d92703617c
simplify logic
...
bump log level
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
86b7b3d2d9
fix up health check logic comparison; add node events to client driver checks
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
53a5bc2bb3
Code review feedback
2018-03-21 15:15:26 -04:00
Alex Dadgar
34dc58421c
notes from walk through
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
44b6951dda
improve tests
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
0425be8f48
updating comments; locking concurrent node access
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
c50d02ae93
go style; update comments
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo
3aa726baab
fix scheduler driver name; create node structs file
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo
3cba95e8a7
allow nomad to schedule based on the status of a client driver health check
...
Slight updates for go style
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo
0bde357731
add concept of health checks to fingerprinters and nodes
...
fix up feedback from code review
add driver info for all drivers to node
2018-03-21 15:15:25 -04:00
Preetha Appan
3c38eededd
Fix spelling in comment
2018-03-14 15:54:25 -05:00
Alex Dadgar
bef4a8ee09
fix clearing node events
2018-03-14 09:48:59 -07:00
Chelsea Komlo
810eedfa2a
Merge pull request #3945 from hashicorp/f-add-node-events
...
Add node events
2018-03-14 08:42:55 -04:00
Preetha
360d6e5a92
Merge pull request #3968 from hashicorp/f-nicer-vault-error
...
Make server side error messages from vault clearer
2018-03-13 20:49:39 -05:00
Alex Dadgar
de6ebb6e6c
small cleanup
2018-03-13 18:08:22 -07:00
Chelsea Holland Komlo
b41501e442
code review feedback
2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo
1488b076d1
code review feedback
2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo
a8655320fd
fix up go check warnings
2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo
0934769b04
add client side emitting of node events
...
Changelog
2018-03-13 18:08:21 -07:00
Preetha Appan
914eaed64f
Address some code review comments
2018-03-13 18:19:16 -05:00
Preetha Appan
09c231ce43
Return the err from server correctly
2018-03-13 18:10:14 -05:00
Preetha Appan
9618f52746
Remove error wrapping and make vault connection server side errors clearer.
2018-03-13 17:09:03 -05:00
Alex Dadgar
4844317cc2
Merge pull request #3890 from hashicorp/b-heartbeat
...
Heartbeat improvements and handling failures during establishing leadership
2018-03-12 14:41:59 -07:00
Josh Soref
173ce63fe9
spelling: transition
2018-03-11 19:06:05 +00:00
Josh Soref
782c704de6
spelling: thresholds
2018-03-11 19:03:47 +00:00
Josh Soref
8149694f3a
spelling: server
2018-03-11 18:55:30 +00:00
Josh Soref
258d76ec13
spelling: registry
2018-03-11 18:41:13 +00:00
Josh Soref
3c1ce6d16d
spelling: otherwise
2018-03-11 18:34:27 +00:00
Josh Soref
1ef6d6319e
spelling: labels
2018-03-11 18:21:44 +00:00
Josh Soref
52b83328fc
spelling: heartbeating
2018-03-11 18:12:19 +00:00
Josh Soref
c9b86bbc2f
spelling: controls
2018-03-11 17:50:39 +00:00
Josh Soref
e78cf9c81a
spelling: already
2018-03-11 17:39:04 +00:00
Josh Soref
b8b46d3f74
spelling: allocation
2018-03-11 17:37:22 +00:00
Chelsea Holland Komlo
122d1c4e4a
simplify retry logic
2018-03-01 09:48:26 -05:00
Chelsea Holland Komlo
355805db56
reset timer after updating node copy
2018-02-27 17:18:10 -05:00
Chelsea Holland Komlo
a72aaaf47f
add network resources equal method, use time ticker
...
remove impossible test case
2018-02-27 12:42:53 -05:00
Chelsea Holland Komlo
e736e31820
use time ticker, update how network resources are compared
2018-02-26 18:47:11 -05:00
Chelsea Holland Komlo
5059065b52
improved testing; node networks comparison
2018-02-26 15:55:38 -05:00
Chelsea Holland Komlo
1f31b39fe8
code review fixups
2018-02-26 12:36:30 -05:00
Chelsea Holland Komlo
ed8c8afbcd
edge trigger node update
...
test update config copy trigger
2018-02-26 12:36:04 -05:00
Alex Dadgar
49a47483d1
Registering back to initializing
...
Fix a bug in which if the node attributes/meta changed, we would
re-register the node in status initializing. This would incorrectly
trigger the client to log that it missed its heartbeat.
It would change the status of the Node to initializing until the next
heartbeat occurred.
2018-02-16 17:49:31 -08:00
Alex Dadgar
eff4455c68
Fix original client server list behavior
2018-02-15 16:04:53 -08:00
Alex Dadgar
f9cf642436
Client tls
2018-02-15 15:22:57 -08:00
Alex Dadgar
e685211892
Code review feedback
2018-02-15 13:59:02 -08:00
Alex Dadgar
2c0ad26374
New RPC Modes and basic setup for streaming RPC handlers
2018-02-15 13:59:01 -08:00
Alex Dadgar
9bc75f0ad4
Fix manager tests and make testagent recover from port conflicts
2018-02-15 13:59:01 -08:00
Alex Dadgar
3f1f8604bb
initial round of comment review
2018-02-15 13:59:01 -08:00
Alex Dadgar
c8c1284bc3
SetServer command actually returns an error if given an invalid server
2018-02-15 13:59:01 -08:00
Alex Dadgar
3f786b904b
use server manager
2018-02-15 13:59:01 -08:00
Alex Dadgar
6dd1c9f49d
Refactor
2018-02-15 13:59:00 -08:00
Alex Dadgar
1472b943d6
Stats Endpoint
2018-02-15 13:59:00 -08:00
Chelsea Holland Komlo
4a26959825
code review feedback
2018-02-07 18:10:55 -05:00
Chelsea Holland Komlo
d626d24488
remove dependency on client for fingerprint manager
2018-02-07 18:10:45 -05:00
Chelsea Holland Komlo
e012e5ab8a
add fingerprint manager
2018-02-07 18:10:33 -05:00
Chelsea Holland Komlo
b21233fe23
update log message
2018-02-01 19:46:57 -05:00
Chelsea Holland Komlo
6f9c0ab361
req/resp should be within config locks; rename for detected fingerprints
...
changelog
2018-02-01 19:00:39 -05:00
Chelsea Holland Komlo
b8e8064835
code review fixup
2018-01-31 18:34:03 -05:00
Chelsea Holland Komlo
7b53474a6e
add applicable boolean to fingerprint response
...
public fields and remove getter functions
2018-01-31 13:21:45 -05:00
Chelsea Holland Komlo
9482c322b7
locks for fingerprint reads/writes
2018-01-30 11:32:45 -05:00
Chelsea Holland Komlo
7c19de797c
create safe getters and setters for fingerprint response
2018-01-26 11:22:05 -05:00
Chelsea Holland Komlo
896d6f8058
fixups from code review
2018-01-26 07:04:32 -05:00
Chelsea Holland Komlo
9a8344333b
refactor Fingerprint to request/response construct
2018-01-24 11:54:02 -05:00
Chelsea Holland Komlo
649f86f094
refactor creating a new tls configuration
2018-01-16 08:02:39 -05:00
Chelsea Holland Komlo
6c9f9c8ac3
adding additional test assertions; differentiate reloading agent and http server
2018-01-16 07:34:39 -05:00
Chelsea Holland Komlo
214d128eb9
reload raft transport layer
...
fix up linting
2018-01-08 14:52:28 -05:00
Chelsea Holland Komlo
0708d34135
call reload on agent, client, and server separately
2018-01-08 09:56:31 -05:00
Chelsea Holland Komlo
9741097406
reloading tls config should be atomic for clients/servers
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo
ae7fc4695e
fixups from code review
...
Revert "close raft long-lived connections"
This reverts commit 3ffda28206fcb3d63ad117fd1d27ae6f832b6625.
reload raft connections on changing tls
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo
acd3d1b162
fix up downgrading client to plaintext
...
add locks around changing server configuration
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo
c0ad9a4627
add ability to upgrade/downgrade nomad agents tls configurations via sighup
2018-01-08 09:21:06 -05:00
Alex Dadgar
91ffbbb517
Review feedback
2017-12-07 16:10:57 -08:00
Alex Dadgar
02baa6c52b
Handle race between fingerprinters and registration
2017-12-07 13:09:37 -08:00
Alex Dadgar
4409fdacc0
Drop trace logging
2017-12-06 18:02:24 -08:00
Alex Dadgar
cd9a7f14b8
Add logging around heartbeats
2017-12-06 17:57:50 -08:00
Chelsea Komlo
2dfda33703
Nomad agent reload TLS configuration on SIGHUP ( #3479 )
...
* Allow server TLS configuration to be reloaded via SIGHUP
* dynamic tls reloading for nomad agents
* code cleanup and refactoring
* ensure keyloader is initialized, add comments
* allow downgrading from TLS
* initialize keyloader if necessary
* integration test for tls reload
* fix up test to assert success on reloaded TLS configuration
* failure in loading a new TLS config should remain at current
Reload only the config if agent is already using TLS
* reload agent configuration before specific server/client
lock keyloader before loading/caching a new certificate
* introduce a get-or-set method for keyloader
* fixups from code review
* fix up linting errors
* fixups from code review
* add lock for config updates; improve copy of tls config
* GetCertificate only reloads certificates dynamically for the server
* config updates/copies should be on agent
* improve http integration test
* simplify agent reloading storing a local copy of config
* reuse the same keyloader when reloading
* Test that server and client get reloaded but keep keyloader
* Keyloader exposes GetClientCertificate as well for outgoing connections
* Fix spelling
* correct changelog style
2017-11-14 17:53:23 -08:00
Michael Schurter
1769db98b7
Fix regression by returning error on unknown alloc
2017-11-01 15:16:38 -05:00
Michael Schurter
73e9b57908
Trigger GCs after alloc changes
...
GC much more aggressively by triggering GCs when allocations become
terminal as well as after new allocations are added.
2017-11-01 15:16:38 -05:00
Michael Schurter
2a81160dcd
Fix GC'd alloc tracking
...
The Client.allocs map now contains all AllocRunners again, not just
un-GC'd AllocRunners. Client.allocs is only pruned when the server GCs
allocs.
Also stops logging "marked for GC" twice.
2017-11-01 15:16:38 -05:00
Alex Dadgar
4831380e57
Node access is done using locked Node copy
...
Fixes https://github.com/hashicorp/nomad/issues/3454
Reliably reproduced the data race before by having a fingerprinter
change the node's attributes every millisecond and syncing at the same
rate. With fix, did not ever panic.
2017-10-27 13:27:24 -07:00
Michael Schurter
15b991e039
base64 migrate token
...
HTTP header values must be ASCII.
Also constant time compare tokens and test the generate and compare
helper functions.
2017-10-13 10:59:13 -07:00
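A minimal sketch of the two ideas in 15b991e039: encoding the token as base64 so it is plain ASCII, and comparing tokens in constant time. The helper names and the use of HMAC-SHA256 are assumptions made for this example; the commit message does not say which hash the real helpers use.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
)

// generateMigrateToken is a hypothetical helper that derives a per-alloc
// token from a node secret. HMAC-SHA256 is an assumption for this sketch.
func generateMigrateToken(allocID, nodeSecret string) string {
	mac := hmac.New(sha256.New, []byte(nodeSecret))
	mac.Write([]byte(allocID))
	// Raw digest bytes are not valid header text; base64 keeps the value ASCII.
	return base64.URLEncoding.EncodeToString(mac.Sum(nil))
}

// compareMigrateToken is a hypothetical helper that checks a presented token
// without leaking, through timing, how many leading bytes matched.
func compareMigrateToken(allocID, nodeSecret, got string) bool {
	want := generateMigrateToken(allocID, nodeSecret)
	return subtle.ConstantTimeCompare([]byte(want), []byte(got)) == 1
}

func main() {
	tok := generateMigrateToken("alloc-1", "node-secret")
	fmt.Println(compareMigrateToken("alloc-1", "node-secret", tok))
	fmt.Println(compareMigrateToken("alloc-2", "node-secret", tok))
}
```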
Chelsea Holland Komlo
e1c4701a43
fix up build warnings
2017-10-11 17:11:57 -07:00
Chelsea Holland Komlo
b018ca4d46
fixing up code review comments
2017-10-11 17:09:20 -07:00
Chelsea Holland Komlo
410adaf726
Add functionality for authenticated volumes
2017-10-11 17:09:20 -07:00
Michael Schurter
a66c53d45a
Remove structs import from api
...
Goes a step further and removes structs import from api's tests as well
by moving GenerateUUID to its own package.
2017-09-29 10:36:08 -07:00
Alex Dadgar
4173834231
Enable more linters
2017-09-26 15:26:33 -07:00
Chelsea Holland Komlo
b26454cf99
Move setGaugeForAllocationStats to emitClientMetrics
2017-09-25 16:05:49 +00:00
Alex Dadgar
d306da846c
changelog and feedback
2017-09-14 14:08:58 -07:00
Alex Dadgar
07ed83fdd5
Non-locked accessors to common Node fields
...
This PR removes locking around commonly accessed node attributes that do
not need to be locked. The locking could cause nodes to TTL as the
heartbeat code path was acquiring a lock that could be held for an
excessively long time. An example of this is when Vault is inaccessible,
since the fingerprint is run with a lock held but the Vault
fingerprinter makes the API calls with a large timeout.
Fixes https://github.com/hashicorp/nomad/issues/2689
2017-09-14 14:08:26 -07:00
Chelsea Holland Komlo
848af92183
fix panic in emitting tagged metrics
2017-09-11 15:32:37 +00:00
Chelsea Holland Komlo
0ef43c3c5f
final code review fixups
2017-09-05 18:47:44 +00:00
Chelsea Holland Komlo
a8cbd0b559
fixups from code review
2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
f72e4aad13
labels depend on full setup of client beforehand
2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
87a814397d
refactor to use baseLabels
2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
b2953d905a
pass in commonly used values
2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
c634043069
create base labels to be used in every metric
2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
f5ea83da8d
emit metrics using labels, add option for backwards compatibility
2017-09-05 14:12:57 +00:00
Armon Dadgar
76a03f2d8e
Address @dadgar feedback
2017-09-04 13:05:53 -07:00
Armon Dadgar
688897561b
client: adding token cache for ACL resolution
2017-09-04 13:05:36 -07:00
Armon Dadgar
c2e72e8a9c
client: create ACL and Policy cache
2017-09-04 13:05:35 -07:00
Michael Schurter
7342e23669
Move migrating state into prevAllocWatcher
2017-08-14 16:02:28 -07:00
Michael Schurter
e41a654917
switch from alloc blocker to new interface
...
interface has 3 implementations:
1. local for blocking and moving data locally
2. remote for blocking and moving data from another node
3. noop for allocs that don't need to block
2017-08-11 16:21:35 -07:00
Michael Schurter
ee04717a0b
initial attempt at refactoring blocked/migrating
2017-08-11 16:21:35 -07:00
Alex Dadgar
ecee5e370e
initial watcher
2017-07-07 12:07:08 -07:00
Michael Schurter
644f0cfaa4
Consistently quote alloc ids in client logs
2017-07-06 10:24:52 -07:00
Michael Schurter
4fd9ef6a8c
Tiny client race condition fix
...
Plus some logging improvements that may help with #2563
2017-07-05 16:15:19 -07:00
Michael Schurter
596727230b
Suggest wiping out alloc dir too
2017-07-03 12:29:21 -07:00
Michael Schurter
11f68bfca2
Add more logging to restore state errors
2017-07-03 11:58:41 -07:00
Mark Mickan
c196d320f8
Add tests for migrating symlinks in alloc and local directories
2017-06-04 15:56:22 +09:30
Mark Mickan
236f24c9a4
Include symlinks in snapshots when migrating disks
...
Fixes #2685
2017-06-04 00:36:18 +09:30
Alex Dadgar
b1eea2269a
Fix deadlock
2017-05-31 14:05:47 -07:00
Michael Schurter
ffc2b36dc7
Merge pull request #2636 from hashicorp/f-gc-alloc-limit
...
Add new gc_max_allocs tuneable
2017-05-30 16:14:09 -07:00
Michael Schurter
dd51aa1cb9
Merge pull request #2654 from hashicorp/f-env-consul
...
Add envconsul-like support and refactor environment handling
2017-05-30 14:40:14 -07:00
Alex Dadgar
28aef447e9
Fix perms to just set exec bit
2017-05-25 14:44:13 -07:00
Michael Schurter
fd9bef768f
Move task env into execcontext
...
Also inject PATH into rkt commands since we're no longer appending host
env vars for it.
2017-05-23 13:53:34 -07:00
Michael Schurter
3841692138
gc_max_allocs should include blocked & migrating
2017-05-12 16:03:22 -07:00
Michael Schurter
0453c2709c
Add new gc_max_allocs tuneable
...
More than gc_max_allocs may be running on a node, but terminal allocs
will be garbage collected to try to keep the total number below the
limit.
2017-05-11 17:18:02 -07:00
Alex Dadgar
68c3a2bd98
Fix vet errors
2017-05-11 13:08:08 -07:00
Alex Dadgar
843bc26e5d
Respond to comments
2017-05-09 10:50:24 -07:00
Alex Dadgar
e00f9c9413
Restore state + upgrade path
2017-05-02 18:21:49 -07:00
Alex Dadgar
ec101b4760
Revert "metrics"
...
This reverts commit 4d6a012c6fb6f1fba6c62985d091b1a20c3198e7.
2017-05-02 09:28:11 -07:00
Alex Dadgar
8e516b5dc2
Async and sync saving of client state
2017-05-01 16:16:53 -07:00
Alex Dadgar
a7fd08d42a
perf
2017-05-01 16:01:50 -07:00
Alex Dadgar
e010fdf8c0
metrics
2017-05-01 14:51:27 -07:00
Alex Dadgar
b94f855326
boltDB database for client state
2017-05-01 14:50:34 -07:00
Michael Schurter
e204a287ed
Refactor Consul Syncer into new ServiceClient
...
Fixes #2478 #2474 #1995 #2294
The new client only handles agent and task service advertisement. Server
discovery is mostly unchanged.
The Nomad client agent now handles all Consul operations instead of the
executor handling task related operations. When upgrading from an
earlier version of Nomad existing executors will be told to deregister
from Consul so that the Nomad agent can re-register the task's services
and checks.
Drivers - other than qemu - now support an Exec method for executing
arbitrary commands in a task's environment. This is used to implement
script checks.
Interfaces are used extensively to avoid interacting with Consul in
tests that don't assert any Consul related behavior.
2017-04-19 12:42:47 -07:00
Alex Dadgar
2321e8a4a0
Hash host ID so it's stable and well distributed
...
This PR takes the host ID and runs it through a hash so that it is well
distributed. This makes it so that machines that report similar host IDs
are easily distinguished.
Instances of similar IDs occur on EC2 where the ID is prefixed and on
motherboards created in the same batch.
Fixes https://github.com/hashicorp/nomad/issues/2534
2017-04-10 11:44:51 -07:00
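The idea in 2321e8a4a0 can be illustrated with a short sketch: hashing the raw host ID spreads near-identical inputs across the whole ID space while staying stable across restarts. The function name, the use of SHA-256, and the example host IDs are assumptions for illustration; the commit message does not name the hash function used.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashedNodeID is a hypothetical helper that turns a host-reported ID into a
// UUID-shaped string. SHA-256 truncated to 16 bytes is an assumption here.
func hashedNodeID(hostID string) string {
	sum := sha256.Sum256([]byte(hostID))
	b := sum[:16]
	// Same input always gives the same output, so the ID is durable.
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

func main() {
	// Two made-up host IDs that differ only in the last character.
	fmt.Println(hashedNodeID("ec2aaaaa-0000-0000-0000-000000000001"))
	fmt.Println(hashedNodeID("ec2aaaaa-0000-0000-0000-000000000002"))
}
```

The two printed IDs share no obvious prefix even though the inputs do, which is what makes similar machines easy to tell apart.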
Alex Dadgar
81b78f77e1
Track task start/finish time & improve logs errors
...
This PR adds tracking to when a task starts and finishes and the logs
API takes advantage of this and returns better errors when asking for
logs that do not exist.
2017-03-31 16:14:11 -07:00
Alex Dadgar
5e7e19de4b
Merge pull request #2461 from hashicorp/b-groups
...
Various fixes for setting user/group of task
2017-03-28 11:13:27 -07:00
Alex Dadgar
4ecebe7d8c
Proper reference counting through task restarts
...
This PR fixes an issue in which the reference count on a Docker image
would become inflated through task restarts.
2017-03-25 17:05:53 -07:00
Alex Dadgar
a171a014b3
Various fixes for setting user/group of task
...
This PR fixes two issues:
* Folder permissions in -dev mode were incorrect and not suitable for
running as a particular user.
* Was not setting the group membership properly for the launched
process.
Fixes https://github.com/hashicorp/nomad/issues/2160
2017-03-20 14:21:13 -07:00
Alex Dadgar
70e4feb045
Limit parallelism during garbage collection
...
This PR introduces a parallelism limit during garbage collection. This
is used to avoid large resource usage spikes if garbage collecting many
allocations at once.
2017-03-10 16:27:00 -08:00
Alex Dadgar
9011a7984c
Add metrics to show allocations on the client
...
This PR adds the following metrics to the client:
client.allocations.migrating
client.allocations.blocked
client.allocations.pending
client.allocations.running
client.allocations.terminal
Also adds some missing fields to the API version of the evaluation.
2017-03-09 12:37:41 -08:00
Alex Dadgar
5be806a3df
Fix vet script and fix vet problems
...
This PR fixes our vet script and fixes all the missed vet changes.
It also fixes pointers being printed in `nomad stop <job>` and `nomad
node-status <node>`.
2017-02-27 16:00:19 -08:00
Alex Dadgar
6910678c21
Allow random UUID
2017-02-27 13:42:37 -08:00
Alex Dadgar
7203dee7ab
Add allocated/unallocated metrics to client
2017-02-16 18:28:11 -08:00
Sean Chittenden
c4c321c770
Unconditionally lowercase the node ID read from disk.
2017-02-06 16:20:17 -08:00
Sean Chittenden
adb5be23ef
Add better verification of a host's HostID.
2017-02-02 16:24:32 -08:00
Sean Chittenden
bb4347e277
Slight mis-merge: secret-id in dev mode is random and needs to be returned.
2017-02-01 22:20:52 -08:00
Sean Chittenden
bb422a2258
Generate a durable NodeID if possible, otherwise fall back to a random HostID.
2017-02-01 22:11:33 -08:00
Diptanu Choudhury
11d7cb1230
Making the GC related fields tunable
2017-01-31 15:51:20 -08:00
Diptanu Choudhury
84a491f85a
Locking appropriately before closing the channel to indicate migration
2017-01-23 10:46:57 -08:00
Michael Schurter
054ee8df59
Fix index we get allocs by
2017-01-20 16:30:40 -08:00
Diptanu Choudhury
1999b7eebb
Merge pull request #2159 from hashicorp/b-consul-config
...
Fixed merging consul config
2017-01-18 16:14:54 -08:00
Diptanu Choudhury
e927de02d2
Moved functions to helper from structs
2017-01-18 15:55:14 -08:00
Alex Dadgar
5d2b56b387
Random wait
2017-01-11 13:24:23 -08:00
Alex Dadgar
c19985244a
GetAllocs uses a blocking query
...
This PR makes GetAllocs use a blocking query as well as adding a sanity
check to the client's watchAllocation code to ensure it gets the correct
allocations.
This PR fixes https://github.com/hashicorp/nomad/issues/2119 and
https://github.com/hashicorp/nomad/issues/2153 .
The issue was that the client was talking to two different servers, one
to check which allocations to pull and the other to pull those
allocations. However the latter call was not with a blocking query and
thus the client would not retrieve the allocations it requested.
The logging has been improved to make the problem more clear as well.
2017-01-10 13:30:35 -08:00
Michael Schurter
86fcf96f72
Put a logger in AllocDir/TaskDir
2017-01-05 16:31:56 -08:00
Diptanu Choudhury
247bda9a88
Unlocking if we return before adding a new alloc runner
2017-01-05 13:18:48 -08:00
Diptanu Choudhury
9721a1ab04
Fixed how alloc lock is held
2017-01-05 13:06:56 -08:00
Michael Schurter
13064768ac
Fix race when shutting down in dev mode
...
Client.Shutdown holds the allocLock when destroying alloc runners in dev
mode.
Client.updateAllocStatus can be called during AllocRunner shutdown and
calls getAllocRunners which tries to acquire allocLock.RLock. This
deadlocks since Client.Shutdown already has the write lock.
Switching Client.Shutdown to use getAllocRunners and not hold a lock
during AllocRunner shutdown is the solution.
2017-01-03 17:21:50 -08:00
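The fix described in 13064768ac follows a general pattern: take a snapshot of the shared map under the read lock, release the lock, and only then call into code that may try to take the lock again. The sketch below reuses the names from the commit message (allocLock, getAllocRunners, updateAllocStatus, Shutdown), but the types and bodies are simplified assumptions, not the real client code.

```go
package main

import (
	"fmt"
	"sync"
)

// allocRunner is a simplified stand-in; onUpdate models the status callback
// that fires while a runner is shutting down.
type allocRunner struct {
	id       string
	onUpdate func()
}

func (r *allocRunner) Destroy() {
	if r.onUpdate != nil {
		r.onUpdate()
	}
}

type client struct {
	allocLock sync.RWMutex
	allocs    map[string]*allocRunner
}

// getAllocRunners copies the map under the read lock so that callers can
// iterate without holding allocLock.
func (c *client) getAllocRunners() map[string]*allocRunner {
	c.allocLock.RLock()
	defer c.allocLock.RUnlock()
	out := make(map[string]*allocRunner, len(c.allocs))
	for id, r := range c.allocs {
		out[id] = r
	}
	return out
}

// updateAllocStatus needs the read lock, as in the commit message.
func (c *client) updateAllocStatus() int {
	return len(c.getAllocRunners())
}

// Shutdown destroys runners from a snapshot. Holding the write lock here
// instead would deadlock as soon as Destroy called updateAllocStatus.
func (c *client) Shutdown() {
	for _, r := range c.getAllocRunners() {
		r.Destroy()
	}
}

func main() {
	c := &client{allocs: map[string]*allocRunner{}}
	for _, id := range []string{"a", "b"} {
		c.allocs[id] = &allocRunner{id: id, onUpdate: func() { c.updateAllocStatus() }}
	}
	c.Shutdown()
	fmt.Println("shutdown finished without deadlock")
}
```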
Michael Schurter
4a9a574d9d
Merge pull request #2054 from hashicorp/f-prestart
...
Add Driver.Prestart method
2016-12-20 16:18:56 -08:00
Diptanu Choudhury
b6120e2fc8
Removing the alloc runner from GC if it is destroyed by the server
2016-12-20 11:14:22 -08:00
Diptanu Choudhury
6e6e0d364a
Added comments
2016-12-20 10:49:48 -08:00
Diptanu Choudhury
36b5545d6b
Making the gc allocator understand real disk usage
2016-12-16 18:34:59 -08:00
Diptanu Choudhury
7aef9bcabe
Added the stats collector to GC
2016-12-14 15:11:11 -08:00
Diptanu Choudhury
e855cd587b
Refactored hoststats collector
2016-12-14 15:07:42 -08:00
Diptanu Choudhury
0ffd92668d
GC-ing before we start a new allocation
2016-12-14 15:04:06 -08:00
Diptanu Choudhury
afdaa979f7
Added a garbage collector for allocations
2016-12-14 15:01:12 -08:00
Alex Dadgar
648ad2ebc5
Merge pull request #2096 from hashicorp/b-addAlloc
...
Fix race and remove panic
2016-12-13 13:50:17 -08:00
Diptanu Choudhury
53fb09023c
cancelling waiting for remote allocation if the alloc doesn't need migration
2016-12-13 13:06:33 -08:00
Alex Dadgar
3cbd237512
Fix race and remove panic
2016-12-13 12:34:23 -08:00
Christoffer Kylvåg
6a1f32b8ba
#1680 : Continue after not being able to stat a mountpoint
2016-12-13 12:28:57 +01:00
Diptanu Choudhury
cbf73908ff
Setting the appropriate file permissions when un-archiving compressed alloc dir
2016-12-05 17:04:43 -08:00
Diptanu Choudhury
bc17cacca0
Merge pull request #2017 from hashicorp/b-sticky
...
Not moving alloc data when sticky is turned off
2016-12-05 14:11:45 -08:00
Diptanu Choudhury
21f49564d3
Not moving alloc data when sticky is turned off
2016-12-05 14:00:01 -08:00
Michael Schurter
770ed703d0
Add Driver.Prestart method
...
The Driver.Prestart method currently does very little but lays the
foundation for where lifecycle plugins can interleave execution _after_
task environment setup but _before_ the task starts.
Currently Prestart does two things:
* Any driver specific task environment building
* Download Docker images
This change also attaches a TaskEvent emitter to Drivers, so they can
emit events during task initialization.
2016-12-02 11:03:48 -08:00
Alex Dadgar
86ed1fb2e5
Disallow stale queries when deriving Vault tokens
...
This PR disallows stale queries when deriving a Vault token. Allowing
stale queries could result in the allocation not existing on the server
that is servicing the request.
2016-12-01 11:13:36 -08:00
Alex Dadgar
ec4d6936ff
add debug panic
2016-11-29 15:57:40 -08:00
Diptanu Choudhury
f67217297c
Ensuring allocs are not added multiple times to blocking queue
2016-11-29 11:19:37 -08:00
Alex Dadgar
88c7e04348
Check for Ephemeral Disk being nil
2016-11-15 10:03:06 -08:00
Alex Dadgar
ee921ccbb2
Merge pull request #1949 from carlpett/blacklist-fingerprints-and-drivers
...
Support blacklisting fingerprinters
2016-11-09 10:31:17 -08:00