Alex Dadgar
4831380e57
Node access is done using locked Node copy
...
Fixes https://github.com/hashicorp/nomad/issues/3454
Reliably reproduced the data race before by having a fingerprinter
change the nodes attributes every millisecond and syncing at the same
rate. With fix, did not ever panic.
2017-10-27 13:27:24 -07:00
Michael Schurter
15b991e039
base64 migrate token
...
HTTP header values must be ASCII.
Also constant time compare tokens and test the generate and compare
helper functions.
2017-10-13 10:59:13 -07:00
Chelsea Holland Komlo
e1c4701a43
fix up build warnings
2017-10-11 17:11:57 -07:00
Chelsea Holland Komlo
b018ca4d46
fixing up code review comments
2017-10-11 17:09:20 -07:00
Chelsea Holland Komlo
410adaf726
Add functionality for authenticated volumes
2017-10-11 17:09:20 -07:00
Michael Schurter
a66c53d45a
Remove structs
import from api
...
Goes a step further and removes structs import from api's tests as well
by moving GenerateUUID to its own package.
2017-09-29 10:36:08 -07:00
Alex Dadgar
4173834231
Enable more linters
2017-09-26 15:26:33 -07:00
Chelsea Holland Komlo
b26454cf99
Move setGaugeForAllocationStats to emitClientMetrics
2017-09-25 16:05:49 +00:00
Alex Dadgar
d306da846c
changelog and feedback
2017-09-14 14:08:58 -07:00
Alex Dadgar
07ed83fdd5
Non-locked accessors to common Node fields
...
This PR removes locking around commonly accessed node attributes that do
not need to be locked. The locking could cause nodes to TTL as the
heartbeat code path was acquiring a lock that could be held for an
excessively long time. An example of this is when Vault is inaccessible,
since the fingerprint is run with a lock held but the Vault
fingerprinter makes the API calls with a large timeout.
Fixes https://github.com/hashicorp/nomad/issues/2689
2017-09-14 14:08:26 -07:00
Chelsea Holland Komlo
848af92183
fix panic in emitting tagged metrics
2017-09-11 15:32:37 +00:00
Chelsea Holland Komlo
0ef43c3c5f
final code review fixups
2017-09-05 18:47:44 +00:00
Chelsea Holland Komlo
a8cbd0b559
fixups from code review
2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
f72e4aad13
labels depend on full setup of client beforehand
2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
87a814397d
refactor to use baseLabels
2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
b2953d905a
pass in commonly used values
2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
c634043069
create base labels to be used in every metric
2017-09-05 14:13:34 +00:00
Chelsea Holland Komlo
f5ea83da8d
emit metrics using labels, add option for backwards compatibility
2017-09-05 14:12:57 +00:00
Armon Dadgar
76a03f2d8e
Address @dadgar feedback
2017-09-04 13:05:53 -07:00
Armon Dadgar
688897561b
client: adding token cache for ACL resolution
2017-09-04 13:05:36 -07:00
Armon Dadgar
c2e72e8a9c
client: create ACL and Policy cache
2017-09-04 13:05:35 -07:00
Michael Schurter
7342e23669
Move migrating state into prevAllocWatcher
2017-08-14 16:02:28 -07:00
Michael Schurter
e41a654917
switch from alloc blocker to new interface
...
interface has 3 implementations:
1. local for blocking and moving data locally
2. remote for blocking and moving data from another node
3. noop for allocs that don't need to block
2017-08-11 16:21:35 -07:00
Michael Schurter
ee04717a0b
initial attempt at refactoring blocked/migrating
2017-08-11 16:21:35 -07:00
Alex Dadgar
ecee5e370e
initial watcher
2017-07-07 12:07:08 -07:00
Michael Schurter
644f0cfaa4
Consistently quote alloc ids in client logs
2017-07-06 10:24:52 -07:00
Michael Schurter
4fd9ef6a8c
Tiny client race condition fix
...
Plus some logging improvements that may help with #2563
2017-07-05 16:15:19 -07:00
Michael Schurter
596727230b
Suggest wiping out alloc dir too
2017-07-03 12:29:21 -07:00
Michael Schurter
11f68bfca2
Add more logging to restore state errors
2017-07-03 11:58:41 -07:00
Mark Mickan
c196d320f8
Add tests for migrating symlinks in alloc and local directories
2017-06-04 15:56:22 +09:30
Mark Mickan
236f24c9a4
Include symlinks in snapshots when migrating disks
...
Fixes #2685
2017-06-04 00:36:18 +09:30
Alex Dadgar
b1eea2269a
Fix deadlock
2017-05-31 14:05:47 -07:00
Michael Schurter
ffc2b36dc7
Merge pull request #2636 from hashicorp/f-gc-alloc-limit
...
Add new gc_max_allocs tuneable
2017-05-30 16:14:09 -07:00
Michael Schurter
dd51aa1cb9
Merge pull request #2654 from hashicorp/f-env-consul
...
Add envconsul-like support and refactor environment handling
2017-05-30 14:40:14 -07:00
Alex Dadgar
28aef447e9
Fix perms to just set exec bit
2017-05-25 14:44:13 -07:00
Michael Schurter
fd9bef768f
Move task env into execcontext
...
Also inject PATH into rkt commands since we're no longer appending host
env vars for it.
2017-05-23 13:53:34 -07:00
Michael Schurter
3841692138
gc_max_allocs should include blocked & migrating
2017-05-12 16:03:22 -07:00
Michael Schurter
0453c2709c
Add new gc_max_allocs tuneable
...
More than gc_max_allocs may be running on a node, but terminal allocs
will be garbage collected to try to keep the total number below the
limit.
2017-05-11 17:18:02 -07:00
Alex Dadgar
68c3a2bd98
Fix vet errors
2017-05-11 13:08:08 -07:00
Alex Dadgar
843bc26e5d
Respond to comments
2017-05-09 10:50:24 -07:00
Alex Dadgar
e00f9c9413
Restore state + upgrade path
2017-05-02 18:21:49 -07:00
Alex Dadgar
ec101b4760
Revert "metrics"
...
This reverts commit 4d6a012c6fb6f1fba6c62985d091b1a20c3198e7.
2017-05-02 09:28:11 -07:00
Alex Dadgar
8e516b5dc2
Async and sync saving of client state
2017-05-01 16:16:53 -07:00
Alex Dadgar
a7fd08d42a
perf
2017-05-01 16:01:50 -07:00
Alex Dadgar
e010fdf8c0
metrics
2017-05-01 14:51:27 -07:00
Alex Dadgar
b94f855326
boltDB database for client state
2017-05-01 14:50:34 -07:00
Michael Schurter
e204a287ed
Refactor Consul Syncer into new ServiceClient
...
Fixes #2478 #2474 #1995 #2294
The new client only handles agent and task service advertisement. Server
discovery is mostly unchanged.
The Nomad client agent now handles all Consul operations instead of the
executor handling task related operations. When upgrading from an
earlier version of Nomad existing executors will be told to deregister
from Consul so that the Nomad agent can re-register the task's services
and checks.
Drivers - other than qemu - now support an Exec method for executing
abritrary commands in a task's environment. This is used to implement
script checks.
Interfaces are used extensively to avoid interacting with Consul in
tests that don't assert any Consul related behavior.
2017-04-19 12:42:47 -07:00
Alex Dadgar
2321e8a4a0
Hash host ID so its stable and well distributed
...
This PR takes the host ID and runs it through a hash so that it is well
distributed. This makes it so that machines that report similar host IDs
are easily distinguished.
Instances of similar IDs occur on EC2 where the ID is prefixed and on
motherboards created in the same batch.
Fixes https://github.com/hashicorp/nomad/issues/2534
2017-04-10 11:44:51 -07:00
Alex Dadgar
81b78f77e1
Track task start/finish time & improve logs errors
...
This PR adds tracking to when a task starts and finishes and the logs
API takes advantage of this and returns better errors when asking for
logs that do not exist.
2017-03-31 16:14:11 -07:00
Alex Dadgar
5e7e19de4b
Merge pull request #2461 from hashicorp/b-groups
...
Various fixes for setting user/group of task
2017-03-28 11:13:27 -07:00
Alex Dadgar
4ecebe7d8c
Proper reference counting through task restarts
...
This PR fixes an issue in which the reference count on a Docker image
would become inflated through task restarts.
2017-03-25 17:05:53 -07:00
Alex Dadgar
a171a014b3
Various fixes for setting user/group of task
...
This PR fixes two issues:
* Folder permissions in -dev mode were incorrect and not suitable for
running as a particular user.
* Was not setting the group membership properly for the launched
process.
Fixes https://github.com/hashicorp/nomad/issues/2160
2017-03-20 14:21:13 -07:00
Alex Dadgar
70e4feb045
Limit parallelism during garbage collection
...
This PR introduces a parallelism limit during garbage collection. This
is used to avoid large resource usage spikes if garbage collecting many
allocations at once.
2017-03-10 16:27:00 -08:00
Alex Dadgar
9011a7984c
Add metrics to show allocations on the client
...
This PR adds the following metrics to the client:
client.allocations.migrating
client.allocations.blocked
client.allocations.pending
client.allocations.running
client.allocations.terminal
Also adds some missing fields to the API version of the evaluation.
2017-03-09 12:37:41 -08:00
Alex Dadgar
5be806a3df
Fix vet script and fix vet problems
...
This PR fixes our vet script and fixes all the missed vet changes.
It also fixes pointers being printed in `nomad stop <job>` and `nomad
node-status <node>`.
2017-02-27 16:00:19 -08:00
Alex Dadgar
6910678c21
Allow random UUID
2017-02-27 13:42:37 -08:00
Alex Dadgar
7203dee7ab
Add allocated/unallocated metrics to client
2017-02-16 18:28:11 -08:00
Sean Chittenden
c4c321c770
Unconditionally lowercase the node ID read from disk.
2017-02-06 16:20:17 -08:00
Sean Chittenden
adb5be23ef
Add better verification of a host's HostID.
2017-02-02 16:24:32 -08:00
Sean Chittenden
bb4347e277
Slight mis-merge: secret-id in dev mode is random and needs to be returned.
2017-02-01 22:20:52 -08:00
Sean Chittenden
bb422a2258
Generate a durable NodeID if possible, otherwise fall back to a random HostID.
2017-02-01 22:11:33 -08:00
Diptanu Choudhury
11d7cb1230
Making the GC related fields tunable
2017-01-31 15:51:20 -08:00
Diptanu Choudhury
84a491f85a
Locking appropriately before closing the channel to indicate migration
2017-01-23 10:46:57 -08:00
Michael Schurter
054ee8df59
Fix index we get allocs by
2017-01-20 16:30:40 -08:00
Diptanu Choudhury
1999b7eebb
Merge pull request #2159 from hashicorp/b-consul-config
...
Fixed merging consul config
2017-01-18 16:14:54 -08:00
Diptanu Choudhury
e927de02d2
Moved functions to helper from structs
2017-01-18 15:55:14 -08:00
Alex Dadgar
5d2b56b387
Random wait
2017-01-11 13:24:23 -08:00
Alex Dadgar
c19985244a
GetAllocs uses a blocking query
...
This PR makes GetAllocs use a blocking query as well as adding a sanity
check to the clients watchAllocation code to ensure it gets the correct
allocations.
This PR fixes https://github.com/hashicorp/nomad/issues/2119 and
https://github.com/hashicorp/nomad/issues/2153 .
The issue was that the client was talking to two different servers, one
to check which allocations to pull and the other to pull those
allocations. However the latter call was not with a blocking query and
thus the client would not retreive the allocations it requested.
The logging has been improved to make the problem more clear as well.
2017-01-10 13:30:35 -08:00
Michael Schurter
86fcf96f72
Put a logger in AllocDir/TaskDir
2017-01-05 16:31:56 -08:00
Diptanu Choudhury
247bda9a88
Unlocking if we return before adding a new alloc runner
2017-01-05 13:18:48 -08:00
Diptanu Choudhury
9721a1ab04
Fixed how alloc lock is held
2017-01-05 13:06:56 -08:00
Michael Schurter
13064768ac
Fix race when shutting down in dev mode
...
Client.Shutdown holds the allocLock when destroying alloc runners in dev
mode.
Client.updateAllocStatus can be called during AllocRunner shutdown and
calls getAllocRunners which tries to acquire allocLock.RLock. This
deadlocks since Client.Shutdown already has the write lock.
Switching Client.Shutdown to use getAllocRunners and not hold a lock
during AllocRunner shutdown is the solution.
2017-01-03 17:21:50 -08:00
Michael Schurter
4a9a574d9d
Merge pull request #2054 from hashicorp/f-prestart
...
Add Driver.Prestart method
2016-12-20 16:18:56 -08:00
Diptanu Choudhury
b6120e2fc8
Removing the alloc runner from GC if it is destroyed by the server
2016-12-20 11:14:22 -08:00
Diptanu Choudhury
6e6e0d364a
Added comments
2016-12-20 10:49:48 -08:00
Diptanu Choudhury
36b5545d6b
Making the gc allocator understand real disk usage
2016-12-16 18:34:59 -08:00
Diptanu Choudhury
7aef9bcabe
Added the stats collector to GC
2016-12-14 15:11:11 -08:00
Diptanu Choudhury
e855cd587b
Refactored hoststats collector
2016-12-14 15:07:42 -08:00
Diptanu Choudhury
0ffd92668d
GC-ing before we start a new allocation
2016-12-14 15:04:06 -08:00
Diptanu Choudhury
afdaa979f7
Added a garbage collector for allocations
2016-12-14 15:01:12 -08:00
Alex Dadgar
648ad2ebc5
Merge pull request #2096 from hashicorp/b-addAlloc
...
Fix race and remove panic
2016-12-13 13:50:17 -08:00
Diptanu Choudhury
53fb09023c
cancelling waiting for remote allocation if the alloc doesn't need migration
2016-12-13 13:06:33 -08:00
Alex Dadgar
3cbd237512
Fix race and remove panic
2016-12-13 12:34:23 -08:00
Christoffer Kylvåg
6a1f32b8ba
#1680 : Continue after not being able to stat a mountpoint
2016-12-13 12:28:57 +01:00
Diptanu Choudhury
cbf73908ff
Setting the appropriate file permissions which un-archiving compressed alloc dir
2016-12-05 17:04:43 -08:00
Diptanu Choudhury
bc17cacca0
Merge pull request #2017 from hashicorp/b-sticky
...
Not moving alloc data when sticky is turned off
2016-12-05 14:11:45 -08:00
Diptanu Choudhury
21f49564d3
Not moving alloc data when sticky is turned off
2016-12-05 14:00:01 -08:00
Michael Schurter
770ed703d0
Add Driver.Prestart method
...
The Driver.Prestart method currently does very little but lays the
foundation for where lifecycle plugins can interleave execution _after_
task environment setup but _before_ the task starts.
Currently Prestart does two things:
* Any driver specific task environment building
* Download Docker images
This change also attaches a TaskEvent emitter to Drivers, so they can
emit events during task initialization.
2016-12-02 11:03:48 -08:00
Alex Dadgar
86ed1fb2e5
Disallow stale queries when deriving Vault tokens
...
This PR disallows stale queries when deriving a Vault token. Allowing
stale queries could result in the allocation not existing on the server
that is servicing the request.
2016-12-01 11:13:36 -08:00
Alex Dadgar
ec4d6936ff
add debug panic
2016-11-29 15:57:40 -08:00
Diptanu Choudhury
f67217297c
Ensuring allocs are not added multiple times to blocking queue
2016-11-29 11:19:37 -08:00
Alex Dadgar
88c7e04348
Check for Ephemeral Disk being nil
2016-11-15 10:03:06 -08:00
Alex Dadgar
ee921ccbb2
Merge pull request #1949 from carlpett/blacklist-fingerprints-and-drivers
...
Support blacklisting fingerprinters
2016-11-09 10:31:17 -08:00
Calle Pettersson
4304755c12
Address comments from PR
2016-11-09 11:50:16 +01:00
Calle Pettersson
8632696e2d
Add blacklisting of drivers
2016-11-08 18:30:07 +01:00
Calle Pettersson
b603bb007e
Add blacklisting of fingerprinters
2016-11-08 18:29:44 +01:00
Alex Dadgar
9015e79aaa
Add compatibility code for secret ID while upgrading cluster in both server/client mode on single nodes
2016-11-07 16:52:08 -08:00
Diptanu Choudhury
1a8fa8c8d5
Making Nomad TLS configs region aware
2016-11-01 11:55:29 -07:00
Diptanu Choudhury
4079545a92
Making the client use tls if the node from which migration has to be made has enabled tls
2016-10-31 10:20:04 -07:00
Michael Schurter
cc115fe984
Swap log line classifiers to be consistent
2016-10-28 14:59:48 -07:00
Diptanu Choudhury
3182d0454f
Adding the alloc if we can't find the TG
2016-10-27 15:45:10 -07:00
Diptanu Choudhury
0682a1a113
Not blocking for remote alloc if the alloc is not sticky
2016-10-27 12:04:55 -07:00
Alex Dadgar
150b678a6b
Merge pull request #1806 from hashicorp/f-docker4mac-fixes
...
A couple fixes to make Docker For Mac work
2016-10-27 09:29:40 -07:00
Diptanu Choudhury
50ca5e1e9d
Merge pull request #1853 from hashicorp/f-rpc-http-tls
...
TLS support for http and RPC
2016-10-25 16:14:43 -07:00
Diptanu Choudhury
7c61e115bd
Moved tlsutil into helpers
2016-10-25 16:05:37 -07:00
Diptanu Choudhury
353e7fc7f1
Moving the certs into tlsutil package
2016-10-25 16:01:53 -07:00
Diptanu Choudhury
cf35aeac84
Moving the TLSConfig to structs
2016-10-25 15:57:38 -07:00
Alex Dadgar
03eba049ed
Merge pull request #1848 from hashicorp/f-vault-error
...
Thread through whether DeriveToken error is recoverable or not
2016-10-24 15:01:18 -07:00
Alex Dadgar
692a809919
Merge pull request #1842 from hashicorp/f-version-and-id
...
Print the version and client node ID
2016-10-24 10:13:33 -07:00
Diptanu Choudhury
2e3118e69c
Implemented TLS support for http and rpc
2016-10-23 22:22:00 -07:00
Alex Dadgar
ede3a814ba
Small fixes
2016-10-22 18:20:50 -07:00
Alex Dadgar
0070178741
Thread through whether DeriveToken error is recoverable or not
2016-10-22 18:08:30 -07:00
Michael Schurter
285e80ac0f
Remove disk usage enforcement
...
Many thanks to @iverberk for the original PR (#1609 ), but we ended up
not wanting to ship this implementation with 0.5.
We'll come back to it after 0.5 and hopefully find a way to leverage
filesystem accounting and quotas, so we can skip the expensive polling.
2016-10-21 13:55:51 -07:00
Alex Dadgar
aa0d8d0d8d
Print the version and client node ID
2016-10-20 17:46:04 -07:00
Evan Phoenix
e7a98d5500
Make EvalSymlink errors more verbose
2016-10-12 17:07:21 -07:00
Evan Phoenix
f8a65a3b9d
Resolve alloc/state directories to make Docker For Mac happy
...
* In -dev mode, `ioutil.TempDir` is used for the alloc and state
directories.
* `TempDir` uses `$TMPDIR`, which os OS X contains a per user
directory which is under `/var/folder`.
* `/var` is actually a symlink to `/private/var`
* Docker For Mac validates the directories that are passed to bind and on
OS X. That whitelist contains `/private`, but not `/var`. It does not
expand the path, and so any paths in `$TMPDIR` fail the whitelist check.
And thusly, by expanding the alloc/state directories the value passed
for binding does contain `/private` and Docker For Mac is happy.
2016-10-12 17:06:25 -07:00
Michael Schurter
6dea6df919
Restore lost chan inits
2016-10-03 14:56:50 -07:00
Diptanu Choudhury
d50c395421
Getting snapshot of allocation from remote node ( #1741 )
...
* Added the alloc dir move
* Moving allocdirs when starting allocations
* Added the migrate flag to ephemeral disk
* Stopping migration if the allocation doesn't need migration any more
* Added the GetAllocDir method
* refactored code
* Added a test for alloc runner
* Incorporated review comments
2016-10-03 09:59:57 -07:00
Michael Schurter
b117725dc9
Only log consul errors once since last succesful run
2016-09-28 17:18:45 -07:00
Michael Schurter
d486de3804
Remove unused const
2016-09-27 16:04:01 -07:00
Michael Schurter
2e696c5e61
Fix lies found in comments by fact checkers
2016-09-26 16:51:53 -07:00
Michael Schurter
11cf9686a6
No need to put reaper ticker on the struct
2016-09-26 16:15:19 -07:00
Michael Schurter
2eb0062959
Drop clumsy timeout on discovery notifications
...
It's better to just let goroutines fallback to their longer retry
intervals then try to be clever here.
2016-09-26 16:05:21 -07:00
Michael Schurter
307e674eca
Flip disco chan; clarify method names/comments
2016-09-26 15:52:40 -07:00
Michael Schurter
888ee21270
Return csv of servers from Stats, not just count
2016-09-26 15:40:26 -07:00
Michael Schurter
7dc0079dd2
doDisco -> triggerDiscoveryCh; discovered -> serversDiscoveredCh
...
Also fix log line formatting
2016-09-26 15:21:28 -07:00
Michael Schurter
434e4be97c
noServers -> noServersErr
2016-09-26 15:12:35 -07:00
Michael Schurter
b2ddb85a78
consul -> Consul
2016-09-26 15:06:57 -07:00
Michael Schurter
37cfb2769c
Replace periodic handlers with event driven disco
...
Remove use of periodic consul handlers in the client and just use
goroutines. Consul Discovery is now triggered with a chan instead of
using a timer and deadline to trigger.
Once discovery is complete a chan is ticked so all goroutines waiting
for servers will run.
Should speed up bootstraping and recovery while decreasing spinning on
timers.
2016-09-23 17:02:48 -07:00
Michael Schurter
2ab5264595
Retry all servers on RPC call failure
...
rpcproxy is refactored into serverlist which prioritizes good servers
over servers in a remote DC or who have had a failure.
Registration, heartbeating, and alloc status updating will retry faster
when new servers are discovered.
Consul discovery will be retried more quickly when no servers are
available (eg on startup or an outage).
2016-09-23 11:44:48 -07:00
Alex Dadgar
50efdb00e9
Merge pull request #1713 from hashicorp/f-alloc-runner-vault
...
Vault integration in client
2016-09-20 16:15:55 -07:00
Alex Dadgar
64de46432a
Merge pull request #1677 from hashicorp/f-vault-implicit-constraint
...
Vault implicit Task Group constraint + allow root tokens
2016-09-20 16:15:32 -07:00
Alex Dadgar
ec152a6d12
Clean up vault client
2016-09-14 18:10:56 -07:00
Alex Dadgar
6702a29071
Vault token threaded
2016-09-14 13:30:01 -07:00
Robert Neumayer
8dc19dbd10
Log adding of servers at INFO level
2016-09-14 22:24:17 +02:00
Alex Dadgar
2c8dd8bbd3
Revert "Introduce a Secret/ directory"
2016-09-01 17:23:15 -07:00
Alex Dadgar
b0adaa5301
Allow root token
2016-09-01 12:05:08 -07:00
Alex Dadgar
1ed454dd60
Merge pull request #1671 from hashicorp/f-secret-dir2
...
Introduce a Secret/ directory
2016-09-01 09:56:17 -07:00
Alex Dadgar
9fa23e3536
Symlink on windows
2016-08-31 21:41:44 -07:00
Alex Dadgar
5d3b47e648
Address comments and reserve
2016-08-31 18:11:02 -07:00
vishalnayak
55a6f06e15
Addressed review feedback
2016-08-30 13:08:13 -04:00
vishalnayak
3808dd0ff8
Return only fatal error to renewal error channel
2016-08-30 12:46:59 -04:00
vishalnayak
a0dbfe25b3
Fix tests
2016-08-29 21:30:06 -04:00
vishalnayak
82f6209e97
tokenDeriver function pointer to derive tokens.
...
Remove rpc*, connPool, node and region from vaultclient.
2016-08-29 20:32:05 -04:00
Alex Dadgar
14b7126511
Secret dir, hello world
2016-08-29 15:41:52 -07:00
vishalnayak
56e42cf03d
Employ DeriveVaultToken API and flesh-up DeriveToken
2016-08-24 12:29:59 -04:00
vishalnayak
6002e596c4
VaultClient for Nomad Client
2016-08-24 09:43:45 -04:00
Diptanu Choudhury
1e1eef56a1
Putting the mock driver behind a build flag
2016-08-22 15:02:28 -05:00
Diptanu Choudhury
4ca623bcfe
blocking chained allocations until previous allocation hasn't terminated
2016-08-22 11:34:24 -05:00
Alex Dadgar
a90dafe9ab
handle the upgrade case
2016-08-18 19:01:24 -07:00
Alex Dadgar
895c31f605
Nodes generate Secret ID and used for retrieving allocations and registering
2016-08-17 16:31:47 -07:00
Alex Dadgar
84820db86f
If the client detects that a heartbeat has failed because it is not registered, reregister
2016-08-15 17:24:09 -07:00
Diptanu Choudhury
28b3f511e0
Fixed some error messages
2016-08-10 15:17:32 -07:00
Kenjiro Nakayama
6a810e6f1e
Update after review
2016-08-09 08:57:26 +09:00
Kenjiro Nakayama
5c621b74e5
tiny: Return fmt.Errorf instead of duplicated error messages
2016-08-09 08:57:26 +09:00
Diptanu Choudhury
41b540fbc8
Allow operators to opt into publishing node and alloc metrics
2016-08-01 19:52:20 -07:00
Cameron Davison
777bdf4a1e
fix setup consul syncer error message
2016-07-28 22:14:52 -05:00
Alex Dadgar
ebac5cb283
Node.Register handles the case of transistioning to ready and creating evals
2016-07-21 15:22:02 -07:00
Diptanu Choudhury
5b39a5db40
Fixed a debug message
2016-07-09 00:12:53 -07:00
Sean Chittenden
03c571c61b
Consolidate fingerprinters into a single map
.
2016-07-08 23:37:14 -07:00
Sean Chittenden
8bdb38d016
Code golf
...
Pointed out by: @dadgar
2016-06-21 14:26:01 -07:00
Sean Chittenden
df4fe2e502
Fix the shuffling of remote datacenters.
...
Pointed out by: @ryanuber
2016-06-21 13:37:22 -07:00
Sean Chittenden
9a60999100
Pass a logger arg to NewClient
and NewServer
2016-06-16 23:29:23 -07:00
Sean Chittenden
fd18eb7fdb
Only register the Client services reaper when consul.auto_advertise
is enabled
2016-06-16 18:24:58 -07:00
Sean Chittenden
952b6ce7b5
Only auto-join clients if client_auto_join
is true
2016-06-16 14:47:21 -07:00
Sean Chittenden
af55b74114
Merge pull request #1276 from hashicorp/f-consul-server-autojoin
...
Teach Nomad servers how to fall back to Consul.
2016-06-16 14:40:45 -07:00
Sean Chittenden
008d75184b
Use the %+q
verb in log messages (vs %q
).
2016-06-16 11:03:51 -07:00
Alex Dadgar
7375d828e1
remove trace
2016-06-15 15:47:59 -07:00
Sean Chittenden
5e0ced2ae7
Shuffle all datacenters vs only the nearest N datacenters.
...
Per discussion, we want to be aggressive about fanning out vs possibly
fixating on only local DCs. With RPC forwarding in place, a random walk
may be less optimal from a network latency perspective, but it is guaranteed
to eventually result in a converged state because all DCs are candidates
during the bootstrapping process.
2016-06-15 12:40:51 -07:00
Sean Chittenden
2123460cf0
Bump various Consul search limits
...
Client: Search limit increased from 4 random DCs to 8 random DCs, plus nearest.
Server: Search factor increased from 3 to 5 times the bootstrap_expect.
This should allow for faster convergence in large environments (e.g.
sub-5min for 10K Consul DCs).
2016-06-15 12:40:51 -07:00
Alex Dadgar
cf99fc3173
Use Status.Peers instead of Status.Ping
2016-06-15 12:00:20 -07:00
Alex Dadgar
4b04e503f3
address comments
2016-06-13 17:32:18 -07:00
Alex Dadgar
8bbf4a55e5
Fix IDs and domain scoping
2016-06-13 16:30:58 -07:00
Diptanu Choudhury
d019d8ef8e
implemented reconciliation of unwanted services
2016-06-13 14:52:26 +02:00
Alex Dadgar
a82c2bb058
Do not reconcile in client and cleanup executor a bit
2016-06-12 18:22:07 -07:00
Alex Dadgar
8e231fa382
Rename ConsulService back to Service
2016-06-12 16:36:49 -07:00
Alex Dadgar
fdda90229f
only support latest and remove ring buffer
2016-06-12 09:32:38 -07:00
Alex Dadgar
e952540f6f
Allocation resources returned in a struct
2016-06-11 21:04:10 -07:00
Sean Chittenden
2f036231e5
Merge pull request #1201 from hashicorp/f-dyn-server-list
...
Dynamic Server Lists/Client Bootstrapping via consul.
2016-06-11 18:58:25 -04:00
Sean Chittenden
92e2cfb0ad
Walk the DCs from nearest to most remote.
2016-06-11 18:52:21 -04:00
Sean Chittenden
2968545201
Walk the DCs from nearest to most remote, no limit on the search.
2016-06-11 18:23:06 -04:00
Sean Chittenden
917766a3df
Prefer %+q
over %q
in log messages.
2016-06-11 18:17:20 -04:00
Diptanu Choudhury
fd60cfd585
Emitting client resource usage metrics as guages instead of k/v pairs
2016-06-11 22:17:32 +02:00
Sean Chittenden
bbd8dfa798
goling(1) compliance pass (e.g. Rpc* -> RPC)
2016-06-10 23:38:28 -04:00
Sean Chittenden
bc771d35df
Query for the Nomad service across multiple Consul datacenters.
2016-06-10 23:05:14 -04:00
Sean Chittenden
26b1e826d7
golint(1) police
2016-06-10 15:54:39 -04:00
Sean Chittenden
f139d0c68b
Properly guard consulPullHeartbeatDeadline behind heartbeatLock
2016-06-10 15:54:39 -04:00
Sean Chittenden
ed29946f5e
Populate the RPC Proxy's server list if heartbeat did not include a leader.
...
It's possible that a Nomad Client is heartbeating with a Nomad server that
has become issolated from the quorum of Nomad Servers. When 3x the
heartbeatTTL has been exceeded, append the Consul server list to the primary
primary server list. When the next RPCProxy rebalance occurs, there is a
chance one of the servers discovered from Consul will be in the majority.
When client reattaches to a Nomad Server in the majority, it will include
a heartbeat and will reset the TTLs *AND* will clear the primary server list
to include only values from the heartbeat.
2016-06-10 15:54:39 -04:00
Sean Chittenden
9a223936bb
Generate and sync Consul ServiceIDs consistently
2016-06-10 15:54:39 -04:00
Sean Chittenden
7956eb0c80
Rename structs.Task's Service
attribute to ConsulService
2016-06-10 15:54:39 -04:00
Sean Chittenden
8c813630e6
Move package client/consul/sync to command/agent/consul.
...
This has been done to allow the Server and Client to reuse the same
Syncer because the Agent may be running Client, Server, or both
simultaneously and we only want one Syncer object alive in the agent.
2016-06-10 15:54:39 -04:00
Sean Chittenden
fda03c5c9e
Change the signature of the PeriodicCallback to return an error
...
I *KNEW* I should have done this when I wrote it, but didn't want to
go back and audit the handlers to include the appropriate return
handling, but now that the code is taking shape, make this change.
2016-06-10 15:54:39 -04:00
Sean Chittenden
555f4fe135
Change client/consul.NewSyncer() to accept a shutdown channel
...
In addition to the API changing, consul.Syncer can now be signaled
to shutdown via the Shutdown() method, which will call the Run()'ing
sync task to exit gracefully.
2016-06-10 15:54:39 -04:00
Sean Chittenden
484816f5e0
Ensure that all accesses to Client.alloc are wrapped by allocLock.
2016-06-10 15:50:11 -04:00
Sean Chittenden
08cab4fdfa
Use client.getAllocRunners() where appropriate.
2016-06-10 15:50:11 -04:00
Sean Chittenden
f9d0b9da32
Line wrap long line.
2016-06-10 15:50:11 -04:00
Sean Chittenden
0d201631a3
Rename rpcproxy.UpdateFromNodeUpdateResponse to RefreshServerLists
...
While breaking the API within this PR, break out the individual
arguments to RefreshServerLists. The servers parameter is reusing
`structs.NodeServerInfo` for the time being, but this can be revisited
if the needs of the strucutre diverge in the future.
2016-06-10 15:50:11 -04:00
Sean Chittenden
0997fb1669
Fix up the comments
...
Pointed out by: @dadgar
2016-06-10 15:50:11 -04:00
Sean Chittenden
aaa7d6bf40
Make the locking protocol more explicit in client.NewClient
...
With an over abundance of caution, preevnt future copy/pasta by
using the right locks when bootstrapping a Client. Strictly speaking
this is not necessary, but it makes explicit the locking semantics
and guards against future concurrent or parallel initialization.
2016-06-10 15:50:11 -04:00
Sean Chittenden
525554c008
Use the client configCopy and lock appropriately.
2016-06-10 15:50:11 -04:00
Sean Chittenden
3060d6b33c
Flesh out the comment re: the client.rpcproxy.Run() task.
...
Requested by: Alex
2016-06-10 15:50:11 -04:00
Sean Chittenden
b1ee131db8
Rename backupServerDeadline
to consulPullHeartbeatDeadline
...
Suggested by: @alex
2016-06-10 15:50:11 -04:00
Sean Chittenden
b9adfcecf5
Remove unused variable
2016-06-10 15:50:11 -04:00
Sean Chittenden
f15eeb8f27
Clean up some docs and comments to be more accurate
2016-06-10 15:50:11 -04:00
Sean Chittenden
c78b0a6567
Remove unused constants
2016-06-10 15:50:11 -04:00
Sean Chittenden
39fb0f2469
Change the endpoint for /v1/agent/servers
and fix tests.
...
When an agent is running a server, the list of servers includes the
Raft peers. When the agent is running a client (which is always the
case?), include a list of the servers found in the Client's RpcProxy.
Dedupe and provide a unique list back to the caller.
2016-06-10 15:50:11 -04:00
Sean Chittenden
cb80e93a6b
Move client.DefaultConfig() to client/config.DefaultConfig()
...
Resolves an import cycle in testing and is more appropriate because
the default should reside next to its struct definition.
2016-06-10 15:50:11 -04:00
Sean Chittenden
bff57a0dce
Reconcile, clean up, and centralize API version numbers (major and minor).
...
Reduce future confusion by introducing a minor version that is gossiped out
via the `mvn` Serf tag (Minor Version Number, `vsn` is already being used for
to communicate `Major Version Number`).
Background: hashicorp/consul/issues/1346#issuecomment-151663152
2016-06-10 15:50:11 -04:00
Sean Chittenden
36340c654d
Improve language re: fingerprinting
2016-06-10 15:50:11 -04:00
Sean Chittenden
438becb28b
Pass the datacenter name in the heartbeat
...
Servers that are part of a different datacenter are added as backup
servers instead of primary servers.
2016-06-10 15:50:11 -04:00
Sean Chittenden
d914f262f9
Consolidate all consul sync periodic go routines to handlers.
...
Only one pump and periodic loop now.
2016-06-10 15:50:11 -04:00
Sean Chittenden
9fb0104def
Teach Client to reuse an Agent's consulSyncer.
...
"There can be only one."
2016-06-10 15:50:11 -04:00
Sean Chittenden
47891fb559
Register two services each for clients and servers, http and rpc.
...
In order to give clients a fighting chance to talk to the right port,
differentiate RPC services from HTTP services by registering two
services with different tags. This yields
`rpc.nomad-server.service.consul` and
`http.nomad-server.service.consul` which is immensely more useful to
clients attempting to bootstrap their world.
2016-06-10 15:50:11 -04:00
Sean Chittenden
861f900225
s/RpcVersion/RPCVersion/g
2016-06-10 15:50:11 -04:00
Sean Chittenden
9f8d6c9f67
Move struct member to reduce diff context
2016-06-10 15:50:11 -04:00
Sean Chittenden
410d85cc78
Rename the package from client/rpc_proxy
to client/rpcproxy
...
Also rename `NewRpcProxy()` to just `New()` to avoid package stutter.
2016-06-10 15:50:11 -04:00
Sean Chittenden
ec7c78f533
Only poll Consul for servers when Nomad heartbeats begin to fail
...
When a deadline timer of 2x Server's last requested TTL expires,
begin polling Consul for Nomad Servers.
2016-06-10 15:50:11 -04:00
Sean Chittenden
17116fc5a7
Rebalance Nomad client RPCs among different Nomad servers.
...
Implement client/rpc_proxy.RpcProxy.
2016-06-10 15:50:11 -04:00
Sean Chittenden
88f3422d7c
Rename NewConsulService to NewSyncer
2016-06-10 15:49:37 -04:00
Sean Chittenden
768aab015d
Rename client/consul/sync.ConsulService to client/consul/sync.Syncer
...
Syncer describes the responsibility and actions of the type.
2016-06-10 15:49:37 -04:00
Sean Chittenden
73e173673e
Rename client/config/config's ConsulConfig to ConsulAgentConfig
...
A follow up commit to the previous rename. More to come.
2016-06-10 15:48:36 -04:00
Sean Chittenden
e36686a17d
Use consul/lib's RandomStagger
...
Removes four redundant copies of the method in the process.
2016-06-10 15:48:36 -04:00
Diptanu Choudhury
6bf38bb213
Fixed the svc id generation scheme while pruning services
2016-06-07 08:14:11 -07:00
Alex Dadgar
ba1a92eb8c
Handle errors during stats collection
2016-06-03 14:23:18 -07:00
Diptanu Choudhury
35e31c1b81
Enqueing metrics only if they are not nil
2016-06-02 17:14:15 -04:00
Diptanu Choudhury
7efde782fa
Sending metrics for tasks as well
2016-06-01 16:42:16 +02:00
Diptanu Choudhury
7f172f314f
Pushing host stats
2016-05-31 04:05:47 +02:00
Diptanu Choudhury
c0dc6cfbf2
Changing the api of the stats endpoints
2016-05-28 19:59:20 -07:00
Diptanu Choudhury
a0c279f3b2
comments
2016-05-28 19:59:20 -07:00
Diptanu Choudhury
4c85f02072
Stopping the metrics collector timers using defer and starting to collect host stats right away
2016-05-28 19:59:20 -07:00
Diptanu Choudhury
b2e8495971
Fixing the alloc runner tests
2016-05-28 19:59:20 -07:00
Diptanu Choudhury
f93820f40c
Added comments
2016-05-28 19:59:20 -07:00
Diptanu Choudhury
c2da19bf11
Refactored the api for NewHostStatsCollector
2016-05-28 19:59:20 -07:00
Diptanu Choudhury
c46400597e
Making the stats collection interval and number of data points to keep in memory configurable
2016-05-28 19:59:20 -07:00
Diptanu Choudhury
b6e5227e7a
Renamed monitorUsage method
2016-05-28 19:59:20 -07:00
Diptanu Choudhury
666b419dba
Acquiring locks before iterating allocations and tasks
2016-05-28 19:59:20 -07:00
Diptanu Choudhury
73fa700a88
Showing host resource usage stats
2016-05-28 19:59:20 -07:00
Diptanu Choudhury
9a96dddc07
Added some docs to resource stats endpoint
2016-05-28 19:42:34 -07:00
Diptanu Choudhury
b9feae89ce
Making the conversion to Stats simpler
2016-05-28 19:42:34 -07:00
Diptanu Choudhury
7569b1af2e
Collecting host stats
2016-05-28 19:42:34 -07:00
Diptanu Choudhury
f3d0aecafe
Reporting time series of stats
2016-05-28 19:42:34 -07:00
Diptanu Choudhury
0fb0e0237f
Added a client API to display resource usage of an allocation
2016-05-28 19:42:34 -07:00
Diptanu Choudhury
2a99a2cfb6
Removing addition of the client service while reconciling services
2016-05-13 10:34:21 -07:00
Diptanu Choudhury
439d7bf326
Fixed an agent test
2016-05-11 17:26:53 -07:00
Diptanu Choudhury
347cb890d2
Removed allocID and task name from consul service
2016-05-11 16:26:41 -07:00
Diptanu Choudhury
7287376eca
Using consul config from client config instead of reading from client options
2016-05-11 16:10:57 -07:00
Diptanu Choudhury
83fed62a0a
Implemented registering client and server services
2016-05-11 16:07:02 -07:00
Diptanu Choudhury
2f8a3532ad
Refactored the signature of NewConsulService
2016-05-11 15:22:58 -07:00
Sean Chittenden
53d4681b61
Emit various debugging information with the results of the fingerprinter
2016-05-09 12:21:51 -07:00
Diptanu Choudhury
2941b26244
Reading consul attr from copy of node attributes
2016-04-11 20:13:28 -04:00
Alex Dadgar
f3d9ecf354
When reserving ports don't reserve network interface speed
2016-04-07 15:47:02 -07:00
Diptanu Choudhury
57b0bbcb8b
Watching for node updates after registration completes
2016-04-01 13:41:52 -07:00
Diptanu Choudhury
c6e80582a6
Making the drivers fingerprint periodically if they are configured to do so
2016-03-31 15:15:00 -07:00
Diptanu Choudhury
e677c43667
Client not syncing services with consul until fingerprinting succeeds
2016-03-30 21:51:50 -07:00
Diptanu Choudhury
b886636f6f
Using tickers instead of creating new timers
2016-03-25 14:18:04 -07:00
Diptanu Choudhury
91db8f44f1
Changing the logic of keep services
2016-03-24 19:19:13 -07:00
Diptanu Choudhury
2a9e522ed4
Added an impl for Nomad Checks
2016-03-24 19:00:24 -07:00
Diptanu Choudhury
6a62d4f452
Fixing check registration in perform sync
2016-03-24 14:12:09 -07:00
Diptanu Choudhury
bf554992a4
Using a helper method to copy taskStates
2016-03-23 19:11:54 -07:00
Diptanu Choudhury
12ac0b4a33
Reworded the log line
2016-03-23 19:04:59 -07:00
Diptanu Choudhury
e98f5e4ee3
Locking the task states
2016-03-23 19:02:29 -07:00
Diptanu Choudhury
31baa6ce4b
Renamed vars and methods
2016-03-23 18:21:27 -07:00
Diptanu Choudhury
092f23a646
Locking on alloc runners before syncing with consul
2016-03-23 17:54:32 -07:00
Diptanu Choudhury
62242595fc
Using the name of the task and the alloc id in the service name
2016-03-23 17:35:29 -07:00
Diptanu Choudhury
ab35c187b3
Added comments
2016-03-23 15:39:25 -07:00
Diptanu Choudhury
7dab719a66
Client sync with consul and removed unwanted services
2016-03-23 15:28:55 -07:00
Diptanu Choudhury
f6a932194f
Removing references to old consul services and adding consul config to executor context
2016-03-23 12:19:19 -07:00
Alex Dadgar
719f5d34ed
Merge pull request #910 from hashicorp/f-reserved-resources
...
Reserve Client Resources + Config Validation
2016-03-15 21:09:13 -07:00
Dmitry Smirnov
7c3bb51cfa
codespell: minor spelling corrections
...
Signed-off-by: Dmitry Smirnov <onlyjob@member.fsf.org>
2016-03-16 05:28:31 +11:00
Alex Dadgar
7d4c19ed99
reserve resources on the node
2016-03-13 19:05:41 -07:00
Alex Dadgar
75d5aad888
client: fix bug where pushing allocs is skipped
2016-03-10 16:18:20 -08:00
Alex Dadgar
22f4fbd652
up cached connection time
2016-03-09 10:37:56 -08:00
Alex Dadgar
cd889df20a
client: send correct node id
2016-02-22 22:43:55 -08:00
Alex Dadgar
a40f734b77
Send NodeID when updating client allocation
2016-02-22 17:25:11 -08:00
Alex Dadgar
51bacf674e
address feedback
2016-02-21 21:32:32 -08:00
Alex Dadgar
281e2ca198
Batch client allocation updates to the server
2016-02-21 21:15:02 -08:00
Alex Dadgar
2ec5d7de76
undo async update
2016-02-19 22:34:52 -08:00
Alex Dadgar
c08e3dbee8
Make updating alloc status async
2016-02-19 21:44:23 -08:00
Alex Dadgar
13e5597ca2
Reduce alloc lock contention in client
2016-02-19 19:51:55 -08:00
Alex Dadgar
d47935b455
Don't re-register as initializing
2016-02-18 23:02:28 -08:00
Alex Dadgar
96fd272422
Increase Alloc channel buffers
2016-02-18 20:43:48 -08:00
Alex Dadgar
5473b6ae63
Extract the heartbeat and saveState into their own go routines
2016-02-17 11:32:17 -08:00
Alex Dadgar
0e68c7c949
Initialize the config copy after client init
2016-02-10 19:01:57 -08:00
Alex Dadgar
e6c2b6ae9d
Slightly less node copying
2016-02-10 14:09:23 -08:00
Alex Dadgar
0c4c3fc4ee
safe but slow
2016-02-10 13:44:53 -08:00
Alex Dadgar
071216a730
Fix concurrent r/w to heartbeat time
2016-02-09 22:43:16 -08:00
Alex Dadgar
913f98f738
Make fingerprinting thread safe
2016-02-09 22:14:24 -08:00
Ivo Verberk
73ab620f61
Add comments to hasNodeChanged and remove superfluous else block
2016-02-04 08:19:34 +01:00
Ivo Verberk
d5a67aba86
Reregister node when periodic fingerprint changes node properties
2016-02-03 21:10:58 +01:00
Alex Dadgar
b5260fc14e
Fix locks and use task runners state not alloc state
2016-02-01 15:43:59 -08:00
Alex Dadgar
2d98c0eadd
Fix double pull with introduction of AllocModifyIndex
2016-02-01 15:43:59 -08:00
Ranjib Dey
4527257647
allow group and others to have executable permissions
2016-01-31 10:54:32 -08:00
Jake Champlin
e053511232
Use net.JoinHostPort
2016-01-29 05:39:28 -05:00
Jake Champlin
78814cba28
Spelling
2016-01-29 05:11:50 -05:00
Jake Champlin
9a6bd0d7fe
Updates from comments, fix tests
2016-01-28 23:11:13 -05:00
Jake Champlin
ee1be79093
Allow ports to be optional when adding servers
...
When updating a clients servers, as nomad does not use the gossip
protocol over a specified port for clients, it was required to specify
ports along with server addresses.
Now specifying ports are optional, and if unspecified the default `4647`
port is used, reflecting a notice back to the user.
2016-01-28 22:08:28 -05:00
Ivo Verberk
9c46eceeac
Cleanup code and add comments
2016-01-20 00:02:17 +01:00
Ivo Verberk
149c55252d
Merge branch 'master' into f-cli-short-ids
2016-01-15 09:19:53 +01:00
Diptanu Choudhury
e18f9d787e
Added the node id to agent info
2016-01-14 15:42:30 -08:00
Diptanu Choudhury
39b263ed7f
Refactoring some comments and test names
2016-01-14 15:07:24 -08:00