Commit Graph

255 Commits

Author SHA1 Message Date
Sean Chittenden 008d75184b
Use the `%+q` verb in log messages (vs `%q`). 2016-06-16 11:03:51 -07:00
Alex Dadgar 7375d828e1 remove trace 2016-06-15 15:47:59 -07:00
Sean Chittenden 5e0ced2ae7
Shuffle all datacenters vs only the nearest N datacenters.
Per discussion, we want to be aggressive about fanning out vs possibly
fixating on only local DCs.  With RPC forwarding in place, a random walk
may be less optimal from a network latency perspective, but it is guaranteed
to eventually result in a converged state because all DCs are candidates
during the bootstrapping process.
2016-06-15 12:40:51 -07:00
Sean Chittenden 2123460cf0
Bump various Consul search limits
Client: Search limit increased from 4 random DCs to 8 random DCs, plus nearest.
Server: Search factor increased from 3 to 5 times the bootstrap_expect.

This should allow for faster convergence in large environments (e.g.
sub-5min for 10K Consul DCs).
2016-06-15 12:40:51 -07:00
Alex Dadgar cf99fc3173 Use Status.Peers instead of Status.Ping 2016-06-15 12:00:20 -07:00
Alex Dadgar 4b04e503f3 address comments 2016-06-13 17:32:18 -07:00
Alex Dadgar 8bbf4a55e5 Fix IDs and domain scoping 2016-06-13 16:30:58 -07:00
Diptanu Choudhury d019d8ef8e implemented reconciliation of unwanted services 2016-06-13 14:52:26 +02:00
Alex Dadgar a82c2bb058 Do not reconcile in client and cleanup executor a bit 2016-06-12 18:22:07 -07:00
Alex Dadgar 8e231fa382 Rename ConsulService back to Service 2016-06-12 16:36:49 -07:00
Alex Dadgar fdda90229f only support latest and remove ring buffer 2016-06-12 09:32:38 -07:00
Alex Dadgar e952540f6f Allocation resources returned in a struct 2016-06-11 21:04:10 -07:00
Sean Chittenden 2f036231e5 Merge pull request #1201 from hashicorp/f-dyn-server-list
Dynamic Server Lists/Client Bootstrapping via consul.
2016-06-11 18:58:25 -04:00
Sean Chittenden 92e2cfb0ad
Walk the DCs from nearest to most remote. 2016-06-11 18:52:21 -04:00
Sean Chittenden 2968545201
Walk the DCs from nearest to most remote, no limit on the search. 2016-06-11 18:23:06 -04:00
Sean Chittenden 917766a3df
Prefer `%+q` over `%q` in log messages. 2016-06-11 18:17:20 -04:00
Diptanu Choudhury fd60cfd585 Emitting client resource usage metrics as guages instead of k/v pairs 2016-06-11 22:17:32 +02:00
Sean Chittenden bbd8dfa798
goling(1) compliance pass (e.g. Rpc* -> RPC) 2016-06-10 23:38:28 -04:00
Sean Chittenden bc771d35df
Query for the Nomad service across multiple Consul datacenters. 2016-06-10 23:05:14 -04:00
Sean Chittenden 26b1e826d7
golint(1) police 2016-06-10 15:54:39 -04:00
Sean Chittenden f139d0c68b
Properly guard consulPullHeartbeatDeadline behind heartbeatLock 2016-06-10 15:54:39 -04:00
Sean Chittenden ed29946f5e
Populate the RPC Proxy's server list if heartbeat did not include a leader.
It's possible that a Nomad Client is heartbeating with a Nomad server that
has become issolated from the quorum of Nomad Servers.  When 3x the
heartbeatTTL has been exceeded, append the Consul server list to the primary
primary server list.  When the next RPCProxy rebalance occurs, there is a
chance one of the servers discovered from Consul will be in the majority.
When client reattaches to a Nomad Server in the majority, it will include
a heartbeat and will reset the TTLs *AND* will clear the primary server list
to include only values from the heartbeat.
2016-06-10 15:54:39 -04:00
Sean Chittenden 9a223936bb
Generate and sync Consul ServiceIDs consistently 2016-06-10 15:54:39 -04:00
Sean Chittenden 7956eb0c80
Rename structs.Task's `Service` attribute to `ConsulService` 2016-06-10 15:54:39 -04:00
Sean Chittenden 8c813630e6
Move package client/consul/sync to command/agent/consul.
This has been done to allow the Server and Client to reuse the same
Syncer because the Agent may be running Client, Server, or both
simultaneously and we only want one Syncer object alive in the agent.
2016-06-10 15:54:39 -04:00
Sean Chittenden fda03c5c9e
Change the signature of the PeriodicCallback to return an error
I *KNEW* I should have done this when I wrote it, but didn't want to
go back and audit the handlers to include the appropriate return
handling, but now that the code is taking shape, make this change.
2016-06-10 15:54:39 -04:00
Sean Chittenden 555f4fe135
Change client/consul.NewSyncer() to accept a shutdown channel
In addition to the API changing, consul.Syncer can now be signaled
to shutdown via the Shutdown() method, which will call the Run()'ing
sync task to exit gracefully.
2016-06-10 15:54:39 -04:00
Sean Chittenden 484816f5e0
Ensure that all accesses to Client.alloc are wrapped by allocLock. 2016-06-10 15:50:11 -04:00
Sean Chittenden 08cab4fdfa
Use client.getAllocRunners() where appropriate. 2016-06-10 15:50:11 -04:00
Sean Chittenden f9d0b9da32
Line wrap long line. 2016-06-10 15:50:11 -04:00
Sean Chittenden 0d201631a3
Rename rpcproxy.UpdateFromNodeUpdateResponse to RefreshServerLists
While breaking the API within this PR, break out the individual
arguments to RefreshServerLists.  The servers parameter is reusing
`structs.NodeServerInfo` for the time being, but this can be revisited
if the needs of the strucutre diverge in the future.
2016-06-10 15:50:11 -04:00
Sean Chittenden 0997fb1669
Fix up the comments
Pointed out by: @dadgar
2016-06-10 15:50:11 -04:00
Sean Chittenden aaa7d6bf40
Make the locking protocol more explicit in client.NewClient
With an over abundance of caution, preevnt future copy/pasta by
using the right locks when bootstrapping a Client.  Strictly speaking
this is not necessary, but it makes explicit the locking semantics
and guards against future concurrent or parallel initialization.
2016-06-10 15:50:11 -04:00
Sean Chittenden 525554c008
Use the client configCopy and lock appropriately. 2016-06-10 15:50:11 -04:00
Sean Chittenden 3060d6b33c
Flesh out the comment re: the client.rpcproxy.Run() task.
Requested by: Alex
2016-06-10 15:50:11 -04:00
Sean Chittenden b1ee131db8
Rename `backupServerDeadline` to `consulPullHeartbeatDeadline`
Suggested by: @alex
2016-06-10 15:50:11 -04:00
Sean Chittenden b9adfcecf5
Remove unused variable 2016-06-10 15:50:11 -04:00
Sean Chittenden f15eeb8f27
Clean up some docs and comments to be more accurate 2016-06-10 15:50:11 -04:00
Sean Chittenden c78b0a6567
Remove unused constants 2016-06-10 15:50:11 -04:00
Sean Chittenden 39fb0f2469
Change the endpoint for `/v1/agent/servers` and fix tests.
When an agent is running a server, the list of servers includes the
Raft peers.  When the agent is running a client (which is always the
case?), include a list of the servers found in the Client's RpcProxy.
Dedupe and provide a unique list back to the caller.
2016-06-10 15:50:11 -04:00
Sean Chittenden cb80e93a6b
Move client.DefaultConfig() to client/config.DefaultConfig()
Resolves an import cycle in testing and is more appropriate because
the default should reside next to its struct definition.
2016-06-10 15:50:11 -04:00
Sean Chittenden bff57a0dce
Reconcile, clean up, and centralize API version numbers (major and minor).
Reduce future confusion by introducing a minor version that is gossiped out
via the `mvn` Serf tag (Minor Version Number, `vsn` is already being used for
to communicate `Major Version Number`).

Background: hashicorp/consul/issues/1346#issuecomment-151663152
2016-06-10 15:50:11 -04:00
Sean Chittenden 36340c654d
Improve language re: fingerprinting 2016-06-10 15:50:11 -04:00
Sean Chittenden 438becb28b
Pass the datacenter name in the heartbeat
Servers that are part of a different datacenter are added as backup
servers instead of primary servers.
2016-06-10 15:50:11 -04:00
Sean Chittenden d914f262f9
Consolidate all consul sync periodic go routines to handlers.
Only one pump and periodic loop now.
2016-06-10 15:50:11 -04:00
Sean Chittenden 9fb0104def
Teach Client to reuse an Agent's consulSyncer.
"There can be only one."
2016-06-10 15:50:11 -04:00
Sean Chittenden 47891fb559
Register two services each for clients and servers, http and rpc.
In order to give clients a fighting chance to talk to the right port,
differentiate RPC services from HTTP services by registering two
services with different tags.  This yields
`rpc.nomad-server.service.consul` and
`http.nomad-server.service.consul` which is immensely more useful to
clients attempting to bootstrap their world.
2016-06-10 15:50:11 -04:00
Sean Chittenden 861f900225
s/RpcVersion/RPCVersion/g 2016-06-10 15:50:11 -04:00
Sean Chittenden 9f8d6c9f67
Move struct member to reduce diff context 2016-06-10 15:50:11 -04:00
Sean Chittenden 410d85cc78
Rename the package from `client/rpc_proxy` to `client/rpcproxy`
Also rename `NewRpcProxy()` to just `New()` to avoid package stutter.
2016-06-10 15:50:11 -04:00
Sean Chittenden ec7c78f533
Only poll Consul for servers when Nomad heartbeats begin to fail
When a deadline timer of 2x Server's last requested TTL expires,
begin polling Consul for Nomad Servers.
2016-06-10 15:50:11 -04:00
Sean Chittenden 17116fc5a7
Rebalance Nomad client RPCs among different Nomad servers.
Implement client/rpc_proxy.RpcProxy.
2016-06-10 15:50:11 -04:00
Sean Chittenden 88f3422d7c
Rename NewConsulService to NewSyncer 2016-06-10 15:49:37 -04:00
Sean Chittenden 768aab015d
Rename client/consul/sync.ConsulService to client/consul/sync.Syncer
Syncer describes the responsibility and actions of the type.
2016-06-10 15:49:37 -04:00
Sean Chittenden 73e173673e
Rename client/config/config's ConsulConfig to ConsulAgentConfig
A follow up commit to the previous rename.  More to come.
2016-06-10 15:48:36 -04:00
Sean Chittenden e36686a17d
Use consul/lib's RandomStagger
Removes four redundant copies of the method in the process.
2016-06-10 15:48:36 -04:00
Diptanu Choudhury 6bf38bb213 Fixed the svc id generation scheme while pruning services 2016-06-07 08:14:11 -07:00
Alex Dadgar ba1a92eb8c Handle errors during stats collection 2016-06-03 14:23:18 -07:00
Diptanu Choudhury 35e31c1b81 Enqueing metrics only if they are not nil 2016-06-02 17:14:15 -04:00
Diptanu Choudhury 7efde782fa Sending metrics for tasks as well 2016-06-01 16:42:16 +02:00
Diptanu Choudhury 7f172f314f Pushing host stats 2016-05-31 04:05:47 +02:00
Diptanu Choudhury c0dc6cfbf2 Changing the api of the stats endpoints 2016-05-28 19:59:20 -07:00
Diptanu Choudhury a0c279f3b2 comments 2016-05-28 19:59:20 -07:00
Diptanu Choudhury 4c85f02072 Stopping the metrics collector timers using defer and starting to collect host stats right away 2016-05-28 19:59:20 -07:00
Diptanu Choudhury b2e8495971 Fixing the alloc runner tests 2016-05-28 19:59:20 -07:00
Diptanu Choudhury f93820f40c Added comments 2016-05-28 19:59:20 -07:00
Diptanu Choudhury c2da19bf11 Refactored the api for NewHostStatsCollector 2016-05-28 19:59:20 -07:00
Diptanu Choudhury c46400597e Making the stats collection interval and number of data points to keep in memory configurable 2016-05-28 19:59:20 -07:00
Diptanu Choudhury b6e5227e7a Renamed monitorUsage method 2016-05-28 19:59:20 -07:00
Diptanu Choudhury 666b419dba Acquiring locks before iterating allocations and tasks 2016-05-28 19:59:20 -07:00
Diptanu Choudhury 73fa700a88 Showing host resource usage stats 2016-05-28 19:59:20 -07:00
Diptanu Choudhury 9a96dddc07 Added some docs to resource stats endpoint 2016-05-28 19:42:34 -07:00
Diptanu Choudhury b9feae89ce Making the conversion to Stats simpler 2016-05-28 19:42:34 -07:00
Diptanu Choudhury 7569b1af2e Collecting host stats 2016-05-28 19:42:34 -07:00
Diptanu Choudhury f3d0aecafe Reporting time series of stats 2016-05-28 19:42:34 -07:00
Diptanu Choudhury 0fb0e0237f Added a client API to display resource usage of an allocation 2016-05-28 19:42:34 -07:00
Diptanu Choudhury 2a99a2cfb6 Removing addition of the client service while reconciling services 2016-05-13 10:34:21 -07:00
Diptanu Choudhury 439d7bf326 Fixed an agent test 2016-05-11 17:26:53 -07:00
Diptanu Choudhury 347cb890d2 Removed allocID and task name from consul service 2016-05-11 16:26:41 -07:00
Diptanu Choudhury 7287376eca Using consul config from client config instead of reading from client options 2016-05-11 16:10:57 -07:00
Diptanu Choudhury 83fed62a0a Implemented registering client and server services 2016-05-11 16:07:02 -07:00
Diptanu Choudhury 2f8a3532ad Refactored the signature of NewConsulService 2016-05-11 15:22:58 -07:00
Sean Chittenden 53d4681b61
Emit various debugging information with the results of the fingerprinter 2016-05-09 12:21:51 -07:00
Diptanu Choudhury 2941b26244 Reading consul attr from copy of node attributes 2016-04-11 20:13:28 -04:00
Alex Dadgar f3d9ecf354 When reserving ports don't reserve network interface speed 2016-04-07 15:47:02 -07:00
Diptanu Choudhury 57b0bbcb8b Watching for node updates after registration completes 2016-04-01 13:41:52 -07:00
Diptanu Choudhury c6e80582a6 Making the drivers fingerprint periodically if they are configured to do so 2016-03-31 15:15:00 -07:00
Diptanu Choudhury e677c43667 Client not syncing services with consul until fingerprinting succeeds 2016-03-30 21:51:50 -07:00
Diptanu Choudhury b886636f6f Using tickers instead of creating new timers 2016-03-25 14:18:04 -07:00
Diptanu Choudhury 91db8f44f1 Changing the logic of keep services 2016-03-24 19:19:13 -07:00
Diptanu Choudhury 2a9e522ed4 Added an impl for Nomad Checks 2016-03-24 19:00:24 -07:00
Diptanu Choudhury 6a62d4f452 Fixing check registration in perform sync 2016-03-24 14:12:09 -07:00
Diptanu Choudhury bf554992a4 Using a helper method to copy taskStates 2016-03-23 19:11:54 -07:00
Diptanu Choudhury 12ac0b4a33 Reworded the log line 2016-03-23 19:04:59 -07:00
Diptanu Choudhury e98f5e4ee3 Locking the task states 2016-03-23 19:02:29 -07:00
Diptanu Choudhury 31baa6ce4b Renamed vars and methods 2016-03-23 18:21:27 -07:00
Diptanu Choudhury 092f23a646 Locking on alloc runners before syncing with consul 2016-03-23 17:54:32 -07:00
Diptanu Choudhury 62242595fc Using the name of the task and the alloc id in the service name 2016-03-23 17:35:29 -07:00
Diptanu Choudhury ab35c187b3 Added comments 2016-03-23 15:39:25 -07:00
Diptanu Choudhury 7dab719a66 Client sync with consul and removed unwanted services 2016-03-23 15:28:55 -07:00