Commit graph

922 commits

Author SHA1 Message Date
Alex Dadgar 6bc7009f8c Allow count 0 on system jobs 2016-07-13 13:50:08 -06:00
Alex Dadgar c8e7b909c7 Merge pull request #1404 from hashicorp/f-streaming
Implement a streaming API and tail in the fs command
2016-07-12 17:23:04 -06:00
Alex Dadgar b87cf12f6f Merge pull request #1403 from hashicorp/f-hold-rpc
Gracefully handle short lived outages by holding RPC calls
2016-07-12 13:52:33 -06:00
Diptanu Choudhury 4ea9ceee38 Handling allocations with client state pending 2016-07-12 11:29:23 -06:00
Diptanu Choudhury 2cf2ed6758 Changing the state of an allocation to lost if the node on which it was running was marked as down 2016-07-11 18:24:04 -06:00
Diptanu Choudhury bc0bfc3ae5 Merge pull request #1398 from hashicorp/b-check-timeout
Fixed the validation logic for check timeout and interval
2016-07-10 12:16:50 -07:00
Alex Dadgar 51ae7ace25 initial tail impl 2016-07-10 13:57:04 -04:00
Armon Dadgar 75abbc74a5 nomad: modify forward RPC to hold when no known leader 2016-07-10 13:36:55 -04:00
Armon Dadgar 699c4fc68c nomad: Add RPCHoldTimeout to tune RPC hold interval 2016-07-10 13:35:48 -04:00
Diptanu Choudhury b4fe764f07 Added a test 2016-07-08 22:33:04 -07:00
Diptanu Choudhury 19f0867816 Fixed the validation logic for check timeout 2016-07-08 22:26:03 -07:00
Diptanu Choudhury 48b9684b1e Using net.JoinHostPort instead of handcrafting addrs 2016-07-08 16:45:14 -07:00
Diptanu Choudhury b180223f4b Allowing ports to be overriden in check definitions 2016-07-08 14:14:25 -07:00
Alex Dadgar 099cee067d comments 2016-06-28 10:02:06 -07:00
Alex Dadgar 3f0a47f9e4 Disallow EvalGC to reap batch jobs evals/allocs and make JobGC only oneshot GCs everything 2016-06-27 22:54:03 -07:00
Alex Dadgar 6ca552c451 Reblock test 2016-06-24 10:26:13 -07:00
Alex Dadgar fd3e469d5e Remove requeue because it is a subset of EnqueueAll now 2016-06-24 10:14:34 -07:00
Alex Dadgar 2f8bb4b235 When enqueuing into eval broker always pass blocked eval's token 2016-06-23 22:40:22 -07:00
Alex Dadgar ccf93d7e44 UnblockFailed needs to untrack the job 2016-06-23 15:35:21 -07:00
Alex Dadgar b1c2a9ddb9 UnblockFailed needs to untrack the job 2016-06-23 15:26:26 -07:00
Alex Dadgar 3a8a27bcff refresh index eval id in log 2016-06-22 13:48:41 -07:00
Diptanu Choudhury e43c460534 Fixed name of a test 2016-06-22 13:04:54 -07:00
Diptanu Choudhury 0fe8746692 GC-ing dead batch jobs 2016-06-22 11:40:27 -07:00
Alex Dadgar 8ceb7ead20 Do not use snapshot 2016-06-22 09:33:15 -07:00
Alex Dadgar 91f6976423 tighter index bound when creating GC evals 2016-06-22 09:11:25 -07:00
Alex Dadgar 25decca3ca Worker waitForIndex uses StateStore index, not Raft Applied Index 2016-06-22 09:04:22 -07:00
Sean Chittenden 8bdb38d016
Code golf
Pointed out by: @dadgar
2016-06-21 14:26:01 -07:00
Sean Chittenden df4fe2e502
Fix the shuffling of remote datacenters.
Pointed out by: @ryanuber
2016-06-21 13:37:22 -07:00
Sean Chittenden 9e287858de Merge pull request #1310 from hashicorp/b-logger
Create and pass only one `logger` object around per Agent
2016-06-17 12:16:35 -07:00
Sean Chittenden 46e2d54acf
Provide nomad.Config with a default LogOutput of os.StdErr 2016-06-17 06:44:10 -07:00
Sean Chittenden 9a60999100
Pass a logger arg to NewClient and NewServer 2016-06-16 23:29:23 -07:00
Sean Chittenden 871a31a8ec
Teach config.ConsulConfig how to construct a consulapi TLS client.
Said differently, centralize the creation of consul's client config
in one place and use it everywhere.
2016-06-16 22:51:06 -07:00
Sean Chittenden d17af396ca
Create config.DefaultConsulConfig() 2016-06-16 20:41:05 -07:00
Sean Chittenden a658299235
Misc typos 2016-06-16 16:17:17 -07:00
Sean Chittenden ec77a1869e
Test for errors 2016-06-16 14:43:46 -07:00
Sean Chittenden 31313b68cf
Don't assign to an atomic w/o using atomic setter func 2016-06-16 14:43:46 -07:00
Sean Chittenden af55b74114 Merge pull request #1276 from hashicorp/f-consul-server-autojoin
Teach Nomad servers how to fall back to Consul.
2016-06-16 14:40:45 -07:00
Sean Chittenden 7c24487850
Fix up various error handling 2016-06-16 14:40:09 -07:00
Sean Chittenden 71cd9984ae
Immediately query Consul upon initialization if we have no peers.
Also don't attempt to join the Server with itself.
2016-06-16 14:27:10 -07:00
Sean Chittenden 65319252b9
Rework server_auto_join to use a timer instead of the peer count.
It is perfectly viable for an admin to downsize a Nomad Server cluster
down to 1, 2, or `num % 2 == 0` (however ill-advised such activities
may be).  And instead of using `bootstrap_expect`, use a timeout-based
strategy.  If the `bootstrapFn` hasn't observed a leader in 15s it will
fall back to Consul and will poll every ~60s until it sees a leader.
2016-06-16 12:14:03 -07:00
Sean Chittenden b0fecbefc1
Define BootstrapExepct as an int32 so it can be manipulated atomically. 2016-06-16 12:00:15 -07:00
Alex Dadgar ea5d11e628 remove consul reference 2016-06-15 17:23:02 -07:00
Alex Dadgar bf14fd355f plan displays launch time of periodic jobs 2016-06-15 13:34:45 -07:00
Sean Chittenden 14f9d2a947
Use the config's log output 2016-06-15 12:40:51 -07:00
Sean Chittenden 5b0def194a
Namespace the log messages 2016-06-15 12:40:51 -07:00
Sean Chittenden bffc82d668
Do not consider the number of Serf members when considering falling back to Consul. 2016-06-15 12:40:51 -07:00
Sean Chittenden 324af8d7f1
Guard the auto-join functionality behind its consul.server_auto_join tunable 2016-06-15 12:40:51 -07:00
Sean Chittenden 5e0ced2ae7
Shuffle all datacenters vs only the nearest N datacenters.
Per discussion, we want to be aggressive about fanning out vs possibly
fixating on only local DCs.  With RPC forwarding in place, a random walk
may be less optimal from a network latency perspective, but it is guaranteed
to eventually result in a converged state because all DCs are candidates
during the bootstrapping process.
2016-06-15 12:40:51 -07:00
Sean Chittenden 2123460cf0
Bump various Consul search limits
Client: Search limit increased from 4 random DCs to 8 random DCs, plus nearest.
Server: Search factor increased from 3 to 5 times the bootstrap_expect.

This should allow for faster convergence in large environments (e.g.
sub-5min for 10K Consul DCs).
2016-06-15 12:40:51 -07:00
Sean Chittenden e8d1264dbc
Short-circuit the bootstrapFn if we have a leader 2016-06-15 12:40:51 -07:00