Marin
8fc52974a3
fix initial status tests
2016-08-16 14:34:36 -07:00
Marin
69bc3a8fc8
Add support for initial check status
2016-08-16 12:05:15 -07:00
Diptanu Choudhury
c1a455983d
Added the chained alloc for system scheduler
2016-08-16 10:49:45 -07:00
Alex Dadgar
ce0b78525d
inclusive range
2016-08-15 13:13:04 -07:00
Diptanu Choudhury
761cc40cd2
Fixed a make vet warning
2016-08-12 12:09:44 -07:00
Diptanu Choudhury
dd7e69006e
Not running tests in parallel
2016-08-11 21:53:27 -07:00
Diptanu Choudhury
01e08a64ee
Merge pull request #1569 from hashicorp/fix-network-port-collisions
...
Fix network port collisions when asking for dyn ports
2016-08-11 16:19:45 -07:00
Diptanu Choudhury
d81b20c1a6
Fix network port collisions when asking for dyn ports
2016-08-11 16:18:45 -07:00
Alex Dadgar
007a538515
Fix core scheduler tests
2016-08-11 14:36:22 -07:00
Alex Dadgar
6e5c47a315
Merge pull request #1526 from hashicorp/b-random-ports
...
Set difference when picking random ports
2016-08-10 16:37:57 -07:00
Alex Dadgar
5a37e720c5
Fixes plus address feedback
2016-08-10 16:37:26 -07:00
Alex Dadgar
b8fd989d3a
Try stochastic and fallback to precise
2016-08-10 11:47:20 -07:00
Diptanu Choudhury
ab94c8eed9
Marking allocations which are not terminal and are on down nodes as lost
2016-08-09 13:11:58 -07:00
Diptanu Choudhury
3cc684211a
Added a test to ensure summaries are correct when a node goes down
2016-08-09 10:16:17 -07:00
Diptanu Choudhury
c63a78b9a3
Removing the check related to checking version of server before reconciling in leader
2016-08-05 17:48:37 -07:00
Diptanu Choudhury
1518f23d0a
Making servers reconcile job summaries when they acquire leadership
2016-08-05 16:47:36 -07:00
Alex Dadgar
1b620bcdd8
Add a test
2016-08-05 16:23:41 -07:00
Alex Dadgar
9089a279a1
Set difference when picking random ports
2016-08-05 16:08:35 -07:00
Diptanu Choudhury
6dc5b1972c
Setting job's create index as summary create index during reconciliation
2016-08-04 15:14:01 -07:00
Alex Dadgar
2fb67fefb5
Merge pull request #1516 from hashicorp/f-lost-state-sched
...
Make scheduler mark allocations as lost
2016-08-04 11:36:02 -07:00
Diptanu Choudhury
88d383c47f
Updated tests and comments
2016-08-04 11:29:36 -07:00
Alex Dadgar
e33bda76bf
test sched doesn't mark complete as lost + core_sched tests
2016-08-04 11:24:17 -07:00
Diptanu Choudhury
c24e8ba7d8
Not updating summary if job is de-registered
2016-08-03 17:00:08 -07:00
Diptanu Choudhury
74caed0c7a
Added an endpoint for users to reconcile job summaries
2016-08-03 16:12:47 -07:00
Alex Dadgar
ac3328e812
Make scheduler mark allocations as lost
2016-08-03 15:57:46 -07:00
Diptanu Choudhury
1b60e0823a
Added a test for restoring the summaries in fsm
2016-08-03 11:58:36 -07:00
Alex Dadgar
4197e62e78
Remove old way of marking lost
2016-08-03 11:20:56 -07:00
Diptanu Choudhury
b95cf91ee3
using the job associated with the alloc to determine if job is present
2016-08-02 19:14:05 -07:00
Diptanu Choudhury
6f8c40fca7
Not updating summary if create index of summary not same as job's create index
2016-08-02 18:59:45 -07:00
Diptanu Choudhury
b2d388bcba
Merge pull request #1508 from hashicorp/b-dont-update-job
...
Do not update the job of allocations that are being stopped
2016-08-02 18:58:39 -07:00
Alex Dadgar
2332a58944
Do not update the job of allocations that are being stopped
2016-08-02 17:53:31 -07:00
Diptanu Choudhury
87fdeb5393
Updated the logic to update job summary
2016-08-02 16:08:20 -07:00
Diptanu Choudhury
3966a46996
Updating the summary after we have updated the current allocation when client updates the alloc
2016-08-02 15:06:39 -07:00
Diptanu Choudhury
92e32e46f1
Updating the summary after we have updated the current allocation
2016-08-02 14:59:41 -07:00
Diptanu Choudhury
3aa4f39094
Checking if a job is nil before updating the allocation
2016-08-01 17:05:48 -07:00
Diptanu Choudhury
b69b7129a6
Using the parent transaction to query the allocation while updating summary
2016-08-01 16:46:05 -07:00
Diptanu Choudhury
b0e1f02e26
Not updating job summaries if jobs are not present
2016-07-28 15:24:27 -07:00
Diptanu Choudhury
0dd8a84de0
Marking the desired state of an allocation as stop if the node on which it runs disconnects
2016-07-27 17:07:08 -07:00
Diptanu Choudhury
b857d7c6c1
Copying job summary before mutating it
2016-07-27 14:46:46 -07:00
Diptanu Choudhury
1bab053490
Updated some tests
2016-07-26 15:11:48 -07:00
Diptanu Choudhury
10a5c06a5a
Running the tests in verbose mode
2016-07-26 14:02:47 -07:00
Diptanu Choudhury
9943053239
Fixed a test
2016-07-25 22:22:55 -07:00
Diptanu Choudhury
d1a6bdb4ba
Making the queued allocations bind late
2016-07-25 22:11:11 -07:00
Diptanu Choudhury
5bded8d54d
Setting the right indexes while creating Job Summary
2016-07-25 17:51:20 -07:00
Diptanu Choudhury
3089833397
Reconciling the queued allocations during restore
2016-07-25 17:31:40 -07:00
Diptanu Choudhury
cc37ec33cf
Renamed Job.GetJobSummary to Job.Summary
2016-07-25 17:31:40 -07:00
Diptanu Choudhury
6193529040
Fixed more tests
2016-07-25 17:31:40 -07:00
Diptanu Choudhury
de2c79f421
Added test for blocking query of job summary endpoint
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
cce5f483ae
Added some more tests
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
f1c9427c37
Added code to create missing job summaries
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
dabb83063b
Review comments
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
50842b88c7
Fixed some bugs
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
1405687a88
Fixed some error messages and conditions
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
ef97956333
Added support for retrieving job summary in api
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
632ced5eb2
Adding the summary to the Job Stub
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
a5bb0ca6fc
Moved the job endpoint around
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
7bafb7c675
Updating the job summary while mutating jobs and allocation objects
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
109b05cb29
Applying changes to job updates via FSM
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
affbf5b6e4
Updating the job summary table only if an evaluation has any Queued Allocations
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
2ff2acbfc0
Added the job summary related endpoints
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
1cc0bc392b
Setting the number of queued allocations per task group
2016-07-25 17:26:38 -07:00
Alex Dadgar
e26f826189
fix job gc tests
2016-07-25 14:56:23 -07:00
Alex Dadgar
0db55c1dce
Revert "Fix job gc tests"
...
This reverts commit 4be50ac8c78b09d603d9680064391d449b268436.
2016-07-25 14:53:07 -07:00
Alex Dadgar
e61aa2484a
Fix job gc tests
2016-07-25 14:49:57 -07:00
Alex Dadgar
42df093939
Merge pull request #1456 from hashicorp/b-system-job
...
Node Register handles transitioning to ready and creating evals
2016-07-25 12:46:35 -07:00
Alex Dadgar
c4d7f62189
add down to up test
2016-07-25 12:46:18 -07:00
Alex Dadgar
90748cedad
Add killing event and mark task as not running when killed
2016-07-21 15:49:54 -07:00
Alex Dadgar
ebac5cb283
Node.Register handles the case of transitioning to ready and creating evals
2016-07-21 15:22:02 -07:00
Alex Dadgar
af09ef0832
fix validation tests
2016-07-20 16:43:20 -07:00
Alex Dadgar
e0114fee05
InitFields to Canonicalize
2016-07-20 16:08:52 -07:00
Alex Dadgar
0634eeb3e0
Sanitize incoming slices/maps
2016-07-20 16:00:02 -07:00
Diptanu Choudhury
c8a52f36d5
Merge pull request #1429 from nak3/default-resources
...
Update comments for the DefaultResources and DefaultLogConfig
2016-07-20 10:09:36 -07:00
Diptanu Choudhury
d7e397d3f9
Merge pull request #1439 from nak3/fix-error-message
...
Add missing service name to the error message for the invalid port
2016-07-20 10:08:45 -07:00
Kenjiro Nakayama
473eb6561a
Stop using index for task and task group validation
2016-07-20 22:23:35 +09:00
Kenjiro Nakayama
c24e886a5f
Add missing service name to the error message for the invalid port
2016-07-20 20:41:24 +09:00
Kenjiro Nakayama
466d7ac1ec
Update comments for the DefaultResources and DefaultLogConfig
2016-07-19 15:37:54 +09:00
Alex Dadgar
c28027bc9e
Merge pull request #1421 from hashicorp/f-system-count-zero
...
Allow count 0 on system jobs
2016-07-13 14:39:23 -06:00
Alex Dadgar
6bc7009f8c
Allow count 0 on system jobs
2016-07-13 13:50:08 -06:00
Diptanu Choudhury
3836d6e54e
Merge pull request #1383 from hashicorp/f-job-summary
...
Job Summary - Part 1
2016-07-13 13:34:27 -06:00
Diptanu Choudhury
e35369ec83
Fixed typos in comments
2016-07-13 13:25:07 -06:00
Diptanu Choudhury
487c66b84d
Removing the queued state of Job Summary and alloc desired status false
2016-07-13 13:20:46 -06:00
Alex Dadgar
c8e7b909c7
Merge pull request #1404 from hashicorp/f-streaming
...
Implement a streaming API and tail in the fs command
2016-07-12 17:23:04 -06:00
Diptanu Choudhury
daa83a4f3e
Renamed jobsummary to job_summary
2016-07-12 16:00:35 -06:00
Diptanu Choudhury
5d782abd50
Refactored the test
2016-07-12 14:37:51 -06:00
Diptanu Choudhury
00b9b4c6e8
Accounting lost state of allocations
2016-07-12 14:27:45 -06:00
Alex Dadgar
b87cf12f6f
Merge pull request #1403 from hashicorp/f-hold-rpc
...
Gracefully handle short lived outages by holding RPC calls
2016-07-12 13:52:33 -06:00
Diptanu Choudhury
e8d1aee3f4
Added a method for listing jobs whose id matches a prefix
2016-07-12 11:41:13 -06:00
Diptanu Choudhury
313d7aa7f5
Added a test to ensure client alloc updates are happening properly
2016-07-12 11:41:13 -06:00
Diptanu Choudhury
91b828d299
Updated logic to handle change in desired status of allocation when client status is still pending
2016-07-12 11:41:13 -06:00
Diptanu Choudhury
6937c0f7f3
Added test for job summary restore
2016-07-12 11:41:13 -06:00
Diptanu Choudhury
5e6f9ef69e
Added methods to save and restore job summary snapshots
2016-07-12 11:41:13 -06:00
Diptanu Choudhury
ba71757dfb
handled the logic of task group count going up
2016-07-12 11:41:13 -06:00
Diptanu Choudhury
67953b1583
Added a test to ensure correctness of job summary when client updates alloc
2016-07-12 11:41:13 -06:00
Diptanu Choudhury
837b70f285
Added test to make sure summary gets deleted when job gets deleted
2016-07-12 11:41:13 -06:00
Diptanu Choudhury
0606840080
Implemented logic to update the job summary when allocs are inserted
2016-07-12 11:41:13 -06:00
Diptanu Choudhury
083f81d17f
Implemented job state accounting logic for upsert job
2016-07-12 11:41:13 -06:00
Diptanu Choudhury
ebf9fbf1d6
Added a schema for summarizing status of jobs
2016-07-12 11:41:13 -06:00
Diptanu Choudhury
4ea9ceee38
Handling allocations with client state pending
2016-07-12 11:29:23 -06:00
Diptanu Choudhury
2cf2ed6758
Changing the state of an allocation to lost if the node on which it was running was marked as down
2016-07-11 18:24:04 -06:00
Diptanu Choudhury
bc0bfc3ae5
Merge pull request #1398 from hashicorp/b-check-timeout
...
Fixed the validation logic for check timeout and interval
2016-07-10 12:16:50 -07:00
Alex Dadgar
51ae7ace25
initial tail impl
2016-07-10 13:57:04 -04:00
Armon Dadgar
75abbc74a5
nomad: modify forward RPC to hold when no known leader
2016-07-10 13:36:55 -04:00
Armon Dadgar
699c4fc68c
nomad: Add RPCHoldTimeout to tune RPC hold interval
2016-07-10 13:35:48 -04:00
Diptanu Choudhury
b4fe764f07
Added a test
2016-07-08 22:33:04 -07:00
Diptanu Choudhury
19f0867816
Fixed the validation logic for check timeout
2016-07-08 22:26:03 -07:00
Diptanu Choudhury
48b9684b1e
Using net.JoinHostPort instead of handcrafting addrs
2016-07-08 16:45:14 -07:00
Diptanu Choudhury
b180223f4b
Allowing ports to be overridden in check definitions
2016-07-08 14:14:25 -07:00
Alex Dadgar
099cee067d
comments
2016-06-28 10:02:06 -07:00
Alex Dadgar
3f0a47f9e4
Disallow EvalGC to reap batch jobs evals/allocs and make JobGC only oneshot GCs everything
2016-06-27 22:54:03 -07:00
Alex Dadgar
6ca552c451
Reblock test
2016-06-24 10:26:13 -07:00
Alex Dadgar
fd3e469d5e
Remove requeue because it is a subset of EnqueueAll now
2016-06-24 10:14:34 -07:00
Alex Dadgar
2f8bb4b235
When enqueuing into eval broker always pass blocked eval's token
2016-06-23 22:40:22 -07:00
Alex Dadgar
ccf93d7e44
UnblockFailed needs to untrack the job
2016-06-23 15:35:21 -07:00
Alex Dadgar
b1c2a9ddb9
UnblockFailed needs to untrack the job
2016-06-23 15:26:26 -07:00
Alex Dadgar
3a8a27bcff
refresh index eval id in log
2016-06-22 13:48:41 -07:00
Diptanu Choudhury
e43c460534
Fixed name of a test
2016-06-22 13:04:54 -07:00
Diptanu Choudhury
0fe8746692
GC-ing dead batch jobs
2016-06-22 11:40:27 -07:00
Alex Dadgar
8ceb7ead20
Do not use snapshot
2016-06-22 09:33:15 -07:00
Alex Dadgar
91f6976423
tighter index bound when creating GC evals
2016-06-22 09:11:25 -07:00
Alex Dadgar
25decca3ca
Worker waitForIndex uses StateStore index, not Raft Applied Index
2016-06-22 09:04:22 -07:00
Sean Chittenden
8bdb38d016
Code golf
...
Pointed out by: @dadgar
2016-06-21 14:26:01 -07:00
Sean Chittenden
df4fe2e502
Fix the shuffling of remote datacenters.
...
Pointed out by: @ryanuber
2016-06-21 13:37:22 -07:00
Sean Chittenden
9e287858de
Merge pull request #1310 from hashicorp/b-logger
...
Create and pass only one `logger` object around per Agent
2016-06-17 12:16:35 -07:00
Sean Chittenden
46e2d54acf
Provide nomad.Config with a default LogOutput of os.Stderr
2016-06-17 06:44:10 -07:00
Sean Chittenden
9a60999100
Pass a logger arg to NewClient and NewServer
2016-06-16 23:29:23 -07:00
Sean Chittenden
871a31a8ec
Teach config.ConsulConfig how to construct a consulapi TLS client.
...
Said differently, centralize the creation of consul's client config
in one place and use it everywhere.
2016-06-16 22:51:06 -07:00
Sean Chittenden
d17af396ca
Create config.DefaultConsulConfig()
2016-06-16 20:41:05 -07:00
Sean Chittenden
a658299235
Misc typos
2016-06-16 16:17:17 -07:00
Sean Chittenden
ec77a1869e
Test for errors
2016-06-16 14:43:46 -07:00
Sean Chittenden
31313b68cf
Don't assign to an atomic w/o using atomic setter func
2016-06-16 14:43:46 -07:00
Sean Chittenden
af55b74114
Merge pull request #1276 from hashicorp/f-consul-server-autojoin
...
Teach Nomad servers how to fall back to Consul.
2016-06-16 14:40:45 -07:00
Sean Chittenden
7c24487850
Fix up various error handling
2016-06-16 14:40:09 -07:00
Sean Chittenden
71cd9984ae
Immediately query Consul upon initialization if we have no peers.
...
Also don't attempt to join the Server with itself.
2016-06-16 14:27:10 -07:00
Sean Chittenden
65319252b9
Rework server_auto_join to use a timer instead of the peer count.
...
It is perfectly viable for an admin to downsize a Nomad Server cluster
down to 1, 2, or `num % 2 == 0` (however ill-advised such activities
may be). And instead of using `bootstrap_expect`, use a timeout-based
strategy. If the `bootstrapFn` hasn't observed a leader in 15s it will
fall back to Consul and will poll every ~60s until it sees a leader.
2016-06-16 12:14:03 -07:00
Sean Chittenden
b0fecbefc1
Define BootstrapExpect as an int32 so it can be manipulated atomically.
2016-06-16 12:00:15 -07:00
Alex Dadgar
ea5d11e628
remove consul reference
2016-06-15 17:23:02 -07:00
Alex Dadgar
bf14fd355f
plan displays launch time of periodic jobs
2016-06-15 13:34:45 -07:00
Sean Chittenden
14f9d2a947
Use the config's log output
2016-06-15 12:40:51 -07:00
Sean Chittenden
5b0def194a
Namespace the log messages
2016-06-15 12:40:51 -07:00
Sean Chittenden
bffc82d668
Do not consider the number of Serf members when considering falling back to Consul.
2016-06-15 12:40:51 -07:00
Sean Chittenden
324af8d7f1
Guard the auto-join functionality behind its consul.server_auto_join tunable
2016-06-15 12:40:51 -07:00
Sean Chittenden
5e0ced2ae7
Shuffle all datacenters vs only the nearest N datacenters.
...
Per discussion, we want to be aggressive about fanning out vs possibly
fixating on only local DCs. With RPC forwarding in place, a random walk
may be less optimal from a network latency perspective, but it is guaranteed
to eventually result in a converged state because all DCs are candidates
during the bootstrapping process.
2016-06-15 12:40:51 -07:00
Sean Chittenden
2123460cf0
Bump various Consul search limits
...
Client: Search limit increased from 4 random DCs to 8 random DCs, plus nearest.
Server: Search factor increased from 3 to 5 times the bootstrap_expect.
This should allow for faster convergence in large environments (e.g.
sub-5min for 10K Consul DCs).
2016-06-15 12:40:51 -07:00
Sean Chittenden
e8d1264dbc
Short-circuit the bootstrapFn if we have a leader
2016-06-15 12:40:51 -07:00
Sean Chittenden
f05514335b
Teach Nomad servers how to fall back to Consul.
2016-06-15 12:40:51 -07:00
Alex Dadgar
aea21affdb
Document consul configuration
2016-06-14 15:21:57 -07:00
Sean Chittenden
6e22b680ce
Disambiguate auto_join from auto_register, rename reg to auto_advertise.
...
Provide an option that describes the value to the user vs the
operation performed by the software. Momentarily introducing
`auto_join`
2016-06-14 12:11:38 -07:00
Sean Chittenden
4f14d51013
Fix up validation and allow existing unset timeouts to continue to be unset
2016-06-13 18:55:15 -07:00
Sean Chittenden
c3a3fdc230
Upon further review, the Timeout needs to be validated for more than script checks.
...
This value is used for Consul HTTP and TCP checks.
2016-06-13 18:28:27 -07:00
Sean Chittenden
baac19cad6
Remove diff check for ServiceID, may it R.I.P.
2016-06-13 18:22:53 -07:00