Commit graph

1231 commits

Author SHA1 Message Date
Alex Dadgar 7d899b6c60 Pass Vault config to client 2016-08-17 16:23:29 -07:00
Alex Dadgar 14b4312502 Add vault struct 2016-08-17 16:23:29 -07:00
Alex Dadgar eac2675faf Add enabled field 2016-08-17 16:23:29 -07:00
Alex Dadgar c913e4396f Add Vault config to server 2016-08-17 16:23:29 -07:00
Alex Dadgar 1584cfe93e small fixes 2016-08-17 16:23:29 -07:00
Alex Dadgar 0ca4a9fa4f Change token/role names 2016-08-17 16:23:29 -07:00
Alex Dadgar adb3ce847f change config variable names to match vault 2016-08-17 16:23:29 -07:00
Alex Dadgar fab7893774 vendor + api 2016-08-17 16:23:29 -07:00
Alex Dadgar b32128aa23 Initial config block 2016-08-17 16:23:29 -07:00
Alex Dadgar de6e662eb4 Fix service validate test 2016-08-17 11:09:40 -07:00
Alex Dadgar a3bcc1cbb1 Fix network dynamic port test 2016-08-17 11:08:21 -07:00
Alex Dadgar be51f1b265 Fix TaskDiff test 2016-08-17 11:07:11 -07:00
Kenjiro Nakayama b6c39349b7 struct: tiny: output case number of the diff test 2016-08-17 19:15:59 +09:00
Diptanu Choudhury 2e22fea61d Merge pull request #1599 from hoffoo/initial_check_state
Add support for initial check status
2016-08-16 15:16:47 -07:00
Alex Dadgar 8a23780aee Fix bitmap test and check bitmap bounds 2016-08-16 15:16:35 -07:00
Marin 8fc52974a3 fix initial status tests 2016-08-16 14:34:36 -07:00
Marin 69bc3a8fc8 Add support for initial check status 2016-08-16 12:05:15 -07:00
Diptanu Choudhury c1a455983d Added the chained alloc for system scheduler 2016-08-16 10:49:45 -07:00
Alex Dadgar ce0b78525d inclusive range 2016-08-15 13:13:04 -07:00
Diptanu Choudhury 761cc40cd2 Fixed a make vet warning 2016-08-12 12:09:44 -07:00
Diptanu Choudhury dd7e69006e Not running tests parallal 2016-08-11 21:53:27 -07:00
Diptanu Choudhury 01e08a64ee Merge pull request #1569 from hashicorp/fix-network-port-collisions
Fix network port collisions when asking for dyn ports
2016-08-11 16:19:45 -07:00
Diptanu Choudhury d81b20c1a6 Fix network port collisions when asking for dyn ports 2016-08-11 16:18:45 -07:00
Alex Dadgar 007a538515 Fix core scheduler tests 2016-08-11 14:36:22 -07:00
Alex Dadgar 6e5c47a315 Merge pull request #1526 from hashicorp/b-random-ports
Set difference when picking random ports
2016-08-10 16:37:57 -07:00
Alex Dadgar 5a37e720c5 Fixes plus address feedback 2016-08-10 16:37:26 -07:00
Alex Dadgar b8fd989d3a Try stochastic and fallback to precise 2016-08-10 11:47:20 -07:00
Diptanu Choudhury ab94c8eed9 Marking allocations which are not terminal and are on down nodes as lost 2016-08-09 13:11:58 -07:00
Diptanu Choudhury 3cc684211a Added a test to ensure summaries are correct when a node goes down 2016-08-09 10:16:17 -07:00
Diptanu Choudhury c63a78b9a3 Removing the check related to checking version of server before reconciling in leader 2016-08-05 17:48:37 -07:00
Diptanu Choudhury 1518f23d0a Making servers reconcile job summaries when they acquire leadership 2016-08-05 16:47:36 -07:00
Alex Dadgar 1b620bcdd8 Add a test 2016-08-05 16:23:41 -07:00
Alex Dadgar 9089a279a1 Set difference when picking random ports 2016-08-05 16:08:35 -07:00
Diptanu Choudhury 6dc5b1972c Setting job's create index as summary create index during reconciliation 2016-08-04 15:14:01 -07:00
Alex Dadgar 2fb67fefb5 Merge pull request #1516 from hashicorp/f-lost-state-sched
Make scheduler mark allocations as lost
2016-08-04 11:36:02 -07:00
Diptanu Choudhury 88d383c47f Updated tests and comments 2016-08-04 11:29:36 -07:00
Alex Dadgar e33bda76bf test sched doesn't mark complete as lost + core_sched tests 2016-08-04 11:24:17 -07:00
Diptanu Choudhury c24e8ba7d8 Not updating summary if job is de-registered 2016-08-03 17:00:08 -07:00
Diptanu Choudhury 74caed0c7a Added an endpoint for users to reconcile job summaries 2016-08-03 16:12:47 -07:00
Alex Dadgar ac3328e812 Make scheduler mark allocations as lost 2016-08-03 15:57:46 -07:00
Diptanu Choudhury 1b60e0823a Added a test for restoring the summaries in fsm 2016-08-03 11:58:36 -07:00
Alex Dadgar 4197e62e78 Remove old way of marking lost 2016-08-03 11:20:56 -07:00
Diptanu Choudhury b95cf91ee3 using the job associated with the alloc to determine if job is present 2016-08-02 19:14:05 -07:00
Diptanu Choudhury 6f8c40fca7 Not updating summary if create index of summary not same as job's create index 2016-08-02 18:59:45 -07:00
Diptanu Choudhury b2d388bcba Merge pull request #1508 from hashicorp/b-dont-update-job
Do not update the job of allocations that are being stopped
2016-08-02 18:58:39 -07:00
Alex Dadgar 2332a58944 Do not update the job of allocations that are being stopped 2016-08-02 17:53:31 -07:00
Diptanu Choudhury 87fdeb5393 Updated the logic to update job summary 2016-08-02 16:08:20 -07:00
Diptanu Choudhury 3966a46996 Updating the summary after we have updated the current allocation when client updates the alloc 2016-08-02 15:06:39 -07:00
Diptanu Choudhury 92e32e46f1 Updating the summary after we have updated the current allocation 2016-08-02 14:59:41 -07:00
Diptanu Choudhury 3aa4f39094 Checking if a job is nil before updating the allocation 2016-08-01 17:05:48 -07:00
Diptanu Choudhury b69b7129a6 Using the parnet transaction to query the allocation while updating summary 2016-08-01 16:46:05 -07:00
Diptanu Choudhury b0e1f02e26 Not updating job summaries if jobs are not present 2016-07-28 15:24:27 -07:00
Diptanu Choudhury 0dd8a84de0 Marking the desired state of an allocation as stop if the node on which it runs disconnects 2016-07-27 17:07:08 -07:00
Diptanu Choudhury b857d7c6c1 Copying job summary before mutating it 2016-07-27 14:46:46 -07:00
Diptanu Choudhury 1bab053490 Updated some tests 2016-07-26 15:11:48 -07:00
Diptanu Choudhury 10a5c06a5a Running the tests in verbose mode 2016-07-26 14:02:47 -07:00
Diptanu Choudhury 9943053239 Fixed a test 2016-07-25 22:22:55 -07:00
Diptanu Choudhury d1a6bdb4ba Making the queued allocations bind late 2016-07-25 22:11:11 -07:00
Diptanu Choudhury 5bded8d54d Setting the right indexes while creating Job Summary 2016-07-25 17:51:20 -07:00
Diptanu Choudhury 3089833397 Reconciling the queued allocations during restore 2016-07-25 17:31:40 -07:00
Diptanu Choudhury cc37ec33cf Renamed Job.GetJobSummary to Job.Summary 2016-07-25 17:31:40 -07:00
Diptanu Choudhury 6193529040 Fixed more tests 2016-07-25 17:31:40 -07:00
Diptanu Choudhury de2c79f421 Added test for blocking query of job summary endpoint 2016-07-25 17:26:38 -07:00
Diptanu Choudhury cce5f483ae Added some more tests 2016-07-25 17:26:38 -07:00
Diptanu Choudhury f1c9427c37 Added code to create missing job summaries 2016-07-25 17:26:38 -07:00
Diptanu Choudhury dabb83063b Review comments 2016-07-25 17:26:38 -07:00
Diptanu Choudhury 50842b88c7 Fixed some bugs 2016-07-25 17:26:38 -07:00
Diptanu Choudhury 1405687a88 Fixed some error messages and conditions 2016-07-25 17:26:38 -07:00
Diptanu Choudhury ef97956333 Added support for retreiving job summary in api 2016-07-25 17:26:38 -07:00
Diptanu Choudhury 632ced5eb2 Adding the summary to the Job Stub 2016-07-25 17:26:38 -07:00
Diptanu Choudhury a5bb0ca6fc Moved the job endpoint around 2016-07-25 17:26:38 -07:00
Diptanu Choudhury 7bafb7c675 Updating the job summary while mutating jobs and allocation objects 2016-07-25 17:26:38 -07:00
Diptanu Choudhury 109b05cb29 Applying changes to job updates via FSM 2016-07-25 17:26:38 -07:00
Diptanu Choudhury affbf5b6e4 Updating the job summary table only if an evaluation has any Queued Allocations 2016-07-25 17:26:38 -07:00
Diptanu Choudhury 2ff2acbfc0 Added the job summary related endpoints 2016-07-25 17:26:38 -07:00
Diptanu Choudhury 1cc0bc392b Setting the number of queued allocations per task group 2016-07-25 17:26:38 -07:00
Alex Dadgar e26f826189 fix job gc tests 2016-07-25 14:56:23 -07:00
Alex Dadgar 0db55c1dce Revert "Fix job gc tests"
This reverts commit 4be50ac8c78b09d603d9680064391d449b268436.
2016-07-25 14:53:07 -07:00
Alex Dadgar e61aa2484a Fix job gc tests 2016-07-25 14:49:57 -07:00
Alex Dadgar 42df093939 Merge pull request #1456 from hashicorp/b-system-job
Node Register handles transistioning to ready and creating evals
2016-07-25 12:46:35 -07:00
Alex Dadgar c4d7f62189 add down to up test 2016-07-25 12:46:18 -07:00
Alex Dadgar 90748cedad Add killing event and mark task as not running when killed 2016-07-21 15:49:54 -07:00
Alex Dadgar ebac5cb283 Node.Register handles the case of transistioning to ready and creating evals 2016-07-21 15:22:02 -07:00
Alex Dadgar af09ef0832 fix validation tests 2016-07-20 16:43:20 -07:00
Alex Dadgar e0114fee05 InitFields to Canonicalize 2016-07-20 16:08:52 -07:00
Alex Dadgar 0634eeb3e0 Sanatize incoming slices/maps 2016-07-20 16:00:02 -07:00
Diptanu Choudhury c8a52f36d5 Merge pull request #1429 from nak3/default-resources
Update comments for the DefaultResources and DefaultLogConfig
2016-07-20 10:09:36 -07:00
Diptanu Choudhury d7e397d3f9 Merge pull request #1439 from nak3/fix-error-message
Add missed service name of the error message for the invalid port
2016-07-20 10:08:45 -07:00
Kenjiro Nakayama 473eb6561a Stop using index for task and task group validation 2016-07-20 22:23:35 +09:00
Kenjiro Nakayama c24e886a5f Add missed service name of the error message for the invalid port 2016-07-20 20:41:24 +09:00
Kenjiro Nakayama 466d7ac1ec Update comments for the DefaultResources and DefaultLogConfig 2016-07-19 15:37:54 +09:00
Alex Dadgar c28027bc9e Merge pull request #1421 from hashicorp/f-system-count-zero
Allow count 0 on system jobs
2016-07-13 14:39:23 -06:00
Alex Dadgar 6bc7009f8c Allow count 0 on system jobs 2016-07-13 13:50:08 -06:00
Diptanu Choudhury 3836d6e54e Merge pull request #1383 from hashicorp/f-job-summary
Job Summary - Part 1
2016-07-13 13:34:27 -06:00
Diptanu Choudhury e35369ec83 Fixed typos in comments 2016-07-13 13:25:07 -06:00
Diptanu Choudhury 487c66b84d Removing the queued state of Job Summary and alloc desired status false 2016-07-13 13:20:46 -06:00
Alex Dadgar c8e7b909c7 Merge pull request #1404 from hashicorp/f-streaming
Implement a streaming API and tail in the fs command
2016-07-12 17:23:04 -06:00
Diptanu Choudhury daa83a4f3e Renamed jobsummary to job_summary 2016-07-12 16:00:35 -06:00
Diptanu Choudhury 5d782abd50 Refactored the test 2016-07-12 14:37:51 -06:00
Diptanu Choudhury 00b9b4c6e8 Accounting lost state of allocations 2016-07-12 14:27:45 -06:00
Alex Dadgar b87cf12f6f Merge pull request #1403 from hashicorp/f-hold-rpc
Gracefully handle short lived outages by holding RPC calls
2016-07-12 13:52:33 -06:00
Diptanu Choudhury e8d1aee3f4 Added a method for listing jobs whose id matches a prefix 2016-07-12 11:41:13 -06:00
Diptanu Choudhury 313d7aa7f5 Added a test to ensure client alloc updates are happening properly 2016-07-12 11:41:13 -06:00
Diptanu Choudhury 91b828d299 Updated logic to handle change in desired status of allocation when client status is still pending 2016-07-12 11:41:13 -06:00
Diptanu Choudhury 6937c0f7f3 Added test for job summary restore 2016-07-12 11:41:13 -06:00
Diptanu Choudhury 5e6f9ef69e Added methods to save and restore job summary snapshots 2016-07-12 11:41:13 -06:00
Diptanu Choudhury ba71757dfb handled the logic of task group count going up 2016-07-12 11:41:13 -06:00
Diptanu Choudhury 67953b1583 Added a test to ensure correctness of job summary when client updates alloc 2016-07-12 11:41:13 -06:00
Diptanu Choudhury 837b70f285 Added test to make sure summary gets deleted when job gets deleted 2016-07-12 11:41:13 -06:00
Diptanu Choudhury 0606840080 Implemented logic to update the job summary when allocs are inserted 2016-07-12 11:41:13 -06:00
Diptanu Choudhury 083f81d17f Implemented job state accounting logic for upsert job 2016-07-12 11:41:13 -06:00
Diptanu Choudhury ebf9fbf1d6 Added a schema for summarizing status of jobs 2016-07-12 11:41:13 -06:00
Diptanu Choudhury 4ea9ceee38 Handling allocations with client state pending 2016-07-12 11:29:23 -06:00
Diptanu Choudhury 2cf2ed6758 Changing the state of an allocation to lost if the node on which it was running was marked as down 2016-07-11 18:24:04 -06:00
Diptanu Choudhury bc0bfc3ae5 Merge pull request #1398 from hashicorp/b-check-timeout
Fixed the validation logic for check timeout and interval
2016-07-10 12:16:50 -07:00
Alex Dadgar 51ae7ace25 initial tail impl 2016-07-10 13:57:04 -04:00
Armon Dadgar 75abbc74a5 nomad: modify forward RPC to hold when no known leader 2016-07-10 13:36:55 -04:00
Armon Dadgar 699c4fc68c nomad: Add RPCHoldTimeout to tune RPC hold interval 2016-07-10 13:35:48 -04:00
Diptanu Choudhury b4fe764f07 Added a test 2016-07-08 22:33:04 -07:00
Diptanu Choudhury 19f0867816 Fixed the validation logic for check timeout 2016-07-08 22:26:03 -07:00
Diptanu Choudhury 48b9684b1e Using net.JoinHostPort instead of handcrafting addrs 2016-07-08 16:45:14 -07:00
Diptanu Choudhury b180223f4b Allowing ports to be overriden in check definitions 2016-07-08 14:14:25 -07:00
Alex Dadgar 099cee067d comments 2016-06-28 10:02:06 -07:00
Alex Dadgar 3f0a47f9e4 Disallow EvalGC to reap batch jobs evals/allocs and make JobGC only oneshot GCs everything 2016-06-27 22:54:03 -07:00
Alex Dadgar 6ca552c451 Reblock test 2016-06-24 10:26:13 -07:00
Alex Dadgar fd3e469d5e Remove requeue because it is a subset of EnqueueAll now 2016-06-24 10:14:34 -07:00
Alex Dadgar 2f8bb4b235 When enqueuing into eval broker always pass blocked eval's token 2016-06-23 22:40:22 -07:00
Alex Dadgar ccf93d7e44 UnblockFailed needs to untrack the job 2016-06-23 15:35:21 -07:00
Alex Dadgar b1c2a9ddb9 UnblockFailed needs to untrack the job 2016-06-23 15:26:26 -07:00
Alex Dadgar 3a8a27bcff refresh index eval id in log 2016-06-22 13:48:41 -07:00
Diptanu Choudhury e43c460534 Fixed name of a test 2016-06-22 13:04:54 -07:00
Diptanu Choudhury 0fe8746692 GC-ing dead batch jobs 2016-06-22 11:40:27 -07:00
Alex Dadgar 8ceb7ead20 Do not use snapshot 2016-06-22 09:33:15 -07:00
Alex Dadgar 91f6976423 tighter index bound when creating GC evals 2016-06-22 09:11:25 -07:00
Alex Dadgar 25decca3ca Worker waitForIndex uses StateStore index, not Raft Applied Index 2016-06-22 09:04:22 -07:00
Sean Chittenden 8bdb38d016
Code golf
Pointed out by: @dadgar
2016-06-21 14:26:01 -07:00
Sean Chittenden df4fe2e502
Fix the shuffling of remote datacenters.
Pointed out by: @ryanuber
2016-06-21 13:37:22 -07:00
Sean Chittenden 9e287858de Merge pull request #1310 from hashicorp/b-logger
Create and pass only one `logger` object around per Agent
2016-06-17 12:16:35 -07:00
Sean Chittenden 46e2d54acf
Provide nomad.Config with a default LogOutput of os.StdErr 2016-06-17 06:44:10 -07:00
Sean Chittenden 9a60999100
Pass a logger arg to NewClient and NewServer 2016-06-16 23:29:23 -07:00
Sean Chittenden 871a31a8ec
Teach config.ConsulConfig how to construct a consulapi TLS client.
Said differently, centralize the creation of consul's client config
in one place and use it everywhere.
2016-06-16 22:51:06 -07:00
Sean Chittenden d17af396ca
Create config.DefaultConsulConfig() 2016-06-16 20:41:05 -07:00
Sean Chittenden a658299235
Misc typos 2016-06-16 16:17:17 -07:00
Sean Chittenden ec77a1869e
Test for errors 2016-06-16 14:43:46 -07:00
Sean Chittenden 31313b68cf
Don't assign to an atomic w/o using atomic setter func 2016-06-16 14:43:46 -07:00
Sean Chittenden af55b74114 Merge pull request #1276 from hashicorp/f-consul-server-autojoin
Teach Nomad servers how to fall back to Consul.
2016-06-16 14:40:45 -07:00
Sean Chittenden 7c24487850
Fix up various error handling 2016-06-16 14:40:09 -07:00
Sean Chittenden 71cd9984ae
Immediately query Consul upon initialization if we have no peers.
Also don't attempt to join the Server with itself.
2016-06-16 14:27:10 -07:00
Sean Chittenden 65319252b9
Rework server_auto_join to use a timer instead of the peer count.
It is perfectly viable for an admin to downsize a Nomad Server cluster
down to 1, 2, or `num % 2 == 0` (however ill-advised such activities
may be).  And instead of using `bootstrap_expect`, use a timeout-based
strategy.  If the `bootstrapFn` hasn't observed a leader in 15s it will
fall back to Consul and will poll every ~60s until it sees a leader.
2016-06-16 12:14:03 -07:00
Sean Chittenden b0fecbefc1
Define BootstrapExepct as an int32 so it can be manipulated atomically. 2016-06-16 12:00:15 -07:00
Alex Dadgar ea5d11e628 remove consul reference 2016-06-15 17:23:02 -07:00
Alex Dadgar bf14fd355f plan displays launch time of periodic jobs 2016-06-15 13:34:45 -07:00
Sean Chittenden 14f9d2a947
Use the config's log output 2016-06-15 12:40:51 -07:00
Sean Chittenden 5b0def194a
Namespace the log messages 2016-06-15 12:40:51 -07:00
Sean Chittenden bffc82d668
Do not consider the number of Serf members when considering falling back to Consul. 2016-06-15 12:40:51 -07:00
Sean Chittenden 324af8d7f1
Guard the auto-join functionality behind its consul.server_auto_join tunable 2016-06-15 12:40:51 -07:00
Sean Chittenden 5e0ced2ae7
Shuffle all datacenters vs only the nearest N datacenters.
Per discussion, we want to be aggressive about fanning out vs possibly
fixating on only local DCs.  With RPC forwarding in place, a random walk
may be less optimal from a network latency perspective, but it is guaranteed
to eventually result in a converged state because all DCs are candidates
during the bootstrapping process.
2016-06-15 12:40:51 -07:00
Sean Chittenden 2123460cf0
Bump various Consul search limits
Client: Search limit increased from 4 random DCs to 8 random DCs, plus nearest.
Server: Search factor increased from 3 to 5 times the bootstrap_expect.

This should allow for faster convergence in large environments (e.g.
sub-5min for 10K Consul DCs).
2016-06-15 12:40:51 -07:00
Sean Chittenden e8d1264dbc
Short-circuit the bootstrapFn if we have a leader 2016-06-15 12:40:51 -07:00
Sean Chittenden f05514335b
Teach Nomad servers how to fall back to Consul. 2016-06-15 12:40:51 -07:00
Alex Dadgar aea21affdb Document consul configuration 2016-06-14 15:21:57 -07:00
Sean Chittenden 6e22b680ce
Disambiguate auto_join from auto_register, rename reg to auto_advertise.
Provide an option that describes the value to the user vs the
operation performed by the software.  Momentarily introducing
`auto_join`
2016-06-14 12:11:38 -07:00
Sean Chittenden 4f14d51013
Fix up validation and allow existing unset timeouts to continue to be unset 2016-06-13 18:55:15 -07:00
Sean Chittenden c3a3fdc230
Upon further review, the Timeout needs to be validate for more than script checks.
This value is used for Consul HTTP and TCP checks.
2016-06-13 18:28:27 -07:00
Sean Chittenden baac19cad6
Remove diff check for ServiceID, may it R.I.P. 2016-06-13 18:22:53 -07:00
Sean Chittenden 79c675cf72
Guard against an interval and timeout being less than 1s 2016-06-13 18:19:40 -07:00
Sean Chittenden af8db7ec18
Don't export ServiceCheck validate 2016-06-13 18:17:43 -07:00
Sean Chittenden 08c88102a7
There is no "docker" check type 2016-06-13 18:15:07 -07:00
Alex Dadgar 8bbf4a55e5 Fix IDs and domain scoping 2016-06-13 16:30:58 -07:00
Alex Dadgar 8e231fa382 Rename ConsulService back to Service 2016-06-12 16:36:49 -07:00
Diptanu Choudhury 3024c080e8 Removing artifact check for java and qemu drivers 2016-06-12 12:57:35 +02:00
Alex Dadgar 480a281031 Merge pull request #1243 from hashicorp/f-run-modify-index
Add check-index flag to nomad run
2016-06-11 16:12:53 -07:00
Sean Chittenden 2f036231e5 Merge pull request #1201 from hashicorp/f-dyn-server-list
Dynamic Server Lists/Client Bootstrapping via consul.
2016-06-11 18:58:25 -04:00
Alex Dadgar 59b0a7b3f6 Merge pull request #1256 from hashicorp/b-node-gc
Improve partial garbage collection of allocations
2016-06-11 15:41:00 -07:00
Sean Chittenden bbd8dfa798
goling(1) compliance pass (e.g. Rpc* -> RPC) 2016-06-10 23:38:28 -04:00
Alex Dadgar 98bf249625 Partial GC allocations 2016-06-10 18:32:37 -07:00
Alex Dadgar 7ccc7d20a0 test 2016-06-10 15:48:59 -07:00
Alex Dadgar b064b392fc Only unblock if missed class was added after eval snapshot index 2016-06-10 15:24:06 -07:00
Sean Chittenden 948663c89a
Fix another unit test not expecting ServiceID 2016-06-10 16:50:35 -04:00
Sean Chittenden d99467ef5e
Always create a consul.Syncer. Use a default Consul Config if necessary. 2016-06-10 15:55:27 -04:00
Sean Chittenden 3d64daafd9
Fold RaftPeers() into its only call site now 2016-06-10 15:54:39 -04:00
Sean Chittenden 0ba1da9c9c
Always pass in a snapshot before calling constructNodeServerInfoResponse() 2016-06-10 15:54:39 -04:00
Sean Chittenden 1df6fc253f
Rename updateNodeUpdateResponse to constructNodeServerInfoResponse 2016-06-10 15:54:39 -04:00
Sean Chittenden 077203fe93
Update the structure of ConsulService to match reality.
ConsulService is the configuration for a Consul Service
2016-06-10 15:54:39 -04:00
Sean Chittenden 197feae679
Sync services with Consul by comparing the AgentServiceReg w/ ConsulService
The source of truth is the local Nomad Agent.  Any services not local that
have a matching prefix are removed.  Changed services are re-registered
and missing services are re-added.
2016-06-10 15:54:39 -04:00
Sean Chittenden 9a223936bb
Generate and sync Consul ServiceIDs consistently 2016-06-10 15:54:39 -04:00
Sean Chittenden 95c9d1a63e
Per-comment, remove structs.Allocation's Services attribute.
Nuke PopulateServiceIDs() now that it's also no longer needed.
2016-06-10 15:54:39 -04:00
Sean Chittenden 7956eb0c80
Rename structs.Task's Service attribute to ConsulService 2016-06-10 15:54:39 -04:00
Sean Chittenden fda03c5c9e
Change the signature of the PeriodicCallback to return an error
I *KNEW* I should have done this when I wrote it, but didn't want to
go back and audit the handlers to include the appropriate return
handling, but now that the code is taking shape, make this change.
2016-06-10 15:54:39 -04:00
Sean Chittenden 4973ec32bb
Rename structs.Services to structs.ConsulServices 2016-06-10 15:54:39 -04:00
Sean Chittenden 060300007e
Use a monotonically incrementing number to create unique node names.
Also remove the space from the "name" of the node
2016-06-10 15:50:11 -04:00
Sean Chittenden 1ec7d6c266
Push down the server list even on node registration and evaluation
Be mindful of the cost of taking a snapshot from the statestore and
reuse the snapshot if one has already been taken.
2016-06-10 15:50:11 -04:00
Sean Chittenden bff57a0dce
Reconcile, clean up, and centralize API version numbers (major and minor).
Reduce future confusion by introducing a minor version that is gossiped out
via the `mvn` Serf tag (Minor Version Number, `vsn` is already being used for
to communicate `Major Version Number`).

Background: hashicorp/consul/issues/1346#issuecomment-151663152
2016-06-10 15:50:11 -04:00
Sean Chittenden dde6a4074d
Nuke trace-level logging in heartbeats 2016-06-10 15:50:11 -04:00
Sean Chittenden d76c042a13
Invert error handling logic 2016-06-10 15:50:11 -04:00
Sean Chittenden 1fe979a5e4
Remove types.ShutdownChannel and replace with chan struct{} 2016-06-10 15:50:11 -04:00
Sean Chittenden 438becb28b
Pass the datacenter name in the heartbeat
Servers that are part of a different datacenter are added as backup
servers instead of primary servers.
2016-06-10 15:50:11 -04:00
Sean Chittenden 89168b0c51
Invert check definition so the error is first 2016-06-10 15:50:11 -04:00
Sean Chittenden dc78baedfd
Fix typo in the comment to reflect the actual function name. 2016-06-10 15:50:11 -04:00
Sean Chittenden 410d85cc78
Rename the package from client/rpc_proxy to client/rpcproxy
Also rename `NewRpcProxy()` to just `New()` to avoid package stutter.
2016-06-10 15:50:11 -04:00
Sean Chittenden 1aefdb1e15
Use the correctly typed rand.Int* variant 2016-06-10 15:50:11 -04:00
Sean Chittenden 3a1dc9a194
Use rand.Int*n() where appropriate 2016-06-10 15:50:11 -04:00
Sean Chittenden e727fd8c3c
Centralize the creation of a consul/api.Config struct.
While documented, the consul.timeout parameter wasn't ever set
except one-off in the Consul fingerprinter.
2016-06-10 15:50:11 -04:00
Sean Chittenden f695d6d70d
Reconcile consul's address configuration section.
There were conflicting directives previously, both consul.addr and
consul.address were required to achieve the desired behavior.  The
documentation said `consul.address` was the canonical name for the
parameter, so consolidate configuration parameters to `consul.address`.
2016-06-10 15:50:11 -04:00
Sean Chittenden e60580b279
Define a type for the PeriodicCallback handlers and ShutdownChannel 2016-06-10 15:50:11 -04:00
Sean Chittenden 17116fc5a7
Rebalance Nomad client RPCs among different Nomad servers.
Implement client/rpc_proxy.RpcProxy.
2016-06-10 15:50:11 -04:00
Sean Chittenden b509da2d0c
Create a nomad/structs/config to break an import cycle.
Flattening and normalizing the various Consul config structures and
services has led to an import cycle.  Break this by creating a new package
that is intended to be terminal in the import DAG.
2016-06-10 15:48:36 -04:00
Sean Chittenden 6d162e1e03
Fix copy pasta comment.
These parameters are used to bootstrap Nomad servers, not Consul servers.
2016-06-10 15:48:36 -04:00
Sean Chittenden 4e2835d5ff
Use the correctly typed rand.Int* variant 2016-06-10 15:48:36 -04:00
Sean Chittenden 49deaae2ae
Seed random once in main 2016-06-10 15:48:36 -04:00
Sean Chittenden db97a88f94
Fix small typo 2016-06-10 15:48:36 -04:00
Sean Chittenden 66b4b2a99f
Use rand.Int*n() where appropriate 2016-06-10 15:48:36 -04:00
Sean Chittenden e36686a17d
Use consul/lib's RandomStagger
Removes four redundant copies of the method in the process.
2016-06-10 15:48:36 -04:00
Sean Chittenden e0e7d94450
Use consul/lib's RateScaledInterval 2016-06-10 15:48:36 -04:00
Alex Dadgar 527afa5119 Merge pull request #1244 from hashicorp/b-eval-reblock-test-hardening
Don't dequeue requeued evals in tests
2016-06-09 11:35:42 -07:00
Alex Dadgar 5d181d203c Add check-index flag to nomad run 2016-06-08 17:56:32 -07:00
Alex Dadgar b7e3a45fef fix channel being nil on restore 2016-06-07 15:03:08 -07:00
Alex Dadgar ecdce9a641 don't dequeue 2016-06-07 09:51:20 -07:00
Alex Dadgar cc95d5d332 GC Nodes even if they have terminal allocations 2016-06-03 16:24:41 -07:00
Alex Dadgar 5f3e27ecd8 Fix case in periodic dispatch and blocked evals where lock was not released 2016-06-03 13:46:57 -07:00
Alex Dadgar 3100b4a086 Change eval_endpoint test to not retry but block longer 2016-06-03 12:02:49 -07:00
Alex Dadgar 299a0bb4b3 up timeout for dequeue in test 2016-06-03 11:36:50 -07:00
Alex Dadgar 0f84d8968b Merge pull request #1221 from hashicorp/b-nil-wait
fix wait result being nil and some panics in the cli
2016-05-31 16:50:38 -07:00
Alex Dadgar 629542f64e flaky test 2016-05-31 23:50:14 +00:00
Alex Dadgar 7196133f0a Merge pull request #1220 from hashicorp/f-plan-failure-reasons
plan shows failure reasons and ordered annotations
2016-05-31 15:32:22 -07:00
Alex Dadgar b1298bb658 plan shows failure reasons and ordered annotations 2016-05-31 21:51:23 +00:00
Alex Dadgar 13f0ff03c1 Merge pull request #1209 from hashicorp/b-blocked-eval-fixes
Fix race condition in which a reblocked evaluation could be dropped
2016-05-31 13:26:58 -07:00
Alex Dadgar 060318845f Comments addressed 2016-05-31 11:39:03 -07:00
Alex Dadgar 75bd7a50f7 changelog 2016-05-27 17:43:20 -07:00
Alex Dadgar cc00a66e38 validate that tasks don't contain slashes 2016-05-27 17:17:10 -07:00
Alex Dadgar 1f9f015c1b Fix race condition in which a reblocked evaluation could be dropped 2016-05-27 16:53:10 -07:00
Alex Dadgar 6a236872b4 address comment 2016-05-25 10:30:47 -07:00
Alex Dadgar a3336b7761 test fixes and delete 2016-05-24 20:20:06 -07:00
Alex Dadgar 3fd51ecece Periodically unblock failed evaluations 2016-05-24 20:10:56 -07:00
Alex Dadgar bfdd5846e1 Track unblock indexes and check evals on block to see if they missed an update while in the scheduler 2016-05-24 20:10:56 -07:00
Alex Dadgar 15936822a4 Worker annotates evals with their snapshot index 2016-05-24 20:10:56 -07:00
Alex Dadgar 18d9e89065 Reuse the same evaluation and reblock it until there is no more work to do 2016-05-24 20:10:56 -07:00
Alex Dadgar 3cbb89c61e Merge pull request #1188 from hashicorp/f-no-failed-allocs
Failed Allocation Metrics stored in Evaluation
2016-05-24 20:06:28 -07:00
Alex Dadgar fcc57fbc66 rename SpawnedBlockedEval and simplify map safety check 2016-05-24 18:12:59 -07:00
Alex Dadgar b5ad18a7ea Dont restart successfully finished batch allocations 2016-05-24 17:23:18 -07:00
Alex Dadgar 1feb57b047 Evals track blocked evals they create 2016-05-19 13:09:52 -07:00
Alex Dadgar 8f5f12ae81 Scheduler no longer produces failed allocations; failed alloc metrics stored in evaluation 2016-05-18 18:11:40 -07:00
Alex Dadgar 1c6d3e129a EnqueueAll inserts all evaluations before unblocking dequeue calls 2016-05-18 12:13:59 -07:00
Alex Dadgar 045f7807e0 eval_broker.Enqueue no longer returns an error 2016-05-18 11:35:15 -07:00
Alex Dadgar 0c653c3c8f Fix determining whether a job is edited 2016-05-17 15:48:35 -07:00
Alex Dadgar a5ab96d40e Merge pull request #1168 from hashicorp/f-plan-endpoint
Job.Plan endpoint
2016-05-16 13:15:40 -07:00
Alex Dadgar a231f6f998 Switch to using the harness 2016-05-16 12:49:18 -07:00
Alex Dadgar 5085c25f8b Rename Cas to JobModifyIndex 2016-05-16 11:48:44 -07:00
Sean Chittenden dc28ab0cb5
Speling police 2016-05-15 09:41:34 -07:00
Diptanu Choudhury 2e2e2e500e Using a helper method to create service identifiers 2016-05-14 00:43:25 -07:00