Commit Graph

2574 Commits

Author SHA1 Message Date
Preetha Appan bfa0937bbb
Code review feedback 2018-05-10 14:42:24 -05:00
Preetha Appan ca5758741b
Update serf to pick up graceful leave fix 2018-05-10 11:16:24 -05:00
Chelsea Holland Komlo 620558c107 log error if unable to create TLS configuration 2018-05-10 11:51:54 -04:00
Chelsea Holland Komlo 44f536f18e add support for configurable TLS minimum version 2018-05-09 18:07:12 -04:00
Chelsea Holland Komlo 796bae6f1b allow configurable cipher suites
disallow 3DES and RC4 ciphers

add documentation for tls_cipher_suites
2018-05-09 17:15:31 -04:00
Preetha Appan b12df3c64b
Added CLI for evaluating job given ID, and modified client API for evaluate to take a request payload 2018-05-09 15:04:27 -05:00
Preetha Appan ef531b0f34
Add unit tests for forced rescheduling 2018-05-09 11:30:42 -05:00
Chelsea Holland Komlo d51611040f Add driver health information to node list stub 2018-05-09 11:21:54 -04:00
Preetha Appan 1b8d8b2186
Fix logic inversion in force rescheduling 2018-05-08 20:00:06 -05:00
Preetha Appan c1b92c284e
Work in progress - force rescheduling of failed allocs 2018-05-08 17:26:57 -05:00
Preetha Appan c7edbd5f41
newlines in test 2018-05-07 14:55:01 -05:00
Preetha Appan 4e75456beb
Fix deadlock in deadline timer logic when progress deadline is passed and the deployment is updated. 2018-05-07 14:55:01 -05:00
Preetha Appan cba13e4ec5
Fix test set up to set ModifyTime for alloc 2018-05-07 14:55:01 -05:00
Preetha Appan 19b096d203
Set modify time for allocs in unit test, and define current time in one spot 2018-05-07 14:55:01 -05:00
Preetha Appan 4c377b112e
Fix panic in deployment watcher when deployment is not in the state store due to a gc 2018-05-07 14:55:01 -05:00
Preetha 02d63432b4
Fix typo 2018-05-07 14:55:01 -05:00
Alex Dadgar 738056634e
Fix the initial progress deadline calculation when the alloc is inplace updated to be part of a new deployment 2018-05-07 14:55:01 -05:00
Michael Schurter e90d051c43
consul: change hashed canary bytes 2018-05-07 14:55:01 -05:00
Alex Dadgar 768fec8505
Allow healthy canary deployment to skip progress deadline 2018-05-07 14:55:01 -05:00
Alex Dadgar 8626c1b94a
Reschedule when we have canaries properly 2018-05-07 14:55:01 -05:00
Michael Schurter 50e04c976e
consul: support canary tags for services
Also refactor Consul ServiceClient to take a struct instead of a massive
set of arguments. Meant updating a lot of code but it should be far
easier to extend in the future as you will only need to update a single
struct instead of every single call site.

Adds an e2e test for canary tags.
2018-05-07 14:55:01 -05:00
Michael Schurter a3038cefb4
typo: transistion -> transition 2018-05-07 14:50:01 -05:00
Alex Dadgar bd38675365
Fix tests 2018-05-07 14:50:01 -05:00
Alex Dadgar 319763a5d8
remove unnessary merge of DeploymentStatus.Timestamp 2018-05-07 14:50:01 -05:00
Alex Dadgar f4af30fbb5
Canary tags structs 2018-05-07 14:50:01 -05:00
Alex Dadgar f95ab4ade8
Mark canaries on creation, and unmark on promotion 2018-05-07 14:50:01 -05:00
Preetha Appan b2b773e696
better comments and remove commented code 2018-05-07 14:50:01 -05:00
Preetha Appan 90a2311cef
Fix deadlock in deployment watcher when deployment starts with no allocations and eventually has failed allocations 2018-05-07 14:50:01 -05:00
Alex Dadgar 224b3092ae
change default to 10m and docs 2018-05-07 14:50:01 -05:00
Alex Dadgar c91ce5cc38
Fix not enqueuing eval 2018-05-07 14:50:01 -05:00
Alex Dadgar 8d50955054
Fix typos 2018-05-07 14:50:01 -05:00
Alex Dadgar 641ef81cbf
Test fixes 2018-05-07 14:50:01 -05:00
Alex Dadgar 8a81038cdb
Set Reschedule from deployment watcher 2018-05-07 14:50:01 -05:00
Alex Dadgar a510774451
Use UpdateAllocDesiredTransistion instead of UpsertEval but no transistions yet 2018-05-07 14:50:01 -05:00
Alex Dadgar fcf4f582d0
small review feedback fixes 2018-05-07 14:50:01 -05:00
Alex Dadgar e5caaf3358
Small test fix 2018-05-07 14:50:01 -05:00
Alex Dadgar 9bff9024b3
add latest eval back 2018-05-07 14:50:01 -05:00
Alex Dadgar 99e00fb774
Pass through timestamp 2018-05-07 14:50:01 -05:00
Alex Dadgar c49b5f9949
Handle progressed deployments and tests 2018-05-07 14:50:01 -05:00
Alex Dadgar 9e75ea0a11
Deployment watcher based on deployment having progress deadline 2018-05-07 14:50:01 -05:00
Alex Dadgar 1336002255
Progress deadline in deployment state 2018-05-07 14:50:01 -05:00
Alex Dadgar 55b483709f
Fix tests 2018-05-07 14:50:01 -05:00
Alex Dadgar ee50789c22
Initial implementation 2018-05-07 14:50:01 -05:00
Michael Schurter a4caf8208b tests: fix grpc fields in task diff 2018-05-04 11:08:45 -07:00
Michael Schurter f6a4713141 consul: make grpc checks more like http checks 2018-05-04 11:08:11 -07:00
Michael Schurter 382caec1e1 consul: initial grpc implementation
Needs to be more like http.
2018-05-04 11:08:11 -07:00
Preetha Appan 52b3b53181
Update ModifyIndex of alloc when setting NextAllocation value 2018-05-03 17:04:36 -05:00
Preetha Appan 274bed1892
Add RescheduleTracker to allocs list stub struct 2018-05-01 14:53:47 -05:00
Alex Dadgar de4af37249 version bump and remove generated 2018-04-27 11:10:00 -07:00
Alex Dadgar 845a43864a generated files 2018-04-27 10:45:40 -07:00
Alex Dadgar d03c881802 small cleanup and logging 2018-04-27 10:36:28 -07:00
Alex Dadgar da3a552d8d Fix issue where node connection map wasn't being pruned 2018-04-27 10:16:03 -07:00
Alex Dadgar 35e06ddb31 Remove generated and version bump 2018-04-26 16:49:19 -07:00
Alex Dadgar 43192cefae generated files 2018-04-26 16:28:58 -07:00
Alex Dadgar 265a6d4f8b
Merge pull request #4224 from hashicorp/b-cron-parse
Handle potential panic in cron parsing
2018-04-26 16:22:37 -07:00
Alex Dadgar 05eccb063f Merge branch 'b-cron-parse' of github.com:hashicorp/nomad into b-cron-parse 2018-04-26 15:51:56 -07:00
Alex Dadgar ea24513d38 Allow nomad to restore bad periodic job 2018-04-26 15:51:47 -07:00
Chelsea Holland Komlo ce1c3e0c2d add unit tests for panic cron parsing bug
add comments for cron parsing wrapper
2018-04-26 18:47:08 -04:00
Alex Dadgar 15ad3f94af Fix command line 2018-04-26 15:46:22 -07:00
Alex Dadgar dc2907c2c9 Codecgen full package 2018-04-26 15:24:53 -07:00
Alex Dadgar d0f237086b UX touchups 2018-04-26 15:24:27 -07:00
Chelsea Holland Komlo fca0169dbc handle potential panic in cron parsing 2018-04-26 16:57:45 -04:00
Alex Dadgar ff7e2b960f Add test 2018-04-26 13:28:24 -07:00
Alex Dadgar 4a23307baf Track all client connections 2018-04-26 13:22:09 -07:00
Alex Dadgar 5320205853 Sort signals in implicit constraint
Fixes https://github.com/hashicorp/nomad/issues/4212
2018-04-26 10:12:47 -07:00
Alex Dadgar 79844f1d01 Safety guard 2018-04-25 16:00:56 -07:00
Alex Dadgar d45f39f24e Fix detecting drain strategy on GC'd node 2018-04-25 16:00:56 -07:00
Nick Ethier 2e6c95f511
Merge pull request #4138 from hashicorp/i-hcl-json-endpoint
HCL to JSON api endpoint
2018-04-19 14:18:34 -04:00
Alex Dadgar eeb85299ff gofmt -s nomad/structs/structs_test.go 2018-04-17 13:39:32 -07:00
Chelsea Holland Komlo 788b23e17e add test for node copy 2018-04-17 12:58:07 -04:00
Nick Ethier 31da01856a
command/agent: add HCL mock for parse endpoint 2018-04-16 19:21:09 -04:00
Alex Dadgar 4f2a7b6949 Fix copying drivers 2018-04-16 15:45:51 -07:00
Alex Dadgar adaf4fa7e0 Remove generated structs 2018-04-12 16:35:31 -07:00
Alex Dadgar 663c4d0433 Version bump and generated files 2018-04-12 16:21:50 -07:00
Preetha bdc17ebf10
Merge pull request #4139 from hashicorp/b-reschedule-invalid-system-jobs
Make system jobs fail validation if they contain a reschedule stanza
2018-04-11 20:01:19 -05:00
Preetha Appan 9f84e17bfd
dont print reschedule policy in error message 2018-04-11 17:07:14 -05:00
Preetha Appan fa90f036c6
Fix more tests 2018-04-11 15:51:24 -05:00
Preetha Appan 81f856e7c9
Fix one more failing test 2018-04-11 15:49:23 -05:00
Preetha 0b6fbb8e16
Merge pull request #4131 from hashicorp/b-rescheduling-fix-gc
Update garbage collection logic to make sure allocs with pending evals are not GCed
2018-04-11 15:44:36 -05:00
Preetha Appan 1da4d88f3d
Make test descriptions better 2018-04-11 15:12:23 -05:00
Preetha Appan a7b7b662ed
Make system jobs fail validation if they contain a reschedule stanza 2018-04-11 14:56:20 -05:00
Preetha Appan 688fd9ee37
Update alloc GC eligility logic to not rely on follow up evals 2018-04-11 13:58:02 -05:00
Charlie Voiselle ba88f00ccb Changed "til" to "until"
Should be "till" or "until"; chose "until" because it is unambiguous as to meaning.
2018-04-11 12:36:28 -05:00
Preetha dec5b99478
Merge pull request #4120 from hashicorp/b-rescheduling-minimize-evals
Batch evals for rescheduling failed allocs correctly
2018-04-10 17:18:35 -05:00
Preetha Appan 59cce1d620
Fix unit test for core scheduler GC 2018-04-10 17:12:06 -05:00
Preetha Appan 7040884002
Simplify and update allocation gc eligibility logic 2018-04-10 16:08:37 -05:00
Preetha c88fef4c4b
Merge pull request #4127 from hashicorp/b-autopilot-removepeer-fixes
Add node id persistence
2018-04-10 16:05:00 -05:00
Preetha Appan a569d34f25
Add custom status description for rescheduling follow up evals, and make unit test robust 2018-04-10 15:30:15 -05:00
Preetha Appan d17bfd8045
Make leader election test run on all three protocol versions 2018-04-10 14:20:02 -05:00
Preetha Appan b3402efd0b
Adds a new custom description for update alloc triggered evals to make it easier to unit test. 2018-04-10 14:00:07 -05:00
Preetha Appan 6d0e1c9fea
Use preconfigured nodeID if there isn't a persisted node ID, and persist it if its not persisted. 2018-04-10 08:47:33 -05:00
Preetha Appan 216c053742
Remove debug print statements 2018-04-10 08:16:50 -05:00
Alex Dadgar d179a09b83 WIP: Not setting node id properlperly 2018-04-09 18:01:28 -07:00
Preetha Appan 868f4f19f4
Unit tests for rolling upgrade and killing a leader 2018-04-09 17:42:30 -05:00
Preetha Appan 24203ae2f7
Remove duplicate commit 2018-04-09 15:08:09 -05:00
Preetha Appan d1cb5df477
Batch evals for rescheduling failed allocs correctly and group them by job ID 2018-04-09 14:05:31 -05:00
Michael Schurter d086f17708 rpc: wrap up old version check in a helper
DRY it up
2018-04-09 11:09:05 -07:00
Michael Schurter e1cbcf0b3c rpc: give min rpc version variable a better name 2018-04-09 11:09:05 -07:00
Michael Schurter 88a9409f8e rpc: only attempt NodeRpc for nodes>=0.8
Attempting NodeRpc (or streaming node rpc) for clients that do not
support it causes it to hang indefinitely because while the TCP
connection exists, the client will never respond.
2018-04-09 11:08:06 -07:00
Preetha 6254d75eee
Merge pull request #4101 from hashicorp/b-rescheduling-edge-fixes
Fixes edge cases around timing/ task finish time being set more than once
2018-04-04 16:18:21 -05:00
Preetha Appan 5e4525bd30
Moves setting finishedAt to the right place and adds two unit tests. 2018-04-04 14:38:15 -05:00
Michael Schurter b1a90462a8
Merge pull request #4094 from hashicorp/b-drain-panic
drain: fix double-close panic on drain future
2018-04-04 10:31:14 -07:00
Alex Dadgar 4c9c6decd3
Merge pull request #4100 from hashicorp/b-vault-no-auth
Improve handling of Vault errors
2018-04-03 17:23:43 -07:00
Alex Dadgar af1b185ce4 Fix flaky deadline tests 2018-04-03 16:51:57 -07:00
Michael Schurter ba6628a1b6 drain: return on first error
If one error is encountered it is unlikely any further attempts will
succeed, so fail fast.
2018-04-03 16:46:35 -07:00
Alex Dadgar 2b14371db5 Fix spelling 2018-04-03 15:58:03 -07:00
Alex Dadgar 9617a13a2b Correctly handle the upgrade path of a node being drained when applying Raft logs 2018-04-03 15:32:44 -07:00
Preetha Appan 00537c739b
Fixes edge cases around timing and task finish time being set more than once 2018-04-03 16:34:59 -05:00
Alex Dadgar 58a3ec3fb2 Improve Vault error handling 2018-04-03 14:29:22 -07:00
Michael Schurter edc4891283 drain: improve tests and fix spelling
* transistion -> transition
* don't t.Fatal in goroutines
* don't mutate global state
2018-04-02 16:40:47 -07:00
Michael Schurter 6840becf46 drain: refactor batch_future into its own file
aka What If structs.go Wasn't So Big?
2018-04-02 16:40:06 -07:00
Michael Schurter 44a749a7cc drain: fix double-close panic on drain future 2018-04-02 16:39:18 -07:00
Alex Dadgar 86f9044676 remove generated files 2018-03-30 16:52:49 -07:00
Alex Dadgar af81349dbe Generated files 2018-03-30 16:14:40 -07:00
Alex Dadgar 23ec54a372
Merge pull request #4089 from hashicorp/tls-error-fix
Check for nil for RPC listener; prevent double closing of listener channel
2018-03-30 16:08:13 -07:00
Alex Dadgar 7f28cfcdfe small cleanup 2018-03-30 15:49:56 -07:00
Chelsea Holland Komlo a77dd08dd9 prevent double close due to error in creating listener 2018-03-30 17:15:56 -04:00
Chelsea Holland Komlo 402a026c88 add further error handling for rpc connection handling 2018-03-30 17:03:36 -04:00
Alex Dadgar e8809f40dc Test transistion from both infinite and a future deadline to force 2018-03-30 11:24:39 -07:00
Alex Dadgar 32a673a7e1 Fix force deadline notification 2018-03-30 09:58:29 -07:00
Alex Dadgar 1aa415b0d8 Integration test 2018-03-30 09:33:23 -07:00
Alex Dadgar dc03fab29b Canonicalize migrate 2018-03-29 17:42:58 -07:00
Alex Dadgar e458ab9031
Merge branch 'master' into b-drain-batch 2018-03-29 17:10:34 -07:00
Michael Schurter 62e9553333
Merge pull request #4069 from hashicorp/f-hashealth
add HasHealth helper for nil checks
2018-03-29 17:03:20 -07:00
Alex Dadgar 301704091b Handle upgrade where Node doesn't have eligiblity
This PR handles upgrading a node that has no scheduling eligiblity set.
2018-03-29 16:52:23 -07:00
Alex Dadgar 7d2aae2c11 test handleTaskGroup 2018-03-29 16:38:47 -07:00
Alex Dadgar 049a9213d2 Watch batch jobs 2018-03-29 16:07:51 -07:00
Preetha 9a732c4acb
Merge pull request #4071 from hashicorp/b-handle-missing-finishedat
handle missing finishedAt
2018-03-29 17:11:34 -05:00
Alex Dadgar f12194328c Integration test for batch complete case 2018-03-29 13:51:04 -07:00
Preetha 81d48fc7cf
Merge pull request #4079 from hashicorp/b-filter-desiredstop
Filter desired status stop allocs correctly
2018-03-29 15:36:22 -05:00
Preetha Appan c8317532ff
Use time from task events if task state does not have FinishedAt set 2018-03-29 14:05:56 -05:00
Alex Dadgar b194f93f2f Disallow Update stanza on Batch 2018-03-29 11:28:56 -07:00
Michael Schurter 91b5bb58d9 add HasHealth helper for nil checks
We performed the DeploymentStatus nil checks a couple different ways, so
hopefully this helper will consoldiate them and make it more clear what
the code is doing.
2018-03-29 09:29:19 -07:00
Chelsea Komlo 607e631714
Merge pull request #4046 from hashicorp/tls-same-file-reload
Check file contents when determining if agent should reload TLS confi…
2018-03-29 10:51:32 -04:00
Preetha Appan 5090fefe96
Filter out allocs with DesiredState = stop, and unit tests 2018-03-29 09:28:52 -05:00
Preetha Appan 8776f4b942
Fix failing test 2018-03-29 07:59:38 -05:00
Preetha Appan 2da661595d
If FinishedAt is not set use alloc's modify time for rescheduling logic 2018-03-29 07:42:58 -05:00
Alex Dadgar b18f789020 Unmark drain when nodes hit their deadline and only batch/system left and add all job type integration test 2018-03-28 17:25:58 -07:00
Chelsea Holland Komlo b33d909bf9 add test to assert invalid files return error 2018-03-28 18:31:35 -04:00
Chelsea Holland Komlo 58ada9bc42 return error when setting checksum; don't reload 2018-03-28 18:15:50 -04:00
Chelsea Holland Komlo 2d5af7ff4d set TLS checksum when parsing config
Refactor checksum comparison, always set checksum if it is empty
2018-03-28 09:56:11 -04:00
Michael Schurter 65ddae86f8
Merge pull request #4054 from hashicorp/b-drainer-index-fix
drainer: reset index when new job registered
2018-03-27 16:28:25 -07:00
Michael Schurter 79a2781585
Merge pull request #4053 from hashicorp/b-drain-sys-jobs-2
drain: fix draining of system jobs
2018-03-27 16:26:45 -07:00
Alex Dadgar de4b3772f1 Create evals for system jobs when drain is unset
This PR creates evals for system jobs when:

* Drain is unset and mark eligible is true
* Eligibility is restored to the node
2018-03-27 15:53:24 -07:00
Chelsea Holland Komlo dd5f627feb set server configuration checksum on reload 2018-03-27 18:03:52 -04:00
Michael Schurter ec60a1d3e3 drain: improve comments 2018-03-27 14:27:09 -07:00
Michael Schurter e5dfb7e487 drain: unittest draining node logic 2018-03-27 14:24:01 -07:00
Michael Schurter a1ed305a24 test: add mock batch and system allocs
Since the BatchJob helper had a different task group than the Alloc
helper, it was difficult to create a valid batch alloc.
2018-03-27 14:24:01 -07:00
Michael Schurter 77bddc7941 drain: stop sys jobs after drain completes
System allocs should be drained when a node's deadline is hit or when
all other allocs on the node have stopped/migrated.
2018-03-27 14:24:01 -07:00
Michael Schurter fae77b874b drainer: reset index when new job registered 2018-03-27 14:12:59 -07:00
Chelsea Holland Komlo b522a0fadc fix up to string to use time.Time 2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo 31557cc44f move tests to use time.Time 2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo 003bc209b9 use time.Time for node events for compatibility 2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo 6e6d6b7e33 check file contents when determining if agent should reload TLS configuration 2018-03-27 15:42:20 -04:00
Alex Dadgar 59005d1d26
Merge pull request #4049 from hashicorp/b-tunnel
Only track nodes if the conn is from the node
2018-03-27 12:39:34 -07:00
Alex Dadgar 5dacb057b7 Only track nodes if the conn is from the node
Fixes a bug in which a connection to a Nomad server was treated as a
connection to a node because the server forwarded a node specific RPC.
2018-03-27 09:59:31 -07:00
Chelsea Komlo 57e2cd04bd
Merge pull request #4025 from hashicorp/reload-http-tls
Allow TLS configurations for HTTP and RPC connections to be reloaded …
2018-03-26 18:00:30 -04:00
Preetha Appan 539114124e
Fix too long token test case 2018-03-26 16:28:33 -05:00
Preetha Appan 33e170c15d
s/linear/constant/g 2018-03-26 14:45:09 -05:00
Preetha Appan 7db930b3c3
Extra test case and better error message for ambiguous config 2018-03-26 13:30:09 -05:00
Chelsea Holland Komlo c2a95f9d7d add test for upgrading only RPC connections 2018-03-26 10:55:27 -04:00
Preetha Appan fbd56c35a8
Adds additional validation for ambigous settings (having both unlimited and attempts set) 2018-03-24 10:29:20 -05:00
Alex Dadgar 39987d5236
Merge branch 'master' into b-acl-name 2018-03-22 14:51:40 -07:00
Michael Schurter a7f627e34c eligbile -> eligible 2018-03-21 16:55:22 -07:00
Michael Schurter a4f346abeb remove spurious TODOs and FIXMEs 2018-03-21 16:55:22 -07:00
Michael Schurter 9f3086a268 test: must initialize jobResults with new func 2018-03-21 16:51:45 -07:00
Michael Schurter e432c9af55 test: disable node drainer during tests
Node drainer would throw off the index checks
2018-03-21 16:51:45 -07:00
Michael Schurter 5c8c4bce2a test: disable drain during fsm test
drainer was unsetting drain before fsm could read written value
2018-03-21 16:51:45 -07:00
Michael Schurter 341d87aa48 tests: use mock.BatchJob to fix tests 2018-03-21 16:51:45 -07:00
Michael Schurter 8b107acc06 mock: add BatchJob() helper 2018-03-21 16:51:45 -07:00
Michael Schurter cb61a4bdc7 Fix linting errors 2018-03-21 16:51:45 -07:00
Alex Dadgar 640ebdaef6 fix race in drain integration tests 2018-03-21 16:51:45 -07:00
Michael Schurter c401d5a098 Refactor assertOps into a helper func 2018-03-21 16:51:45 -07:00
Michael Schurter 187b0e1a48 Remove debug prints 2018-03-21 16:51:45 -07:00
Michael Schurter f67eca48ac Deregister garbage collected jobs 2018-03-21 16:51:45 -07:00
Michael Schurter 922842546c JobNs -> NamespacedID
Also drop the New func as it's easy to swap the order of arguments since
they're both strings.
2018-03-21 16:51:45 -07:00
Michael Schurter 8dc7d9fb6a drainer: RegisterJob -> RegisterJobs
Test job watcher
2018-03-21 16:51:45 -07:00
Michael Schurter 3116897099 Fix deadline heap triggering
Chan must be buffered to avoid skipping triggering altogether

Also made timing in a test a bit more lenient
2018-03-21 16:51:45 -07:00
Alex Dadgar 9d23c965da fix comment 2018-03-21 16:51:45 -07:00
Alex Dadgar fb4badf1bc sharding 2018-03-21 16:51:44 -07:00
Alex Dadgar 2d91b9dfba Batch drain update 2018-03-21 16:51:44 -07:00
Alex Dadgar 92b636dd32 Fix deadline handling 2018-03-21 16:51:44 -07:00
Michael Schurter 9898edfa90 Switch to drainerv2 impl 2018-03-21 16:51:44 -07:00
Alex Dadgar 7b2bad8c5e Toggle Drain allows resetting eligibility
This PR allows marking a node as eligible for scheduling while toggling
drain. By default the `nomad node drain -disable` commmand will mark it
as eligible but the drainer will maintain in-eligibility.
2018-03-21 16:51:44 -07:00
Alex Dadgar ad80e655cc code review 2018-03-21 16:51:44 -07:00
Alex Dadgar 11f9fe4960 spelling fixes 2018-03-21 16:51:44 -07:00
Alex Dadgar bc7385812d Comments 2018-03-21 16:51:44 -07:00
Alex Dadgar e87c677a42 handle empty node case 2018-03-21 16:51:44 -07:00
Alex Dadgar 405dab2253 integration test and basic fixes 2018-03-21 16:51:44 -07:00
Alex Dadgar e63bcb474d Drainer 2018-03-21 16:51:44 -07:00
Alex Dadgar 4754366640 job watcher 2018-03-21 16:51:44 -07:00
Alex Dadgar 504bfabb4d Node's being untracked or having updated deadlines, updates the deadliner 2018-03-21 16:51:44 -07:00
Alex Dadgar 66eaaa6a4d node watcher 2018-03-21 16:51:44 -07:00
Alex Dadgar 527ac0b39d drain heap 2018-03-21 16:51:44 -07:00
Alex Dadgar 2d4c193a0a Initial design 2018-03-21 16:51:44 -07:00
Alex Dadgar 33ca319080 System test runs on mac 2018-03-21 16:51:44 -07:00
Alex Dadgar f8d4a3a9e6 Fix file names 2018-03-21 16:51:44 -07:00
Michael Schurter 32a7649359 refactor main drainloop into 2 more methods 2018-03-21 16:51:44 -07:00
Michael Schurter 5e52f84bb7 drainer: refactor newStopAllocs, applyMigrations 2018-03-21 16:51:44 -07:00
Michael Schurter 62960ed7bd client: don't monitor health of non-service jobs
Also fix system job draining; won't work without deadline fixes
2018-03-21 16:51:44 -07:00
Alex Dadgar a37329189a Improve DeadlineTime helper 2018-03-21 16:51:44 -07:00
Michael Schurter b7c993f0e5 drainer: convert fsm errors to go errors 2018-03-21 16:51:44 -07:00
Michael Schurter ab0de41884 drainer: factor job & node watchers out of drainer.go 2018-03-21 16:51:44 -07:00
Michael Schurter 5922aef623 Restart every time SetEnabled(true) is called 2018-03-21 16:51:44 -07:00
Michael Schurter 959d447d38 Remove unused context 2018-03-21 16:51:44 -07:00
Michael Schurter 8b41e9b2e1 drainer: drainer should shutdown with server 2018-03-21 16:51:44 -07:00
Michael Schurter 0a17076ad2 refactor drainer into a subpkg 2018-03-21 16:51:44 -07:00
Alex Dadgar 93871c18f8 Fix retaining the drain 2018-03-21 16:51:44 -07:00
Alex Dadgar 010a6b8ca5 Unblock evals once eligible 2018-03-21 16:51:44 -07:00
Alex Dadgar 8289cc3c6f HTTP and API 2018-03-21 16:51:44 -07:00
Alex Dadgar 0fba0101b6 RPC/FSM/State Store for Eligibility 2018-03-21 16:51:44 -07:00
Alex Dadgar b3d2346419 Upgrade path 2018-03-21 16:51:43 -07:00
Alex Dadgar 2f5309d82a Remove update time 2018-03-21 16:51:43 -07:00
Alex Dadgar 0965c9ed28 Fix tests 2018-03-21 16:51:43 -07:00
Alex Dadgar 010228577e Drain cli, api, http 2018-03-21 16:51:43 -07:00
Alex Dadgar e459a666ed Node.Drain takes strategy 2018-03-21 16:49:48 -07:00
Michael Schurter 03d0e5b8a0 improve drain fsm/statestore tests 2018-03-21 16:49:48 -07:00
Michael Schurter d1ec65d765 switch to new raft DesiredTransition message 2018-03-21 16:49:48 -07:00
Michael Schurter acf59ee75e drainer: switch to job based watching 2018-03-21 16:49:48 -07:00
Alex Dadgar db4a634072 RPC, FSM, State Store for marking DesiredTransistion
fix build tag
2018-03-21 16:49:48 -07:00
Michael Schurter c0542474db drain: initial drainv2 structs and impl 2018-03-21 16:49:48 -07:00
Chelsea Komlo 6fc9231dac
Merge pull request #3856 from hashicorp/f-client-add-health-checks
Client driver health checks for Docker
2018-03-21 18:05:00 -04:00
Chelsea Holland Komlo 66e44cdb73 Allow TLS configurations for HTTP and RPC connections to be reloaded separately 2018-03-21 17:51:08 -04:00
Preetha 01898b2c25
Merge pull request #4007 from hashicorp/f-show-rescheduling-cli-job-status
Show a section on upcoming delayed evaluations when applicable
2018-03-21 14:37:38 -05:00
Chelsea Holland Komlo f801709a0a fix issue when updating node events 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 86b7b3d2d9 fix up health check logic comparison; add node events to client driver checks 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo d8f68e5ef8 fix up codereview feedback 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo c7fd0bd8a1 fix up scheduler mocks 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo c50d02ae93 go style; update comments 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo a522da6994 fix up gofmt 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 3aa726baab fix scheduler driver name; create node structs file 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 3cba95e8a7 allow nomad to schedule based on the status of a client driver health check
Slight updates for go style
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 0bde357731 add concept of health checks to fingerprinters and nodes
fix up feedback from code review

add driver info for all drivers to node
2018-03-21 15:15:25 -04:00
Preetha 17f2f52f08
Merge pull request #3979 from hashicorp/b_update_compat_delete
Delete compatibility code for job level update stanza
2018-03-21 09:17:01 -05:00
Michael Schurter 70c370c6fe
Merge pull request #4003 from jrasell/f_gh_3988
Allow Nomads Consul health check names to be configurable.
2018-03-20 16:44:08 -07:00
James Rasell 121c3bc997 Update Consul check params from using health-check to check. 2018-03-20 16:03:58 +01:00
Preetha Appan 31a3c81c3b
Show a section on upcoming delayed evaluations when applicable 2018-03-19 21:42:37 -05:00
Preetha Appan 33a5a72323
Make suggested interval round to seconds, and more end to end test cases 2018-03-19 14:56:52 -05:00
James Rasell 15afef9b77 Allow Nomads Consul health checks to be configurable.
This change allows the client HTTP and the server HTTP, Serf and
RPC health check names within Consul to be configurable with the
defaults as previous. The configuration can be done via either a
config file or using CLI flags.

Closes #3988
2018-03-19 19:37:56 +01:00
Alex Dadgar 9e05c9a50e
Merge pull request #3997 from hashicorp/b-serf-addr
RPC Advertise used exclusively for Clients
2018-03-19 09:30:20 -07:00
Alex Dadgar 2baa1c38f2 clarify comment 2018-03-16 16:47:08 -07:00
Alex Dadgar b8607ad6d6 Heartbeat uses client rpc advertise and server defaults server rpc advertise addr 2018-03-16 16:47:08 -07:00
Alex Dadgar 52b7fb5361 Separate client and server rpc advertise addresses 2018-03-16 16:47:08 -07:00
Michael Schurter c3e8f6319c gofmt -s (simplify) files 2018-03-16 16:31:16 -07:00
Alex Dadgar b3ab063132
Merge pull request #3992 from hashicorp/f-vault-orphan
Allow and recommend Orphaned Vault tokens
2018-03-16 10:59:54 -07:00
Alex Dadgar 6a44e6092f Pull snapshotting out of loop 2018-03-16 10:54:26 -07:00
Alex Dadgar 7545c0053e job gc uses batch endpoint 2018-03-16 10:53:03 -07:00
Alex Dadgar 586ae36d13 Batch Deregister RPC 2018-03-16 10:53:03 -07:00
Alex Dadgar c152774997 Allow and recommend Orphaned Vault tokens
This PR removes enforcement that the Vault token role disallows orphaned
tokens and recommends orphaned tokens to simplify the
bootstrapping/upgrading of Nomad clusters. The requirement that Nomad's
Vault token never expire and be shared by all instances of Nomad servers
is not operationally friendly.
2018-03-15 15:32:08 -07:00
Alex Dadgar fc782d5942 List unblocks on summary changes 2018-03-15 10:22:03 -07:00