2489 commits

Author SHA1 Message Date
Chelsea Holland Komlo 38f611a7f2 refactor NewTLSConfiguration to pass in verifyIncoming/verifyOutgoing
add missing fields to TLS merge method
2018-05-23 18:35:30 -04:00
Alex Dadgar c268640c02 Fix noisy log 2018-05-22 14:45:34 -07:00
Alex Dadgar 21c5ed850d Register events 2018-05-22 14:06:33 -07:00
Alex Dadgar 17aac1c9de node heartbeat missed event 2018-05-22 14:05:46 -07:00
Alex Dadgar 1fe9cb4f00 update error message 2018-05-22 14:04:59 -07:00
Alex Dadgar 5f2080bc26 Emit events based on eligibility 2018-05-22 14:04:59 -07:00
Alex Dadgar 86be50fa05
Merge pull request #4284 from hashicorp/f-drain-event
Emit Node Events for draining
2018-05-22 21:04:18 +00:00
Alex Dadgar b6ecb75af9 update error message 2018-05-22 14:01:43 -07:00
Preetha b409a3ed5b
Merge pull request #4313 from hashicorp/b-alloc-gc-desiredstate
Check allocation's desired state in GC eligibility logic
2018-05-21 16:49:49 -07:00
Preetha 159888a856
Merge pull request #4274 from hashicorp/f-force-rescheduling
Add CLI and API support for forcing rescheduling of failed allocs
2018-05-21 16:24:22 -07:00
Preetha Appan a9d63c0df3
Check allocation's desired state in GC eligibility logic in core scheduler 2018-05-21 13:28:31 -05:00
Chelsea Komlo 687c26093c
Merge pull request #4269 from hashicorp/f-tls-remove-weak-standards
Configurable TLS cipher suites and versions; disallow weak ciphers
2018-05-11 08:11:46 -04:00
Alex Dadgar 9a2237bdab Drain complete 2018-05-10 17:22:06 -07:00
Alex Dadgar 0cb31feb1f Add node event when draining is set/removed/updated 2018-05-10 16:54:43 -07:00
Alex Dadgar a35248d1d8 Plumb event via FSM 2018-05-10 16:30:54 -07:00
Preetha Appan bfa0937bbb
Code review feedback 2018-05-10 14:42:24 -05:00
Preetha Appan ca5758741b
Update serf to pick up graceful leave fix 2018-05-10 11:16:24 -05:00
Chelsea Holland Komlo 620558c107 log error if unable to create TLS configuration 2018-05-10 11:51:54 -04:00
Chelsea Holland Komlo 44f536f18e add support for configurable TLS minimum version 2018-05-09 18:07:12 -04:00
Chelsea Holland Komlo 796bae6f1b allow configurable cipher suites
disallow 3DES and RC4 ciphers

add documentation for tls_cipher_suites
2018-05-09 17:15:31 -04:00
Preetha Appan b12df3c64b
Added CLI for evaluating job given ID, and modified client API for evaluate to take a request payload 2018-05-09 15:04:27 -05:00
Preetha Appan ef531b0f34
Add unit tests for forced rescheduling 2018-05-09 11:30:42 -05:00
Chelsea Holland Komlo d51611040f Add driver health information to node list stub 2018-05-09 11:21:54 -04:00
Preetha Appan 1b8d8b2186
Fix logic inversion in force rescheduling 2018-05-08 20:00:06 -05:00
Preetha Appan c1b92c284e
Work in progress - force rescheduling of failed allocs 2018-05-08 17:26:57 -05:00
Preetha Appan c7edbd5f41
newlines in test 2018-05-07 14:55:01 -05:00
Preetha Appan 4e75456beb
Fix deadlock in deadline timer logic when progress deadline is passed and the deployment is updated. 2018-05-07 14:55:01 -05:00
Preetha Appan cba13e4ec5
Fix test set up to set ModifyTime for alloc 2018-05-07 14:55:01 -05:00
Preetha Appan 19b096d203
Set modify time for allocs in unit test, and define current time in one spot 2018-05-07 14:55:01 -05:00
Preetha Appan 4c377b112e
Fix panic in deployment watcher when deployment is not in the state store due to a gc 2018-05-07 14:55:01 -05:00
Preetha 02d63432b4
Fix typo 2018-05-07 14:55:01 -05:00
Alex Dadgar 738056634e
Fix the initial progress deadline calculation when the alloc is inplace updated to be part of a new deployment 2018-05-07 14:55:01 -05:00
Michael Schurter e90d051c43
consul: change hashed canary bytes 2018-05-07 14:55:01 -05:00
Alex Dadgar 768fec8505
Allow healthy canary deployment to skip progress deadline 2018-05-07 14:55:01 -05:00
Alex Dadgar 8626c1b94a
Reschedule when we have canaries properly 2018-05-07 14:55:01 -05:00
Michael Schurter 50e04c976e
consul: support canary tags for services
Also refactor Consul ServiceClient to take a struct instead of a massive
set of arguments. Meant updating a lot of code but it should be far
easier to extend in the future as you will only need to update a single
struct instead of every single call site.

Adds an e2e test for canary tags.
2018-05-07 14:55:01 -05:00
Michael Schurter a3038cefb4
typo: transistion -> transition 2018-05-07 14:50:01 -05:00
Alex Dadgar bd38675365
Fix tests 2018-05-07 14:50:01 -05:00
Alex Dadgar 319763a5d8
remove unnecessary merge of DeploymentStatus.Timestamp 2018-05-07 14:50:01 -05:00
Alex Dadgar f4af30fbb5
Canary tags structs 2018-05-07 14:50:01 -05:00
Alex Dadgar f95ab4ade8
Mark canaries on creation, and unmark on promotion 2018-05-07 14:50:01 -05:00
Preetha Appan b2b773e696
better comments and remove commented code 2018-05-07 14:50:01 -05:00
Preetha Appan 90a2311cef
Fix deadlock in deployment watcher when deployment starts with no allocations and eventually has failed allocations 2018-05-07 14:50:01 -05:00
Alex Dadgar 224b3092ae
change default to 10m and docs 2018-05-07 14:50:01 -05:00
Alex Dadgar c91ce5cc38
Fix not enqueuing eval 2018-05-07 14:50:01 -05:00
Alex Dadgar 8d50955054
Fix typos 2018-05-07 14:50:01 -05:00
Alex Dadgar 641ef81cbf
Test fixes 2018-05-07 14:50:01 -05:00
Alex Dadgar 8a81038cdb
Set Reschedule from deployment watcher 2018-05-07 14:50:01 -05:00
Alex Dadgar a510774451
Use UpdateAllocDesiredTransistion instead of UpsertEval but no transistions yet 2018-05-07 14:50:01 -05:00
Alex Dadgar fcf4f582d0
small review feedback fixes 2018-05-07 14:50:01 -05:00
Alex Dadgar e5caaf3358
Small test fix 2018-05-07 14:50:01 -05:00
Alex Dadgar 9bff9024b3
add latest eval back 2018-05-07 14:50:01 -05:00
Alex Dadgar 99e00fb774
Pass through timestamp 2018-05-07 14:50:01 -05:00
Alex Dadgar c49b5f9949
Handle progressed deployments and tests 2018-05-07 14:50:01 -05:00
Alex Dadgar 9e75ea0a11
Deployment watcher based on deployment having progress deadline 2018-05-07 14:50:01 -05:00
Alex Dadgar 1336002255
Progress deadline in deployment state 2018-05-07 14:50:01 -05:00
Alex Dadgar 55b483709f
Fix tests 2018-05-07 14:50:01 -05:00
Alex Dadgar ee50789c22
Initial implementation 2018-05-07 14:50:01 -05:00
Michael Schurter a4caf8208b tests: fix grpc fields in task diff 2018-05-04 11:08:45 -07:00
Michael Schurter f6a4713141 consul: make grpc checks more like http checks 2018-05-04 11:08:11 -07:00
Michael Schurter 382caec1e1 consul: initial grpc implementation
Needs to be more like http.
2018-05-04 11:08:11 -07:00
Preetha Appan 52b3b53181
Update ModifyIndex of alloc when setting NextAllocation value 2018-05-03 17:04:36 -05:00
Preetha Appan 274bed1892
Add RescheduleTracker to allocs list stub struct 2018-05-01 14:53:47 -05:00
Alex Dadgar de4af37249 version bump and remove generated 2018-04-27 11:10:00 -07:00
Alex Dadgar 845a43864a generated files 2018-04-27 10:45:40 -07:00
Alex Dadgar d03c881802 small cleanup and logging 2018-04-27 10:36:28 -07:00
Alex Dadgar da3a552d8d Fix issue where node connection map wasn't being pruned 2018-04-27 10:16:03 -07:00
Alex Dadgar 35e06ddb31 Remove generated and version bump 2018-04-26 16:49:19 -07:00
Alex Dadgar 43192cefae generated files 2018-04-26 16:28:58 -07:00
Alex Dadgar 265a6d4f8b
Merge pull request #4224 from hashicorp/b-cron-parse
Handle potential panic in cron parsing
2018-04-26 16:22:37 -07:00
Alex Dadgar 05eccb063f Merge branch 'b-cron-parse' of github.com:hashicorp/nomad into b-cron-parse 2018-04-26 15:51:56 -07:00
Alex Dadgar ea24513d38 Allow nomad to restore bad periodic job 2018-04-26 15:51:47 -07:00
Chelsea Holland Komlo ce1c3e0c2d add unit tests for panic cron parsing bug
add comments for cron parsing wrapper
2018-04-26 18:47:08 -04:00
Alex Dadgar 15ad3f94af Fix command line 2018-04-26 15:46:22 -07:00
Alex Dadgar dc2907c2c9 Codecgen full package 2018-04-26 15:24:53 -07:00
Alex Dadgar d0f237086b UX touchups 2018-04-26 15:24:27 -07:00
Chelsea Holland Komlo fca0169dbc handle potential panic in cron parsing 2018-04-26 16:57:45 -04:00
Alex Dadgar ff7e2b960f Add test 2018-04-26 13:28:24 -07:00
Alex Dadgar 4a23307baf Track all client connections 2018-04-26 13:22:09 -07:00
Alex Dadgar 5320205853 Sort signals in implicit constraint
Fixes https://github.com/hashicorp/nomad/issues/4212
2018-04-26 10:12:47 -07:00
Alex Dadgar 79844f1d01 Safety guard 2018-04-25 16:00:56 -07:00
Alex Dadgar d45f39f24e Fix detecting drain strategy on GC'd node 2018-04-25 16:00:56 -07:00
Nick Ethier 2e6c95f511
Merge pull request #4138 from hashicorp/i-hcl-json-endpoint
HCL to JSON api endpoint
2018-04-19 14:18:34 -04:00
Alex Dadgar eeb85299ff gofmt -s nomad/structs/structs_test.go 2018-04-17 13:39:32 -07:00
Chelsea Holland Komlo 788b23e17e add test for node copy 2018-04-17 12:58:07 -04:00
Nick Ethier 31da01856a
command/agent: add HCL mock for parse endpoint 2018-04-16 19:21:09 -04:00
Alex Dadgar 4f2a7b6949 Fix copying drivers 2018-04-16 15:45:51 -07:00
Alex Dadgar adaf4fa7e0 Remove generated structs 2018-04-12 16:35:31 -07:00
Alex Dadgar 663c4d0433 Version bump and generated files 2018-04-12 16:21:50 -07:00
Preetha bdc17ebf10
Merge pull request #4139 from hashicorp/b-reschedule-invalid-system-jobs
Make system jobs fail validation if they contain a reschedule stanza
2018-04-11 20:01:19 -05:00
Preetha Appan 9f84e17bfd
dont print reschedule policy in error message 2018-04-11 17:07:14 -05:00
Preetha Appan fa90f036c6
Fix more tests 2018-04-11 15:51:24 -05:00
Preetha Appan 81f856e7c9
Fix one more failing test 2018-04-11 15:49:23 -05:00
Preetha 0b6fbb8e16
Merge pull request #4131 from hashicorp/b-rescheduling-fix-gc
Update garbage collection logic to make sure allocs with pending evals are not GCed
2018-04-11 15:44:36 -05:00
Preetha Appan 1da4d88f3d
Make test descriptions better 2018-04-11 15:12:23 -05:00
Preetha Appan a7b7b662ed
Make system jobs fail validation if they contain a reschedule stanza 2018-04-11 14:56:20 -05:00
Preetha Appan 688fd9ee37
Update alloc GC eligibility logic to not rely on follow up evals 2018-04-11 13:58:02 -05:00
Charlie Voiselle ba88f00ccb Changed "til" to "until"
Should be "till" or "until"; chose "until" because it is unambiguous as to meaning.
2018-04-11 12:36:28 -05:00
Preetha dec5b99478
Merge pull request #4120 from hashicorp/b-rescheduling-minimize-evals
Batch evals for rescheduling failed allocs correctly
2018-04-10 17:18:35 -05:00
Preetha Appan 59cce1d620
Fix unit test for core scheduler GC 2018-04-10 17:12:06 -05:00
Preetha Appan 7040884002
Simplify and update allocation gc eligibility logic 2018-04-10 16:08:37 -05:00
Preetha c88fef4c4b
Merge pull request #4127 from hashicorp/b-autopilot-removepeer-fixes
Add node id persistence
2018-04-10 16:05:00 -05:00
Preetha Appan a569d34f25
Add custom status description for rescheduling follow up evals, and make unit test robust 2018-04-10 15:30:15 -05:00
Preetha Appan d17bfd8045
Make leader election test run on all three protocol versions 2018-04-10 14:20:02 -05:00
Preetha Appan b3402efd0b
Adds a new custom description for update alloc triggered evals to make it easier to unit test. 2018-04-10 14:00:07 -05:00
Preetha Appan 6d0e1c9fea
Use preconfigured nodeID if there isn't a persisted node ID, and persist it if its not persisted. 2018-04-10 08:47:33 -05:00
Preetha Appan 216c053742
Remove debug print statements 2018-04-10 08:16:50 -05:00
Alex Dadgar d179a09b83 WIP: Not setting node id properly 2018-04-09 18:01:28 -07:00
Preetha Appan 868f4f19f4
Unit tests for rolling upgrade and killing a leader 2018-04-09 17:42:30 -05:00
Preetha Appan 24203ae2f7
Remove duplicate commit 2018-04-09 15:08:09 -05:00
Preetha Appan d1cb5df477
Batch evals for rescheduling failed allocs correctly and group them by job ID 2018-04-09 14:05:31 -05:00
Michael Schurter d086f17708 rpc: wrap up old version check in a helper
DRY it up
2018-04-09 11:09:05 -07:00
Michael Schurter e1cbcf0b3c rpc: give min rpc version variable a better name 2018-04-09 11:09:05 -07:00
Michael Schurter 88a9409f8e rpc: only attempt NodeRpc for nodes>=0.8
Attempting NodeRpc (or streaming node rpc) for clients that do not
support it causes it to hang indefinitely because while the TCP
connection exists, the client will never respond.
2018-04-09 11:08:06 -07:00
Preetha 6254d75eee
Merge pull request #4101 from hashicorp/b-rescheduling-edge-fixes
Fixes edge cases around timing/ task finish time being set more than once
2018-04-04 16:18:21 -05:00
Preetha Appan 5e4525bd30
Moves setting finishedAt to the right place and adds two unit tests. 2018-04-04 14:38:15 -05:00
Michael Schurter b1a90462a8
Merge pull request #4094 from hashicorp/b-drain-panic
drain: fix double-close panic on drain future
2018-04-04 10:31:14 -07:00
Alex Dadgar 4c9c6decd3
Merge pull request #4100 from hashicorp/b-vault-no-auth
Improve handling of Vault errors
2018-04-03 17:23:43 -07:00
Alex Dadgar af1b185ce4 Fix flaky deadline tests 2018-04-03 16:51:57 -07:00
Michael Schurter ba6628a1b6 drain: return on first error
If one error is encountered it is unlikely any further attempts will
succeed, so fail fast.
2018-04-03 16:46:35 -07:00
Alex Dadgar 2b14371db5 Fix spelling 2018-04-03 15:58:03 -07:00
Alex Dadgar 9617a13a2b Correctly handle the upgrade path of a node being drained when applying Raft logs 2018-04-03 15:32:44 -07:00
Preetha Appan 00537c739b
Fixes edge cases around timing and task finish time being set more than once 2018-04-03 16:34:59 -05:00
Alex Dadgar 58a3ec3fb2 Improve Vault error handling 2018-04-03 14:29:22 -07:00
Michael Schurter edc4891283 drain: improve tests and fix spelling
* transistion -> transition
* don't t.Fatal in goroutines
* don't mutate global state
2018-04-02 16:40:47 -07:00
Michael Schurter 6840becf46 drain: refactor batch_future into its own file
aka What If structs.go Wasn't So Big?
2018-04-02 16:40:06 -07:00
Michael Schurter 44a749a7cc drain: fix double-close panic on drain future 2018-04-02 16:39:18 -07:00
Alex Dadgar 86f9044676 remove generated files 2018-03-30 16:52:49 -07:00
Alex Dadgar af81349dbe Generated files 2018-03-30 16:14:40 -07:00
Alex Dadgar 23ec54a372
Merge pull request #4089 from hashicorp/tls-error-fix
Check for nil for RPC listener; prevent double closing of listener channel
2018-03-30 16:08:13 -07:00
Alex Dadgar 7f28cfcdfe small cleanup 2018-03-30 15:49:56 -07:00
Chelsea Holland Komlo a77dd08dd9 prevent double close due to error in creating listener 2018-03-30 17:15:56 -04:00
Chelsea Holland Komlo 402a026c88 add further error handling for rpc connection handling 2018-03-30 17:03:36 -04:00
Alex Dadgar e8809f40dc Test transition from both infinite and a future deadline to force 2018-03-30 11:24:39 -07:00
Alex Dadgar 32a673a7e1 Fix force deadline notification 2018-03-30 09:58:29 -07:00
Alex Dadgar 1aa415b0d8 Integration test 2018-03-30 09:33:23 -07:00
Alex Dadgar dc03fab29b Canonicalize migrate 2018-03-29 17:42:58 -07:00
Alex Dadgar e458ab9031
Merge branch 'master' into b-drain-batch 2018-03-29 17:10:34 -07:00
Michael Schurter 62e9553333
Merge pull request #4069 from hashicorp/f-hashealth
add HasHealth helper for nil checks
2018-03-29 17:03:20 -07:00
Alex Dadgar 301704091b Handle upgrade where Node doesn't have eligibility
This PR handles upgrading a node that has no scheduling eligibility set.
2018-03-29 16:52:23 -07:00
Alex Dadgar 7d2aae2c11 test handleTaskGroup 2018-03-29 16:38:47 -07:00
Alex Dadgar 049a9213d2 Watch batch jobs 2018-03-29 16:07:51 -07:00
Preetha 9a732c4acb
Merge pull request #4071 from hashicorp/b-handle-missing-finishedat
handle missing finishedAt
2018-03-29 17:11:34 -05:00
Alex Dadgar f12194328c Integration test for batch complete case 2018-03-29 13:51:04 -07:00
Preetha 81d48fc7cf
Merge pull request #4079 from hashicorp/b-filter-desiredstop
Filter desired status stop allocs correctly
2018-03-29 15:36:22 -05:00
Preetha Appan c8317532ff
Use time from task events if task state does not have FinishedAt set 2018-03-29 14:05:56 -05:00
Alex Dadgar b194f93f2f Disallow Update stanza on Batch 2018-03-29 11:28:56 -07:00
Michael Schurter 91b5bb58d9 add HasHealth helper for nil checks
We performed the DeploymentStatus nil checks a couple different ways, so
hopefully this helper will consolidate them and make it more clear what
the code is doing.
2018-03-29 09:29:19 -07:00
Chelsea Komlo 607e631714
Merge pull request #4046 from hashicorp/tls-same-file-reload
Check file contents when determining if agent should reload TLS confi…
2018-03-29 10:51:32 -04:00
Preetha Appan 5090fefe96
Filter out allocs with DesiredState = stop, and unit tests 2018-03-29 09:28:52 -05:00