Commit graph

2395 commits

Author SHA1 Message Date
Preetha Appan 1da4d88f3d Make test descriptions better 2018-04-11 15:12:23 -05:00
Preetha Appan a7b7b662ed Make system jobs fail validation if they contain a reschedule stanza 2018-04-11 14:56:20 -05:00
Preetha Appan 688fd9ee37 Update alloc GC eligibility logic to not rely on follow up evals 2018-04-11 13:58:02 -05:00
Charlie Voiselle ba88f00ccb Changed "til" to "until"
Should be "till" or "until"; chose "until" because it is unambiguous as to meaning.
2018-04-11 12:36:28 -05:00
Preetha dec5b99478 Merge pull request #4120 from hashicorp/b-rescheduling-minimize-evals
Batch evals for rescheduling failed allocs correctly
2018-04-10 17:18:35 -05:00
Preetha Appan 59cce1d620 Fix unit test for core scheduler GC 2018-04-10 17:12:06 -05:00
Preetha Appan 7040884002 Simplify and update allocation gc eligibility logic 2018-04-10 16:08:37 -05:00
Preetha c88fef4c4b Merge pull request #4127 from hashicorp/b-autopilot-removepeer-fixes
Add node id persistence
2018-04-10 16:05:00 -05:00
Preetha Appan a569d34f25 Add custom status description for rescheduling follow up evals, and make unit test robust 2018-04-10 15:30:15 -05:00
Preetha Appan d17bfd8045 Make leader election test run on all three protocol versions 2018-04-10 14:20:02 -05:00
Preetha Appan b3402efd0b Adds a new custom description for update alloc triggered evals to make it easier to unit test. 2018-04-10 14:00:07 -05:00
Preetha Appan 6d0e1c9fea Use preconfigured nodeID if there isn't a persisted node ID, and persist it if it's not persisted. 2018-04-10 08:47:33 -05:00
Preetha Appan 216c053742 Remove debug print statements 2018-04-10 08:16:50 -05:00
Alex Dadgar d179a09b83 WIP: Not setting node id properly 2018-04-09 18:01:28 -07:00
Preetha Appan 868f4f19f4 Unit tests for rolling upgrade and killing a leader 2018-04-09 17:42:30 -05:00
Preetha Appan 24203ae2f7 Remove duplicate commit 2018-04-09 15:08:09 -05:00
Preetha Appan d1cb5df477 Batch evals for rescheduling failed allocs correctly and group them by job ID 2018-04-09 14:05:31 -05:00
Michael Schurter d086f17708 rpc: wrap up old version check in a helper
DRY it up
2018-04-09 11:09:05 -07:00
Michael Schurter e1cbcf0b3c rpc: give min rpc version variable a better name 2018-04-09 11:09:05 -07:00
Michael Schurter 88a9409f8e rpc: only attempt NodeRpc for nodes>=0.8
Attempting NodeRpc (or streaming node rpc) for clients that do not
support it causes it to hang indefinitely because while the TCP
connection exists, the client will never respond.
2018-04-09 11:08:06 -07:00
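The version gate described in the commit above can be sketched as follows. This is a minimal Python illustration, not Nomad's Go code: it assumes the node reports its version as a dotted string, and `parse_version`, `supports_node_rpc`, and `MIN_NODE_RPC_VERSION` are hypothetical names. Malformed versions are treated as unsupported so the server fails fast instead of waiting on a client that will never answer.

```python
from typing import Tuple

# Hypothetical minimum version; ">=0.8" is read here as 0.8.0.
MIN_NODE_RPC_VERSION: Tuple[int, int, int] = (0, 8, 0)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse "MAJOR.MINOR[.PATCH][-suffix]" into a comparable tuple."""
    # Simplification: a prerelease suffix such as "-rc1" is ignored, so
    # "0.8.0-rc1" compares equal to "0.8.0".
    core = version.split("-", 1)[0]
    parts = [int(p) for p in core.split(".")]  # raises ValueError if malformed
    parts += [0] * (3 - len(parts))            # pad "0.8" to "0.8.0"
    return parts[0], parts[1], parts[2]


def supports_node_rpc(node_version: str) -> bool:
    """Return True only if the node is new enough to answer NodeRpc."""
    try:
        return parse_version(node_version) >= MIN_NODE_RPC_VERSION
    except ValueError:
        # Unknown or malformed version: do not attempt the RPC at all.
        return False
```

Tuple comparison is lexicographic, so `(0, 7, 9) < (0, 8, 0) <= (1, 0, 0)` without any extra logic.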
Preetha 6254d75eee Merge pull request #4101 from hashicorp/b-rescheduling-edge-fixes
Fixes edge cases around timing/ task finish time being set more than once
2018-04-04 16:18:21 -05:00
Preetha Appan 5e4525bd30 Moves setting finishedAt to the right place and adds two unit tests. 2018-04-04 14:38:15 -05:00
Michael Schurter b1a90462a8 Merge pull request #4094 from hashicorp/b-drain-panic
drain: fix double-close panic on drain future
2018-04-04 10:31:14 -07:00
Alex Dadgar 4c9c6decd3 Merge pull request #4100 from hashicorp/b-vault-no-auth
Improve handling of Vault errors
2018-04-03 17:23:43 -07:00
Alex Dadgar af1b185ce4 Fix flaky deadline tests 2018-04-03 16:51:57 -07:00
Michael Schurter ba6628a1b6 drain: return on first error
If one error is encountered it is unlikely any further attempts will
succeed, so fail fast.
2018-04-03 16:46:35 -07:00
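The fail-fast rule in the commit above amounts to stopping at the first error rather than collecting them all. A minimal Python sketch, assuming each update is a zero-argument callable; `apply_all` is a hypothetical name, and returning the error (instead of raising it) mirrors Go's error-value convention rather than any specific Nomad function.

```python
from typing import Callable, Iterable, Optional


def apply_all(updates: Iterable[Callable[[], None]]) -> Optional[Exception]:
    """Run updates in order; return the first error and skip the rest."""
    for update in updates:
        try:
            update()
        except Exception as err:
            # Fail fast: if one attempt failed, later ones are unlikely
            # to succeed, so there is no point in trying them.
            return err
    return None
```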
Alex Dadgar 2b14371db5 Fix spelling 2018-04-03 15:58:03 -07:00
Alex Dadgar 9617a13a2b Correctly handle the upgrade path of a node being drained when applying Raft logs 2018-04-03 15:32:44 -07:00
Preetha Appan 00537c739b Fixes edge cases around timing and task finish time being set more than once 2018-04-03 16:34:59 -05:00
Alex Dadgar 58a3ec3fb2 Improve Vault error handling 2018-04-03 14:29:22 -07:00
Michael Schurter edc4891283 drain: improve tests and fix spelling
* transistion -> transition
* don't t.Fatal in goroutines
* don't mutate global state
2018-04-02 16:40:47 -07:00
Michael Schurter 6840becf46 drain: refactor batch_future into its own file
aka What If structs.go Wasn't So Big?
2018-04-02 16:40:06 -07:00
Michael Schurter 44a749a7cc drain: fix double-close panic on drain future 2018-04-02 16:39:18 -07:00
Alex Dadgar 86f9044676 remove generated files 2018-03-30 16:52:49 -07:00
Alex Dadgar af81349dbe Generated files 2018-03-30 16:14:40 -07:00
Alex Dadgar 23ec54a372 Merge pull request #4089 from hashicorp/tls-error-fix
Check for nil for RPC listener; prevent double closing of listener channel
2018-03-30 16:08:13 -07:00
Alex Dadgar 7f28cfcdfe small cleanup 2018-03-30 15:49:56 -07:00
Chelsea Holland Komlo a77dd08dd9 prevent double close due to error in creating listener 2018-03-30 17:15:56 -04:00
Chelsea Holland Komlo 402a026c88 add further error handling for rpc connection handling 2018-03-30 17:03:36 -04:00
Alex Dadgar e8809f40dc Test transition from both infinite and a future deadline to force 2018-03-30 11:24:39 -07:00
Alex Dadgar 32a673a7e1 Fix force deadline notification 2018-03-30 09:58:29 -07:00
Alex Dadgar 1aa415b0d8 Integration test 2018-03-30 09:33:23 -07:00
Alex Dadgar dc03fab29b Canonicalize migrate 2018-03-29 17:42:58 -07:00
Alex Dadgar e458ab9031 Merge branch 'master' into b-drain-batch 2018-03-29 17:10:34 -07:00
Michael Schurter 62e9553333 Merge pull request #4069 from hashicorp/f-hashealth
add HasHealth helper for nil checks
2018-03-29 17:03:20 -07:00
Alex Dadgar 301704091b Handle upgrade where Node doesn't have eligibility
This PR handles upgrading a node that has no scheduling eligibility set.
2018-03-29 16:52:23 -07:00
Alex Dadgar 7d2aae2c11 test handleTaskGroup 2018-03-29 16:38:47 -07:00
Alex Dadgar 049a9213d2 Watch batch jobs 2018-03-29 16:07:51 -07:00
Preetha 9a732c4acb Merge pull request #4071 from hashicorp/b-handle-missing-finishedat
handle missing finishedAt
2018-03-29 17:11:34 -05:00
Alex Dadgar f12194328c Integration test for batch complete case 2018-03-29 13:51:04 -07:00
Preetha 81d48fc7cf Merge pull request #4079 from hashicorp/b-filter-desiredstop
Filter desired status stop allocs correctly
2018-03-29 15:36:22 -05:00
Preetha Appan c8317532ff Use time from task events if task state does not have FinishedAt set 2018-03-29 14:05:56 -05:00
Alex Dadgar b194f93f2f Disallow Update stanza on Batch 2018-03-29 11:28:56 -07:00
Michael Schurter 91b5bb58d9 add HasHealth helper for nil checks
We performed the DeploymentStatus nil checks a couple different ways, so
hopefully this helper will consolidate them and make it more clear what
the code is doing.
2018-03-29 09:29:19 -07:00
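The helper described above folds two nil checks (is there a deployment status at all, and has its health been set) into one predicate. Python cannot call a method on `None` the way Go can on a nil pointer receiver, so this sketch uses a free function; `AllocDeploymentStatus`, its `healthy` field, and `has_health` are assumed names mirroring the idea, not Nomad's actual Go definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AllocDeploymentStatus:
    # None means the allocation's health has not been determined yet.
    healthy: Optional[bool] = None


def has_health(status: Optional[AllocDeploymentStatus]) -> bool:
    """True only if a deployment status exists and its health has been set."""
    return status is not None and status.healthy is not None
```

Note that `has_health` is True for an unhealthy allocation too; it answers "has health been reported?", not "is it healthy?".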
Chelsea Komlo 607e631714 Merge pull request #4046 from hashicorp/tls-same-file-reload
Check file contents when determining if agent should reload TLS confi…
2018-03-29 10:51:32 -04:00
Preetha Appan 5090fefe96 Filter out allocs with DesiredState = stop, and unit tests 2018-03-29 09:28:52 -05:00
Preetha Appan 8776f4b942 Fix failing test 2018-03-29 07:59:38 -05:00
Preetha Appan 2da661595d If FinishedAt is not set use alloc's modify time for rescheduling logic 2018-03-29 07:42:58 -05:00
Alex Dadgar b18f789020 Unmark drain when nodes hit their deadline and only batch/system left and add all job type integration test 2018-03-28 17:25:58 -07:00
Chelsea Holland Komlo b33d909bf9 add test to assert invalid files return error 2018-03-28 18:31:35 -04:00
Chelsea Holland Komlo 58ada9bc42 return error when setting checksum; don't reload 2018-03-28 18:15:50 -04:00
Chelsea Holland Komlo 2d5af7ff4d set TLS checksum when parsing config
Refactor checksum comparison, always set checksum if it is empty
2018-03-28 09:56:11 -04:00
Michael Schurter 65ddae86f8 Merge pull request #4054 from hashicorp/b-drainer-index-fix
drainer: reset index when new job registered
2018-03-27 16:28:25 -07:00
Michael Schurter 79a2781585 Merge pull request #4053 from hashicorp/b-drain-sys-jobs-2
drain: fix draining of system jobs
2018-03-27 16:26:45 -07:00
Alex Dadgar de4b3772f1 Create evals for system jobs when drain is unset
This PR creates evals for system jobs when:

* Drain is unset and mark eligible is true
* Eligibility is restored to the node
2018-03-27 15:53:24 -07:00
Chelsea Holland Komlo dd5f627feb set server configuration checksum on reload 2018-03-27 18:03:52 -04:00
Michael Schurter ec60a1d3e3 drain: improve comments 2018-03-27 14:27:09 -07:00
Michael Schurter e5dfb7e487 drain: unittest draining node logic 2018-03-27 14:24:01 -07:00
Michael Schurter a1ed305a24 test: add mock batch and system allocs
Since the BatchJob helper had a different task group than the Alloc
helper, it was difficult to create a valid batch alloc.
2018-03-27 14:24:01 -07:00
Michael Schurter 77bddc7941 drain: stop sys jobs after drain completes
System allocs should be drained when a node's deadline is hit or when
all other allocs on the node have stopped/migrated.
2018-03-27 14:24:01 -07:00
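The rule stated in the commit above (system allocs are drained last: when the node's deadline is hit, or once every other alloc has stopped or migrated) reduces to a small predicate. A Python sketch under stated assumptions: `Alloc`, its `job_type` and `terminal` fields, and `should_drain_system_allocs` are hypothetical names, not Nomad's drainer API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alloc:
    job_type: str   # assumed values: "service", "batch", or "system"
    terminal: bool  # True once the alloc has stopped or migrated away


def should_drain_system_allocs(deadline_reached: bool, allocs: List[Alloc]) -> bool:
    """Drain system allocs at the deadline, or once nothing else remains."""
    if deadline_reached:
        return True
    # all() over an empty selection is True, so a node running only
    # system allocs drains them immediately.
    return all(a.terminal for a in allocs if a.job_type != "system")
```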
Michael Schurter fae77b874b drainer: reset index when new job registered 2018-03-27 14:12:59 -07:00
Chelsea Holland Komlo b522a0fadc fix up to string to use time.Time 2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo 31557cc44f move tests to use time.Time 2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo 003bc209b9 use time.Time for node events for compatibility 2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo 6e6d6b7e33 check file contents when determining if agent should reload TLS configuration 2018-03-27 15:42:20 -04:00
Alex Dadgar 59005d1d26 Merge pull request #4049 from hashicorp/b-tunnel
Only track nodes if the conn is from the node
2018-03-27 12:39:34 -07:00
Alex Dadgar 5dacb057b7 Only track nodes if the conn is from the node
Fixes a bug in which a connection to a Nomad server was treated as a
connection to a node because the server forwarded a node specific RPC.
2018-03-27 09:59:31 -07:00
Chelsea Komlo 57e2cd04bd Merge pull request #4025 from hashicorp/reload-http-tls
Allow TLS configurations for HTTP and RPC connections to be reloaded …
2018-03-26 18:00:30 -04:00
Preetha Appan 539114124e Fix too long token test case 2018-03-26 16:28:33 -05:00
Preetha Appan 33e170c15d s/linear/constant/g 2018-03-26 14:45:09 -05:00
Preetha Appan 7db930b3c3 Extra test case and better error message for ambiguous config 2018-03-26 13:30:09 -05:00
Chelsea Holland Komlo c2a95f9d7d add test for upgrading only RPC connections 2018-03-26 10:55:27 -04:00
Preetha Appan fbd56c35a8 Adds additional validation for ambiguous settings (having both unlimited and attempts set) 2018-03-24 10:29:20 -05:00
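The ambiguity this validation rejects is a reschedule policy that sets both `unlimited` and a positive `attempts`, since the two give conflicting limits. A minimal Python sketch, assuming only those two fields; `ReschedulePolicy` and `validate` are hypothetical names, and the error wording is illustrative rather than Nomad's actual message.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReschedulePolicy:
    attempts: int = 0
    unlimited: bool = False


def validate(policy: ReschedulePolicy) -> List[str]:
    """Return a list of validation errors (empty if the policy is valid)."""
    errors: List[str] = []
    if policy.unlimited and policy.attempts > 0:
        # Both a finite attempt count and "unlimited" were given;
        # reject rather than silently pick one.
        errors.append("ambiguous reschedule policy: both 'unlimited' and 'attempts' are set")
    return errors
```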
Alex Dadgar 39987d5236 Merge branch 'master' into b-acl-name 2018-03-22 14:51:40 -07:00
Michael Schurter a7f627e34c eligbile -> eligible 2018-03-21 16:55:22 -07:00
Michael Schurter a4f346abeb remove spurious TODOs and FIXMEs 2018-03-21 16:55:22 -07:00
Michael Schurter 9f3086a268 test: must initialize jobResults with new func 2018-03-21 16:51:45 -07:00
Michael Schurter e432c9af55 test: disable node drainer during tests
Node drainer would throw off the index checks
2018-03-21 16:51:45 -07:00
Michael Schurter 5c8c4bce2a test: disable drain during fsm test
drainer was unsetting drain before fsm could read written value
2018-03-21 16:51:45 -07:00
Michael Schurter 341d87aa48 tests: use mock.BatchJob to fix tests 2018-03-21 16:51:45 -07:00
Michael Schurter 8b107acc06 mock: add BatchJob() helper 2018-03-21 16:51:45 -07:00
Michael Schurter cb61a4bdc7 Fix linting errors 2018-03-21 16:51:45 -07:00
Alex Dadgar 640ebdaef6 fix race in drain integration tests 2018-03-21 16:51:45 -07:00
Michael Schurter c401d5a098 Refactor assertOps into a helper func 2018-03-21 16:51:45 -07:00
Michael Schurter 187b0e1a48 Remove debug prints 2018-03-21 16:51:45 -07:00
Michael Schurter f67eca48ac Deregister garbage collected jobs 2018-03-21 16:51:45 -07:00
Michael Schurter 922842546c JobNs -> NamespacedID
Also drop the New func as it's easy to swap the order of arguments since
they're both strings.
2018-03-21 16:51:45 -07:00
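The motivation in the commit above is that a constructor taking two string arguments makes it easy to swap the job ID and namespace without any error. A Python sketch of the same idea: an immutable, hashable value type built with named fields, so call sites spell out which string is which. `NamespacedID` matches the name in the commit, but the field names `id` and `namespace` are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespacedID:
    """A job ID qualified by its namespace; frozen, so usable as a dict key."""
    id: str
    namespace: str


# Construct with keywords so the two strings cannot be silently transposed.
job = NamespacedID(id="example", namespace="default")
tracked = {job: "watching"}
```

Because `frozen=True` also generates `__hash__`, two values with the same fields are equal and map to the same dictionary entry.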
Michael Schurter 8dc7d9fb6a drainer: RegisterJob -> RegisterJobs
Test job watcher
2018-03-21 16:51:45 -07:00
Michael Schurter 3116897099 Fix deadline heap triggering
Chan must be buffered to avoid skipping triggering altogether

Also made timing in a test a bit more lenient
2018-03-21 16:51:45 -07:00
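A likely reading of the fix above: in Go, a non-blocking `select` send on an unbuffered channel is dropped when no receiver happens to be waiting, so a trigger can be lost; a buffer of one keeps a single pending wake-up. The sketch below models that buffered, coalescing trigger in Python with `queue.Queue(maxsize=1)`. The `Trigger` class is hypothetical and only illustrates the technique, not the deadline heap's actual code.

```python
import queue
from typing import Optional


class Trigger:
    """Coalescing wake-up signal holding at most one pending notification."""

    def __init__(self) -> None:
        # Capacity 1 plays the role of a Go `make(chan struct{}, 1)`.
        self._pending = queue.Queue(maxsize=1)

    def fire(self) -> None:
        # Non-blocking send: if a wake-up is already pending, this one is
        # redundant, so dropping it loses nothing.
        try:
            self._pending.put_nowait(None)
        except queue.Full:
            pass

    def wait(self, timeout: Optional[float] = None) -> bool:
        """Block until a wake-up is pending; False if the timeout expires first."""
        try:
            self._pending.get(timeout=timeout)
            return True
        except queue.Empty:
            return False
```

Firing twice before anyone waits still yields exactly one wake-up, which is the property the buffered channel provides.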
Alex Dadgar 9d23c965da fix comment 2018-03-21 16:51:45 -07:00
Alex Dadgar fb4badf1bc sharding 2018-03-21 16:51:44 -07:00
Alex Dadgar 2d91b9dfba Batch drain update 2018-03-21 16:51:44 -07:00
Alex Dadgar 92b636dd32 Fix deadline handling 2018-03-21 16:51:44 -07:00
Michael Schurter 9898edfa90 Switch to drainerv2 impl 2018-03-21 16:51:44 -07:00
Alex Dadgar 7b2bad8c5e Toggle Drain allows resetting eligibility
This PR allows marking a node as eligible for scheduling while toggling
drain. By default the `nomad node drain -disable` command will mark it
as eligible but the drainer will maintain ineligibility.
2018-03-21 16:51:44 -07:00
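The behavior described in the commit above separates "stop draining" from "become eligible again": the operator's `-disable` path restores eligibility, while the drainer finishing on its own leaves the node ineligible. A Python sketch of that state transition; `Node`, `set_drain`, and `mark_eligible` are hypothetical names, and the `"eligible"`/`"ineligible"` values and the rule that enabling drain makes a node ineligible are assumptions for illustration.

```python
from dataclasses import dataclass

# Assumed eligibility values; treat these names as illustrative.
ELIGIBLE = "eligible"
INELIGIBLE = "ineligible"


@dataclass
class Node:
    draining: bool = False
    eligibility: str = ELIGIBLE


def set_drain(node: Node, enable: bool, mark_eligible: bool) -> None:
    """Toggle drain; only restore eligibility when the caller asks for it."""
    if enable:
        node.draining = True
        node.eligibility = INELIGIBLE  # assumption: draining nodes take no new work
        return
    node.draining = False
    if mark_eligible:
        node.eligibility = ELIGIBLE


# Hypothetical callers:
#   operator runs `nomad node drain -disable` -> set_drain(node, False, True)
#   drainer completes on its own              -> set_drain(node, False, False)
```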
Alex Dadgar ad80e655cc code review 2018-03-21 16:51:44 -07:00
Alex Dadgar 11f9fe4960 spelling fixes 2018-03-21 16:51:44 -07:00
Alex Dadgar bc7385812d Comments 2018-03-21 16:51:44 -07:00
Alex Dadgar e87c677a42 handle empty node case 2018-03-21 16:51:44 -07:00
Alex Dadgar 405dab2253 integration test and basic fixes 2018-03-21 16:51:44 -07:00
Alex Dadgar e63bcb474d Drainer 2018-03-21 16:51:44 -07:00
Alex Dadgar 4754366640 job watcher 2018-03-21 16:51:44 -07:00
Alex Dadgar 504bfabb4d Nodes being untracked or having updated deadlines update the deadliner 2018-03-21 16:51:44 -07:00
Alex Dadgar 66eaaa6a4d node watcher 2018-03-21 16:51:44 -07:00
Alex Dadgar 527ac0b39d drain heap 2018-03-21 16:51:44 -07:00
Alex Dadgar 2d4c193a0a Initial design 2018-03-21 16:51:44 -07:00
Alex Dadgar 33ca319080 System test runs on mac 2018-03-21 16:51:44 -07:00
Alex Dadgar f8d4a3a9e6 Fix file names 2018-03-21 16:51:44 -07:00
Michael Schurter 32a7649359 refactor main drainloop into 2 more methods 2018-03-21 16:51:44 -07:00
Michael Schurter 5e52f84bb7 drainer: refactor newStopAllocs, applyMigrations 2018-03-21 16:51:44 -07:00
Michael Schurter 62960ed7bd client: don't monitor health of non-service jobs
Also fix system job draining; won't work without deadline fixes
2018-03-21 16:51:44 -07:00
Alex Dadgar a37329189a Improve DeadlineTime helper 2018-03-21 16:51:44 -07:00
Michael Schurter b7c993f0e5 drainer: convert fsm errors to go errors 2018-03-21 16:51:44 -07:00
Michael Schurter ab0de41884 drainer: factor job & node watchers out of drainer.go 2018-03-21 16:51:44 -07:00
Michael Schurter 5922aef623 Restart every time SetEnabled(true) is called 2018-03-21 16:51:44 -07:00
Michael Schurter 959d447d38 Remove unused context 2018-03-21 16:51:44 -07:00
Michael Schurter 8b41e9b2e1 drainer: drainer should shutdown with server 2018-03-21 16:51:44 -07:00
Michael Schurter 0a17076ad2 refactor drainer into a subpkg 2018-03-21 16:51:44 -07:00
Alex Dadgar 93871c18f8 Fix retaining the drain 2018-03-21 16:51:44 -07:00
Alex Dadgar 010a6b8ca5 Unblock evals once eligible 2018-03-21 16:51:44 -07:00
Alex Dadgar 8289cc3c6f HTTP and API 2018-03-21 16:51:44 -07:00
Alex Dadgar 0fba0101b6 RPC/FSM/State Store for Eligibility 2018-03-21 16:51:44 -07:00
Alex Dadgar b3d2346419 Upgrade path 2018-03-21 16:51:43 -07:00
Alex Dadgar 2f5309d82a Remove update time 2018-03-21 16:51:43 -07:00
Alex Dadgar 0965c9ed28 Fix tests 2018-03-21 16:51:43 -07:00
Alex Dadgar 010228577e Drain cli, api, http 2018-03-21 16:51:43 -07:00
Alex Dadgar e459a666ed Node.Drain takes strategy 2018-03-21 16:49:48 -07:00
Michael Schurter 03d0e5b8a0 improve drain fsm/statestore tests 2018-03-21 16:49:48 -07:00
Michael Schurter d1ec65d765 switch to new raft DesiredTransition message 2018-03-21 16:49:48 -07:00
Michael Schurter acf59ee75e drainer: switch to job based watching 2018-03-21 16:49:48 -07:00
Alex Dadgar db4a634072 RPC, FSM, State Store for marking DesiredTransition
fix build tag
2018-03-21 16:49:48 -07:00
Michael Schurter c0542474db drain: initial drainv2 structs and impl 2018-03-21 16:49:48 -07:00
Chelsea Komlo 6fc9231dac Merge pull request #3856 from hashicorp/f-client-add-health-checks
Client driver health checks for Docker
2018-03-21 18:05:00 -04:00
Chelsea Holland Komlo 66e44cdb73 Allow TLS configurations for HTTP and RPC connections to be reloaded separately 2018-03-21 17:51:08 -04:00
Preetha 01898b2c25 Merge pull request #4007 from hashicorp/f-show-rescheduling-cli-job-status
Show a section on upcoming delayed evaluations when applicable
2018-03-21 14:37:38 -05:00
Chelsea Holland Komlo f801709a0a fix issue when updating node events 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo 86b7b3d2d9 fix up health check logic comparison; add node events to client driver checks 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo d8f68e5ef8 fix up codereview feedback 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo c7fd0bd8a1 fix up scheduler mocks 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo c50d02ae93 go style; update comments 2018-03-21 15:15:25 -04:00