Preetha Appan
1da4d88f3d
Make test descriptions better
2018-04-11 15:12:23 -05:00
Preetha Appan
a7b7b662ed
Make system jobs fail validation if they contain a reschedule stanza
2018-04-11 14:56:20 -05:00
Preetha Appan
688fd9ee37
Update alloc GC eligility logic to not rely on follow up evals
2018-04-11 13:58:02 -05:00
Charlie Voiselle
ba88f00ccb
Changed "til" to "until"
...
Should be "till" or "until"; chose "until" because it is unambiguous as to meaning.
2018-04-11 12:36:28 -05:00
Preetha
dec5b99478
Merge pull request #4120 from hashicorp/b-rescheduling-minimize-evals
...
Batch evals for rescheduling failed allocs correctly
2018-04-10 17:18:35 -05:00
Preetha Appan
59cce1d620
Fix unit test for core scheduler GC
2018-04-10 17:12:06 -05:00
Preetha Appan
7040884002
Simplify and update allocation gc eligibility logic
2018-04-10 16:08:37 -05:00
Preetha
c88fef4c4b
Merge pull request #4127 from hashicorp/b-autopilot-removepeer-fixes
...
Add node id persistence
2018-04-10 16:05:00 -05:00
Preetha Appan
a569d34f25
Add custom status description for rescheduling follow up evals, and make unit test robust
2018-04-10 15:30:15 -05:00
Preetha Appan
d17bfd8045
Make leader election test run on all three protocol versions
2018-04-10 14:20:02 -05:00
Preetha Appan
b3402efd0b
Adds a new custom description for update alloc triggered evals to make it easier to unit test.
2018-04-10 14:00:07 -05:00
Preetha Appan
6d0e1c9fea
Use preconfigured nodeID if there isn't a persisted node ID, and persist it if its not persisted.
2018-04-10 08:47:33 -05:00
Preetha Appan
216c053742
Remove debug print statements
2018-04-10 08:16:50 -05:00
Alex Dadgar
d179a09b83
WIP: Not setting node id properlperly
2018-04-09 18:01:28 -07:00
Preetha Appan
868f4f19f4
Unit tests for rolling upgrade and killing a leader
2018-04-09 17:42:30 -05:00
Preetha Appan
24203ae2f7
Remove duplicate commit
2018-04-09 15:08:09 -05:00
Preetha Appan
d1cb5df477
Batch evals for rescheduling failed allocs correctly and group them by job ID
2018-04-09 14:05:31 -05:00
Michael Schurter
d086f17708
rpc: wrap up old version check in a helper
...
DRY it up
2018-04-09 11:09:05 -07:00
Michael Schurter
e1cbcf0b3c
rpc: give min rpc version variable a better name
2018-04-09 11:09:05 -07:00
Michael Schurter
88a9409f8e
rpc: only attempt NodeRpc for nodes>=0.8
...
Attempting NodeRpc (or streaming node rpc) for clients that do not
support it causes it to hang indefinitely because while the TCP
connection exists, the client will never respond.
2018-04-09 11:08:06 -07:00
Preetha
6254d75eee
Merge pull request #4101 from hashicorp/b-rescheduling-edge-fixes
...
Fixes edge cases around timing/ task finish time being set more than once
2018-04-04 16:18:21 -05:00
Preetha Appan
5e4525bd30
Moves setting finishedAt to the right place and adds two unit tests.
2018-04-04 14:38:15 -05:00
Michael Schurter
b1a90462a8
Merge pull request #4094 from hashicorp/b-drain-panic
...
drain: fix double-close panic on drain future
2018-04-04 10:31:14 -07:00
Alex Dadgar
4c9c6decd3
Merge pull request #4100 from hashicorp/b-vault-no-auth
...
Improve handling of Vault errors
2018-04-03 17:23:43 -07:00
Alex Dadgar
af1b185ce4
Fix flaky deadline tests
2018-04-03 16:51:57 -07:00
Michael Schurter
ba6628a1b6
drain: return on first error
...
If one error is encountered it is unlikely any further attempts will
succeed, so fail fast.
2018-04-03 16:46:35 -07:00
Alex Dadgar
2b14371db5
Fix spelling
2018-04-03 15:58:03 -07:00
Alex Dadgar
9617a13a2b
Correctly handle the upgrade path of a node being drained when applying Raft logs
2018-04-03 15:32:44 -07:00
Preetha Appan
00537c739b
Fixes edge cases around timing and task finish time being set more than once
2018-04-03 16:34:59 -05:00
Alex Dadgar
58a3ec3fb2
Improve Vault error handling
2018-04-03 14:29:22 -07:00
Michael Schurter
edc4891283
drain: improve tests and fix spelling
...
* transistion -> transition
* don't t.Fatal in goroutines
* don't mutate global state
2018-04-02 16:40:47 -07:00
Michael Schurter
6840becf46
drain: refactor batch_future into its own file
...
aka What If structs.go Wasn't So Big?
2018-04-02 16:40:06 -07:00
Michael Schurter
44a749a7cc
drain: fix double-close panic on drain future
2018-04-02 16:39:18 -07:00
Alex Dadgar
86f9044676
remove generated files
2018-03-30 16:52:49 -07:00
Alex Dadgar
af81349dbe
Generated files
2018-03-30 16:14:40 -07:00
Alex Dadgar
23ec54a372
Merge pull request #4089 from hashicorp/tls-error-fix
...
Check for nil for RPC listener; prevent double closing of listener channel
2018-03-30 16:08:13 -07:00
Alex Dadgar
7f28cfcdfe
small cleanup
2018-03-30 15:49:56 -07:00
Chelsea Holland Komlo
a77dd08dd9
prevent double close due to error in creating listener
2018-03-30 17:15:56 -04:00
Chelsea Holland Komlo
402a026c88
add further error handling for rpc connection handling
2018-03-30 17:03:36 -04:00
Alex Dadgar
e8809f40dc
Test transistion from both infinite and a future deadline to force
2018-03-30 11:24:39 -07:00
Alex Dadgar
32a673a7e1
Fix force deadline notification
2018-03-30 09:58:29 -07:00
Alex Dadgar
1aa415b0d8
Integration test
2018-03-30 09:33:23 -07:00
Alex Dadgar
dc03fab29b
Canonicalize migrate
2018-03-29 17:42:58 -07:00
Alex Dadgar
e458ab9031
Merge branch 'master' into b-drain-batch
2018-03-29 17:10:34 -07:00
Michael Schurter
62e9553333
Merge pull request #4069 from hashicorp/f-hashealth
...
add HasHealth helper for nil checks
2018-03-29 17:03:20 -07:00
Alex Dadgar
301704091b
Handle upgrade where Node doesn't have eligiblity
...
This PR handles upgrading a node that has no scheduling eligiblity set.
2018-03-29 16:52:23 -07:00
Alex Dadgar
7d2aae2c11
test handleTaskGroup
2018-03-29 16:38:47 -07:00
Alex Dadgar
049a9213d2
Watch batch jobs
2018-03-29 16:07:51 -07:00
Preetha
9a732c4acb
Merge pull request #4071 from hashicorp/b-handle-missing-finishedat
...
handle missing finishedAt
2018-03-29 17:11:34 -05:00
Alex Dadgar
f12194328c
Integration test for batch complete case
2018-03-29 13:51:04 -07:00
Preetha
81d48fc7cf
Merge pull request #4079 from hashicorp/b-filter-desiredstop
...
Filter desired status stop allocs correctly
2018-03-29 15:36:22 -05:00
Preetha Appan
c8317532ff
Use time from task events if task state does not have FinishedAt set
2018-03-29 14:05:56 -05:00
Alex Dadgar
b194f93f2f
Disallow Update stanza on Batch
2018-03-29 11:28:56 -07:00
Michael Schurter
91b5bb58d9
add HasHealth helper for nil checks
...
We performed the DeploymentStatus nil checks a couple different ways, so
hopefully this helper will consoldiate them and make it more clear what
the code is doing.
2018-03-29 09:29:19 -07:00
Chelsea Komlo
607e631714
Merge pull request #4046 from hashicorp/tls-same-file-reload
...
Check file contents when determining if agent should reload TLS confi…
2018-03-29 10:51:32 -04:00
Preetha Appan
5090fefe96
Filter out allocs with DesiredState = stop, and unit tests
2018-03-29 09:28:52 -05:00
Preetha Appan
8776f4b942
Fix failing test
2018-03-29 07:59:38 -05:00
Preetha Appan
2da661595d
If FinishedAt is not set use alloc's modify time for rescheduling logic
2018-03-29 07:42:58 -05:00
Alex Dadgar
b18f789020
Unmark drain when nodes hit their deadline and only batch/system left and add all job type integration test
2018-03-28 17:25:58 -07:00
Chelsea Holland Komlo
b33d909bf9
add test to assert invalid files return error
2018-03-28 18:31:35 -04:00
Chelsea Holland Komlo
58ada9bc42
return error when setting checksum; don't reload
2018-03-28 18:15:50 -04:00
Chelsea Holland Komlo
2d5af7ff4d
set TLS checksum when parsing config
...
Refactor checksum comparison, always set checksum if it is empty
2018-03-28 09:56:11 -04:00
Michael Schurter
65ddae86f8
Merge pull request #4054 from hashicorp/b-drainer-index-fix
...
drainer: reset index when new job registered
2018-03-27 16:28:25 -07:00
Michael Schurter
79a2781585
Merge pull request #4053 from hashicorp/b-drain-sys-jobs-2
...
drain: fix draining of system jobs
2018-03-27 16:26:45 -07:00
Alex Dadgar
de4b3772f1
Create evals for system jobs when drain is unset
...
This PR creates evals for system jobs when:
* Drain is unset and mark eligible is true
* Eligibility is restored to the node
2018-03-27 15:53:24 -07:00
Chelsea Holland Komlo
dd5f627feb
set server configuration checksum on reload
2018-03-27 18:03:52 -04:00
Michael Schurter
ec60a1d3e3
drain: improve comments
2018-03-27 14:27:09 -07:00
Michael Schurter
e5dfb7e487
drain: unittest draining node logic
2018-03-27 14:24:01 -07:00
Michael Schurter
a1ed305a24
test: add mock batch and system allocs
...
Since the BatchJob helper had a different task group than the Alloc
helper, it was difficult to create a valid batch alloc.
2018-03-27 14:24:01 -07:00
Michael Schurter
77bddc7941
drain: stop sys jobs after drain completes
...
System allocs should be drained when a node's deadline is hit or when
all other allocs on the node have stopped/migrated.
2018-03-27 14:24:01 -07:00
Michael Schurter
fae77b874b
drainer: reset index when new job registered
2018-03-27 14:12:59 -07:00
Chelsea Holland Komlo
b522a0fadc
fix up to string to use time.Time
2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo
31557cc44f
move tests to use time.Time
2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo
003bc209b9
use time.Time for node events for compatibility
2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo
6e6d6b7e33
check file contents when determining if agent should reload TLS configuration
2018-03-27 15:42:20 -04:00
Alex Dadgar
59005d1d26
Merge pull request #4049 from hashicorp/b-tunnel
...
Only track nodes if the conn is from the node
2018-03-27 12:39:34 -07:00
Alex Dadgar
5dacb057b7
Only track nodes if the conn is from the node
...
Fixes a bug in which a connection to a Nomad server was treated as a
connection to a node because the server forwarded a node specific RPC.
2018-03-27 09:59:31 -07:00
Chelsea Komlo
57e2cd04bd
Merge pull request #4025 from hashicorp/reload-http-tls
...
Allow TLS configurations for HTTP and RPC connections to be reloaded …
2018-03-26 18:00:30 -04:00
Preetha Appan
539114124e
Fix too long token test case
2018-03-26 16:28:33 -05:00
Preetha Appan
33e170c15d
s/linear/constant/g
2018-03-26 14:45:09 -05:00
Preetha Appan
7db930b3c3
Extra test case and better error message for ambiguous config
2018-03-26 13:30:09 -05:00
Chelsea Holland Komlo
c2a95f9d7d
add test for upgrading only RPC connections
2018-03-26 10:55:27 -04:00
Preetha Appan
fbd56c35a8
Adds additional validation for ambigous settings (having both unlimited and attempts set)
2018-03-24 10:29:20 -05:00
Alex Dadgar
39987d5236
Merge branch 'master' into b-acl-name
2018-03-22 14:51:40 -07:00
Michael Schurter
a7f627e34c
eligbile -> eligible
2018-03-21 16:55:22 -07:00
Michael Schurter
a4f346abeb
remove spurious TODOs and FIXMEs
2018-03-21 16:55:22 -07:00
Michael Schurter
9f3086a268
test: must initialize jobResults with new func
2018-03-21 16:51:45 -07:00
Michael Schurter
e432c9af55
test: disable node drainer during tests
...
Node drainer would throw off the index checks
2018-03-21 16:51:45 -07:00
Michael Schurter
5c8c4bce2a
test: disable drain during fsm test
...
drainer was unsetting drain before fsm could read written value
2018-03-21 16:51:45 -07:00
Michael Schurter
341d87aa48
tests: use mock.BatchJob to fix tests
2018-03-21 16:51:45 -07:00
Michael Schurter
8b107acc06
mock: add BatchJob() helper
2018-03-21 16:51:45 -07:00
Michael Schurter
cb61a4bdc7
Fix linting errors
2018-03-21 16:51:45 -07:00
Alex Dadgar
640ebdaef6
fix race in drain integration tests
2018-03-21 16:51:45 -07:00
Michael Schurter
c401d5a098
Refactor assertOps into a helper func
2018-03-21 16:51:45 -07:00
Michael Schurter
187b0e1a48
Remove debug prints
2018-03-21 16:51:45 -07:00
Michael Schurter
f67eca48ac
Deregister garbage collected jobs
2018-03-21 16:51:45 -07:00
Michael Schurter
922842546c
JobNs -> NamespacedID
...
Also drop the New func as it's easy to swap the order of arguments since
they're both strings.
2018-03-21 16:51:45 -07:00
Michael Schurter
8dc7d9fb6a
drainer: RegisterJob -> RegisterJobs
...
Test job watcher
2018-03-21 16:51:45 -07:00
Michael Schurter
3116897099
Fix deadline heap triggering
...
Chan must be buffered to avoid skipping triggering altogether
Also made timing in a test a bit more lenient
2018-03-21 16:51:45 -07:00
Alex Dadgar
9d23c965da
fix comment
2018-03-21 16:51:45 -07:00
Alex Dadgar
fb4badf1bc
sharding
2018-03-21 16:51:44 -07:00
Alex Dadgar
2d91b9dfba
Batch drain update
2018-03-21 16:51:44 -07:00
Alex Dadgar
92b636dd32
Fix deadline handling
2018-03-21 16:51:44 -07:00
Michael Schurter
9898edfa90
Switch to drainerv2 impl
2018-03-21 16:51:44 -07:00
Alex Dadgar
7b2bad8c5e
Toggle Drain allows resetting eligibility
...
This PR allows marking a node as eligible for scheduling while toggling
drain. By default the `nomad node drain -disable` commmand will mark it
as eligible but the drainer will maintain in-eligibility.
2018-03-21 16:51:44 -07:00
Alex Dadgar
ad80e655cc
code review
2018-03-21 16:51:44 -07:00
Alex Dadgar
11f9fe4960
spelling fixes
2018-03-21 16:51:44 -07:00
Alex Dadgar
bc7385812d
Comments
2018-03-21 16:51:44 -07:00
Alex Dadgar
e87c677a42
handle empty node case
2018-03-21 16:51:44 -07:00
Alex Dadgar
405dab2253
integration test and basic fixes
2018-03-21 16:51:44 -07:00
Alex Dadgar
e63bcb474d
Drainer
2018-03-21 16:51:44 -07:00
Alex Dadgar
4754366640
job watcher
2018-03-21 16:51:44 -07:00
Alex Dadgar
504bfabb4d
Node's being untracked or having updated deadlines, updates the deadliner
2018-03-21 16:51:44 -07:00
Alex Dadgar
66eaaa6a4d
node watcher
2018-03-21 16:51:44 -07:00
Alex Dadgar
527ac0b39d
drain heap
2018-03-21 16:51:44 -07:00
Alex Dadgar
2d4c193a0a
Initial design
2018-03-21 16:51:44 -07:00
Alex Dadgar
33ca319080
System test runs on mac
2018-03-21 16:51:44 -07:00
Alex Dadgar
f8d4a3a9e6
Fix file names
2018-03-21 16:51:44 -07:00
Michael Schurter
32a7649359
refactor main drainloop into 2 more methods
2018-03-21 16:51:44 -07:00
Michael Schurter
5e52f84bb7
drainer: refactor newStopAllocs, applyMigrations
2018-03-21 16:51:44 -07:00
Michael Schurter
62960ed7bd
client: don't monitor health of non-service jobs
...
Also fix system job draining; won't work without deadline fixes
2018-03-21 16:51:44 -07:00
Alex Dadgar
a37329189a
Improve DeadlineTime helper
2018-03-21 16:51:44 -07:00
Michael Schurter
b7c993f0e5
drainer: convert fsm errors to go errors
2018-03-21 16:51:44 -07:00
Michael Schurter
ab0de41884
drainer: factor job & node watchers out of drainer.go
2018-03-21 16:51:44 -07:00
Michael Schurter
5922aef623
Restart every time SetEnabled(true) is called
2018-03-21 16:51:44 -07:00
Michael Schurter
959d447d38
Remove unused context
2018-03-21 16:51:44 -07:00
Michael Schurter
8b41e9b2e1
drainer: drainer should shutdown with server
2018-03-21 16:51:44 -07:00
Michael Schurter
0a17076ad2
refactor drainer into a subpkg
2018-03-21 16:51:44 -07:00
Alex Dadgar
93871c18f8
Fix retaining the drain
2018-03-21 16:51:44 -07:00
Alex Dadgar
010a6b8ca5
Unblock evals once eligible
2018-03-21 16:51:44 -07:00
Alex Dadgar
8289cc3c6f
HTTP and API
2018-03-21 16:51:44 -07:00
Alex Dadgar
0fba0101b6
RPC/FSM/State Store for Eligibility
2018-03-21 16:51:44 -07:00
Alex Dadgar
b3d2346419
Upgrade path
2018-03-21 16:51:43 -07:00
Alex Dadgar
2f5309d82a
Remove update time
2018-03-21 16:51:43 -07:00
Alex Dadgar
0965c9ed28
Fix tests
2018-03-21 16:51:43 -07:00
Alex Dadgar
010228577e
Drain cli, api, http
2018-03-21 16:51:43 -07:00
Alex Dadgar
e459a666ed
Node.Drain takes strategy
2018-03-21 16:49:48 -07:00
Michael Schurter
03d0e5b8a0
improve drain fsm/statestore tests
2018-03-21 16:49:48 -07:00
Michael Schurter
d1ec65d765
switch to new raft DesiredTransition message
2018-03-21 16:49:48 -07:00
Michael Schurter
acf59ee75e
drainer: switch to job based watching
2018-03-21 16:49:48 -07:00
Alex Dadgar
db4a634072
RPC, FSM, State Store for marking DesiredTransistion
...
fix build tag
2018-03-21 16:49:48 -07:00
Michael Schurter
c0542474db
drain: initial drainv2 structs and impl
2018-03-21 16:49:48 -07:00
Chelsea Komlo
6fc9231dac
Merge pull request #3856 from hashicorp/f-client-add-health-checks
...
Client driver health checks for Docker
2018-03-21 18:05:00 -04:00
Chelsea Holland Komlo
66e44cdb73
Allow TLS configurations for HTTP and RPC connections to be reloaded separately
2018-03-21 17:51:08 -04:00
Preetha
01898b2c25
Merge pull request #4007 from hashicorp/f-show-rescheduling-cli-job-status
...
Show a section on upcoming delayed evaluations when applicable
2018-03-21 14:37:38 -05:00
Chelsea Holland Komlo
f801709a0a
fix issue when updating node events
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
86b7b3d2d9
fix up health check logic comparison; add node events to client driver checks
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
d8f68e5ef8
fix up codereview feedback
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
c7fd0bd8a1
fix up scheduler mocks
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo
c50d02ae93
go style; update comments
2018-03-21 15:15:25 -04:00