Alex Dadgar
d03c881802
small cleanup and logging
2018-04-27 10:36:28 -07:00
Alex Dadgar
da3a552d8d
Fix issue where node connection map wasn't being pruned
2018-04-27 10:16:03 -07:00
Alex Dadgar
35e06ddb31
Remove generated and version bump
2018-04-26 16:49:19 -07:00
Alex Dadgar
43192cefae
generated files
2018-04-26 16:28:58 -07:00
Alex Dadgar
265a6d4f8b
Merge pull request #4224 from hashicorp/b-cron-parse
...
Handle potential panic in cron parsing
2018-04-26 16:22:37 -07:00
Alex Dadgar
05eccb063f
Merge branch 'b-cron-parse' of github.com:hashicorp/nomad into b-cron-parse
2018-04-26 15:51:56 -07:00
Alex Dadgar
ea24513d38
Allow nomad to restore bad periodic job
2018-04-26 15:51:47 -07:00
Chelsea Holland Komlo
ce1c3e0c2d
add unit tests for panic cron parsing bug
...
add comments for cron parsing wrapper
2018-04-26 18:47:08 -04:00
Alex Dadgar
15ad3f94af
Fix command line
2018-04-26 15:46:22 -07:00
Alex Dadgar
dc2907c2c9
Codecgen full package
2018-04-26 15:24:53 -07:00
Alex Dadgar
d0f237086b
UX touchups
2018-04-26 15:24:27 -07:00
Chelsea Holland Komlo
fca0169dbc
handle potential panic in cron parsing
2018-04-26 16:57:45 -04:00
Alex Dadgar
ff7e2b960f
Add test
2018-04-26 13:28:24 -07:00
Alex Dadgar
4a23307baf
Track all client connections
2018-04-26 13:22:09 -07:00
Alex Dadgar
5320205853
Sort signals in implicit constraint
...
Fixes https://github.com/hashicorp/nomad/issues/4212
2018-04-26 10:12:47 -07:00
Alex Dadgar
79844f1d01
Safety guard
2018-04-25 16:00:56 -07:00
Alex Dadgar
d45f39f24e
Fix detecting drain strategy on GC'd node
2018-04-25 16:00:56 -07:00
Nick Ethier
2e6c95f511
Merge pull request #4138 from hashicorp/i-hcl-json-endpoint
...
HCL to JSON api endpoint
2018-04-19 14:18:34 -04:00
Alex Dadgar
eeb85299ff
gofmt -s nomad/structs/structs_test.go
2018-04-17 13:39:32 -07:00
Chelsea Holland Komlo
788b23e17e
add test for node copy
2018-04-17 12:58:07 -04:00
Nick Ethier
31da01856a
command/agent: add HCL mock for parse endpoint
2018-04-16 19:21:09 -04:00
Alex Dadgar
4f2a7b6949
Fix copying drivers
2018-04-16 15:45:51 -07:00
Alex Dadgar
adaf4fa7e0
Remove generated structs
2018-04-12 16:35:31 -07:00
Alex Dadgar
663c4d0433
Version bump and generated files
2018-04-12 16:21:50 -07:00
Preetha
bdc17ebf10
Merge pull request #4139 from hashicorp/b-reschedule-invalid-system-jobs
...
Make system jobs fail validation if they contain a reschedule stanza
2018-04-11 20:01:19 -05:00
Preetha Appan
9f84e17bfd
dont print reschedule policy in error message
2018-04-11 17:07:14 -05:00
Preetha Appan
fa90f036c6
Fix more tests
2018-04-11 15:51:24 -05:00
Preetha Appan
81f856e7c9
Fix one more failing test
2018-04-11 15:49:23 -05:00
Preetha
0b6fbb8e16
Merge pull request #4131 from hashicorp/b-rescheduling-fix-gc
...
Update garbage collection logic to make sure allocs with pending evals are not GCed
2018-04-11 15:44:36 -05:00
Preetha Appan
1da4d88f3d
Make test descriptions better
2018-04-11 15:12:23 -05:00
Preetha Appan
a7b7b662ed
Make system jobs fail validation if they contain a reschedule stanza
2018-04-11 14:56:20 -05:00
Preetha Appan
688fd9ee37
Update alloc GC eligibility logic to not rely on follow up evals
2018-04-11 13:58:02 -05:00
Charlie Voiselle
ba88f00ccb
Changed "til" to "until"
...
Should be "till" or "until"; chose "until" because it is unambiguous as to meaning.
2018-04-11 12:36:28 -05:00
Preetha
dec5b99478
Merge pull request #4120 from hashicorp/b-rescheduling-minimize-evals
...
Batch evals for rescheduling failed allocs correctly
2018-04-10 17:18:35 -05:00
Preetha Appan
59cce1d620
Fix unit test for core scheduler GC
2018-04-10 17:12:06 -05:00
Preetha Appan
7040884002
Simplify and update allocation gc eligibility logic
2018-04-10 16:08:37 -05:00
Preetha
c88fef4c4b
Merge pull request #4127 from hashicorp/b-autopilot-removepeer-fixes
...
Add node id persistence
2018-04-10 16:05:00 -05:00
Preetha Appan
a569d34f25
Add custom status description for rescheduling follow up evals, and make unit test robust
2018-04-10 15:30:15 -05:00
Preetha Appan
d17bfd8045
Make leader election test run on all three protocol versions
2018-04-10 14:20:02 -05:00
Preetha Appan
b3402efd0b
Adds a new custom description for update alloc triggered evals to make it easier to unit test.
2018-04-10 14:00:07 -05:00
Preetha Appan
6d0e1c9fea
Use preconfigured nodeID if there isn't a persisted node ID, and persist it if it's not persisted.
2018-04-10 08:47:33 -05:00
Preetha Appan
216c053742
Remove debug print statements
2018-04-10 08:16:50 -05:00
Alex Dadgar
d179a09b83
WIP: Not setting node id properly
2018-04-09 18:01:28 -07:00
Preetha Appan
868f4f19f4
Unit tests for rolling upgrade and killing a leader
2018-04-09 17:42:30 -05:00
Preetha Appan
24203ae2f7
Remove duplicate commit
2018-04-09 15:08:09 -05:00
Preetha Appan
d1cb5df477
Batch evals for rescheduling failed allocs correctly and group them by job ID
2018-04-09 14:05:31 -05:00
Michael Schurter
d086f17708
rpc: wrap up old version check in a helper
...
DRY it up
2018-04-09 11:09:05 -07:00
Michael Schurter
e1cbcf0b3c
rpc: give min rpc version variable a better name
2018-04-09 11:09:05 -07:00
Michael Schurter
88a9409f8e
rpc: only attempt NodeRpc for nodes>=0.8
...
Attempting NodeRpc (or streaming node rpc) for clients that do not
support it causes it to hang indefinitely because while the TCP
connection exists, the client will never respond.
2018-04-09 11:08:06 -07:00
Preetha
6254d75eee
Merge pull request #4101 from hashicorp/b-rescheduling-edge-fixes
...
Fixes edge cases around timing/ task finish time being set more than once
2018-04-04 16:18:21 -05:00
Preetha Appan
5e4525bd30
Moves setting finishedAt to the right place and adds two unit tests.
2018-04-04 14:38:15 -05:00
Michael Schurter
b1a90462a8
Merge pull request #4094 from hashicorp/b-drain-panic
...
drain: fix double-close panic on drain future
2018-04-04 10:31:14 -07:00
Alex Dadgar
4c9c6decd3
Merge pull request #4100 from hashicorp/b-vault-no-auth
...
Improve handling of Vault errors
2018-04-03 17:23:43 -07:00
Alex Dadgar
af1b185ce4
Fix flaky deadline tests
2018-04-03 16:51:57 -07:00
Michael Schurter
ba6628a1b6
drain: return on first error
...
If one error is encountered it is unlikely any further attempts will
succeed, so fail fast.
2018-04-03 16:46:35 -07:00
Alex Dadgar
2b14371db5
Fix spelling
2018-04-03 15:58:03 -07:00
Alex Dadgar
9617a13a2b
Correctly handle the upgrade path of a node being drained when applying Raft logs
2018-04-03 15:32:44 -07:00
Preetha Appan
00537c739b
Fixes edge cases around timing and task finish time being set more than once
2018-04-03 16:34:59 -05:00
Alex Dadgar
58a3ec3fb2
Improve Vault error handling
2018-04-03 14:29:22 -07:00
Michael Schurter
edc4891283
drain: improve tests and fix spelling
...
* transistion -> transition
* don't t.Fatal in goroutines
* don't mutate global state
2018-04-02 16:40:47 -07:00
Michael Schurter
6840becf46
drain: refactor batch_future into its own file
...
aka What If structs.go Wasn't So Big?
2018-04-02 16:40:06 -07:00
Michael Schurter
44a749a7cc
drain: fix double-close panic on drain future
2018-04-02 16:39:18 -07:00
Alex Dadgar
86f9044676
remove generated files
2018-03-30 16:52:49 -07:00
Alex Dadgar
af81349dbe
Generated files
2018-03-30 16:14:40 -07:00
Alex Dadgar
23ec54a372
Merge pull request #4089 from hashicorp/tls-error-fix
...
Check for nil for RPC listener; prevent double closing of listener channel
2018-03-30 16:08:13 -07:00
Alex Dadgar
7f28cfcdfe
small cleanup
2018-03-30 15:49:56 -07:00
Chelsea Holland Komlo
a77dd08dd9
prevent double close due to error in creating listener
2018-03-30 17:15:56 -04:00
Chelsea Holland Komlo
402a026c88
add further error handling for rpc connection handling
2018-03-30 17:03:36 -04:00
Alex Dadgar
e8809f40dc
Test transition from both infinite and a future deadline to force
2018-03-30 11:24:39 -07:00
Alex Dadgar
32a673a7e1
Fix force deadline notification
2018-03-30 09:58:29 -07:00
Alex Dadgar
1aa415b0d8
Integration test
2018-03-30 09:33:23 -07:00
Alex Dadgar
dc03fab29b
Canonicalize migrate
2018-03-29 17:42:58 -07:00
Alex Dadgar
e458ab9031
Merge branch 'master' into b-drain-batch
2018-03-29 17:10:34 -07:00
Michael Schurter
62e9553333
Merge pull request #4069 from hashicorp/f-hashealth
...
add HasHealth helper for nil checks
2018-03-29 17:03:20 -07:00
Alex Dadgar
301704091b
Handle upgrade where Node doesn't have eligibility
...
This PR handles upgrading a node that has no scheduling eligibility set.
2018-03-29 16:52:23 -07:00
Alex Dadgar
7d2aae2c11
test handleTaskGroup
2018-03-29 16:38:47 -07:00
Alex Dadgar
049a9213d2
Watch batch jobs
2018-03-29 16:07:51 -07:00
Preetha
9a732c4acb
Merge pull request #4071 from hashicorp/b-handle-missing-finishedat
...
handle missing finishedAt
2018-03-29 17:11:34 -05:00
Alex Dadgar
f12194328c
Integration test for batch complete case
2018-03-29 13:51:04 -07:00
Preetha
81d48fc7cf
Merge pull request #4079 from hashicorp/b-filter-desiredstop
...
Filter desired status stop allocs correctly
2018-03-29 15:36:22 -05:00
Preetha Appan
c8317532ff
Use time from task events if task state does not have FinishedAt set
2018-03-29 14:05:56 -05:00
Alex Dadgar
b194f93f2f
Disallow Update stanza on Batch
2018-03-29 11:28:56 -07:00
Michael Schurter
91b5bb58d9
add HasHealth helper for nil checks
...
We performed the DeploymentStatus nil checks a couple different ways, so
hopefully this helper will consolidate them and make it more clear what
the code is doing.
2018-03-29 09:29:19 -07:00
Chelsea Komlo
607e631714
Merge pull request #4046 from hashicorp/tls-same-file-reload
...
Check file contents when determining if agent should reload TLS confi…
2018-03-29 10:51:32 -04:00
Preetha Appan
5090fefe96
Filter out allocs with DesiredState = stop, and unit tests
2018-03-29 09:28:52 -05:00
Preetha Appan
8776f4b942
Fix failing test
2018-03-29 07:59:38 -05:00
Preetha Appan
2da661595d
If FinishedAt is not set use alloc's modify time for rescheduling logic
2018-03-29 07:42:58 -05:00
Alex Dadgar
b18f789020
Unmark drain when nodes hit their deadline and only batch/system left and add all job type integration test
2018-03-28 17:25:58 -07:00
Chelsea Holland Komlo
b33d909bf9
add test to assert invalid files return error
2018-03-28 18:31:35 -04:00
Chelsea Holland Komlo
58ada9bc42
return error when setting checksum; don't reload
2018-03-28 18:15:50 -04:00
Chelsea Holland Komlo
2d5af7ff4d
set TLS checksum when parsing config
...
Refactor checksum comparison, always set checksum if it is empty
2018-03-28 09:56:11 -04:00
Michael Schurter
65ddae86f8
Merge pull request #4054 from hashicorp/b-drainer-index-fix
...
drainer: reset index when new job registered
2018-03-27 16:28:25 -07:00
Michael Schurter
79a2781585
Merge pull request #4053 from hashicorp/b-drain-sys-jobs-2
...
drain: fix draining of system jobs
2018-03-27 16:26:45 -07:00
Alex Dadgar
de4b3772f1
Create evals for system jobs when drain is unset
...
This PR creates evals for system jobs when:
* Drain is unset and mark eligible is true
* Eligibility is restored to the node
2018-03-27 15:53:24 -07:00
Chelsea Holland Komlo
dd5f627feb
set server configuration checksum on reload
2018-03-27 18:03:52 -04:00
Michael Schurter
ec60a1d3e3
drain: improve comments
2018-03-27 14:27:09 -07:00
Michael Schurter
e5dfb7e487
drain: unittest draining node logic
2018-03-27 14:24:01 -07:00
Michael Schurter
a1ed305a24
test: add mock batch and system allocs
...
Since the BatchJob helper had a different task group than the Alloc
helper, it was difficult to create a valid batch alloc.
2018-03-27 14:24:01 -07:00
Michael Schurter
77bddc7941
drain: stop sys jobs after drain completes
...
System allocs should be drained when a node's deadline is hit or when
all other allocs on the node have stopped/migrated.
2018-03-27 14:24:01 -07:00
Michael Schurter
fae77b874b
drainer: reset index when new job registered
2018-03-27 14:12:59 -07:00
Chelsea Holland Komlo
b522a0fadc
fix up to string to use time.Time
2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo
31557cc44f
move tests to use time.Time
2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo
003bc209b9
use time.Time for node events for compatibility
2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo
6e6d6b7e33
check file contents when determining if agent should reload TLS configuration
2018-03-27 15:42:20 -04:00
Alex Dadgar
59005d1d26
Merge pull request #4049 from hashicorp/b-tunnel
...
Only track nodes if the conn is from the node
2018-03-27 12:39:34 -07:00
Alex Dadgar
5dacb057b7
Only track nodes if the conn is from the node
...
Fixes a bug in which a connection to a Nomad server was treated as a
connection to a node because the server forwarded a node specific RPC.
2018-03-27 09:59:31 -07:00
Chelsea Komlo
57e2cd04bd
Merge pull request #4025 from hashicorp/reload-http-tls
...
Allow TLS configurations for HTTP and RPC connections to be reloaded …
2018-03-26 18:00:30 -04:00
Preetha Appan
539114124e
Fix too long token test case
2018-03-26 16:28:33 -05:00
Preetha Appan
33e170c15d
s/linear/constant/g
2018-03-26 14:45:09 -05:00
Preetha Appan
7db930b3c3
Extra test case and better error message for ambiguous config
2018-03-26 13:30:09 -05:00
Chelsea Holland Komlo
c2a95f9d7d
add test for upgrading only RPC connections
2018-03-26 10:55:27 -04:00
Preetha Appan
fbd56c35a8
Adds additional validation for ambiguous settings (having both unlimited and attempts set)
2018-03-24 10:29:20 -05:00
Alex Dadgar
39987d5236
Merge branch 'master' into b-acl-name
2018-03-22 14:51:40 -07:00
Michael Schurter
a7f627e34c
eligbile -> eligible
2018-03-21 16:55:22 -07:00
Michael Schurter
a4f346abeb
remove spurious TODOs and FIXMEs
2018-03-21 16:55:22 -07:00
Michael Schurter
9f3086a268
test: must initialize jobResults with new func
2018-03-21 16:51:45 -07:00
Michael Schurter
e432c9af55
test: disable node drainer during tests
...
Node drainer would throw off the index checks
2018-03-21 16:51:45 -07:00
Michael Schurter
5c8c4bce2a
test: disable drain during fsm test
...
drainer was unsetting drain before fsm could read written value
2018-03-21 16:51:45 -07:00
Michael Schurter
341d87aa48
tests: use mock.BatchJob to fix tests
2018-03-21 16:51:45 -07:00
Michael Schurter
8b107acc06
mock: add BatchJob() helper
2018-03-21 16:51:45 -07:00
Michael Schurter
cb61a4bdc7
Fix linting errors
2018-03-21 16:51:45 -07:00
Alex Dadgar
640ebdaef6
fix race in drain integration tests
2018-03-21 16:51:45 -07:00
Michael Schurter
c401d5a098
Refactor assertOps into a helper func
2018-03-21 16:51:45 -07:00
Michael Schurter
187b0e1a48
Remove debug prints
2018-03-21 16:51:45 -07:00
Michael Schurter
f67eca48ac
Deregister garbage collected jobs
2018-03-21 16:51:45 -07:00
Michael Schurter
922842546c
JobNs -> NamespacedID
...
Also drop the New func as it's easy to swap the order of arguments since
they're both strings.
2018-03-21 16:51:45 -07:00
Michael Schurter
8dc7d9fb6a
drainer: RegisterJob -> RegisterJobs
...
Test job watcher
2018-03-21 16:51:45 -07:00
Michael Schurter
3116897099
Fix deadline heap triggering
...
Chan must be buffered to avoid skipping triggering altogether
Also made timing in a test a bit more lenient
2018-03-21 16:51:45 -07:00
Alex Dadgar
9d23c965da
fix comment
2018-03-21 16:51:45 -07:00
Alex Dadgar
fb4badf1bc
sharding
2018-03-21 16:51:44 -07:00
Alex Dadgar
2d91b9dfba
Batch drain update
2018-03-21 16:51:44 -07:00
Alex Dadgar
92b636dd32
Fix deadline handling
2018-03-21 16:51:44 -07:00
Michael Schurter
9898edfa90
Switch to drainerv2 impl
2018-03-21 16:51:44 -07:00
Alex Dadgar
7b2bad8c5e
Toggle Drain allows resetting eligibility
...
This PR allows marking a node as eligible for scheduling while toggling
drain. By default the `nomad node drain -disable` command will mark it
as eligible but the drainer will maintain in-eligibility.
2018-03-21 16:51:44 -07:00
Alex Dadgar
ad80e655cc
code review
2018-03-21 16:51:44 -07:00
Alex Dadgar
11f9fe4960
spelling fixes
2018-03-21 16:51:44 -07:00
Alex Dadgar
bc7385812d
Comments
2018-03-21 16:51:44 -07:00
Alex Dadgar
e87c677a42
handle empty node case
2018-03-21 16:51:44 -07:00
Alex Dadgar
405dab2253
integration test and basic fixes
2018-03-21 16:51:44 -07:00
Alex Dadgar
e63bcb474d
Drainer
2018-03-21 16:51:44 -07:00
Alex Dadgar
4754366640
job watcher
2018-03-21 16:51:44 -07:00
Alex Dadgar
504bfabb4d
Nodes being untracked or having updated deadlines updates the deadliner
2018-03-21 16:51:44 -07:00
Alex Dadgar
66eaaa6a4d
node watcher
2018-03-21 16:51:44 -07:00
Alex Dadgar
527ac0b39d
drain heap
2018-03-21 16:51:44 -07:00
Alex Dadgar
2d4c193a0a
Initial design
2018-03-21 16:51:44 -07:00
Alex Dadgar
33ca319080
System test runs on mac
2018-03-21 16:51:44 -07:00
Alex Dadgar
f8d4a3a9e6
Fix file names
2018-03-21 16:51:44 -07:00
Michael Schurter
32a7649359
refactor main drainloop into 2 more methods
2018-03-21 16:51:44 -07:00
Michael Schurter
5e52f84bb7
drainer: refactor newStopAllocs, applyMigrations
2018-03-21 16:51:44 -07:00
Michael Schurter
62960ed7bd
client: don't monitor health of non-service jobs
...
Also fix system job draining; won't work without deadline fixes
2018-03-21 16:51:44 -07:00